Hi, I am working on a C++ project that involves multiple model instances running on GPUs. I can run model inference as shown below:
```cpp
tvm::runtime::PackedFunc f = mod.GetFunction("run");
f();  // do inference
```
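For context, this is roughly how each instance gets created (a simplified sketch; the library name `deploy.so` and the graph-executor factory `default` entry point are just how my local setup looks):

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>

tvm::runtime::Module MakeInstance(int device_id) {
  // Load the compiled library and instantiate a graph executor on the given GPU.
  tvm::runtime::Module lib = tvm::runtime::Module::LoadFromFile("deploy.so");
  DLDevice dev{kDLCUDA, device_id};  // kDLGPU on older TVM releases
  tvm::runtime::Module mod = lib.GetFunction("default")(dev);
  return mod;
}
```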
But I haven't found how to set a TVMStreamHandle for these model instances yet. Currently, all of them run on the default CUDA stream.
Is it possible to assign a separate stream to each instance to achieve asynchronous, parallel inference in TVM?
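For what it's worth, something like the hypothetical wrapper below is what I had in mind, using `TVMStreamCreate` / `TVMSetStream` from the C runtime API (`RunOnOwnStream` is just a name I made up; I'm not sure this is the intended usage):

```cpp
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/packed_func.h>

// Hypothetical per-instance wrapper: create a dedicated stream, make it the
// active stream on the device, run inference, then synchronize on that stream.
void RunOnOwnStream(tvm::runtime::PackedFunc run, int device_id) {
  TVMStreamHandle stream = nullptr;
  TVMStreamCreate(kDLCUDA, device_id, &stream);  // kDLGPU on older TVM releases
  TVMSetStream(kDLCUDA, device_id, stream);      // does this redirect the kernel launches issued by run()?
  run();                                         // inference for this model instance
  TVMSynchronize(kDLCUDA, device_id, stream);    // wait only on this instance's work
  TVMStreamFree(kDLCUDA, device_id, stream);
}
```

If that is not the right approach (for example, if streams have to be set per thread), a pointer to the recommended way of doing this from C++ would be much appreciated.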