Hi, I am working on a C++ project that involves multiple model instances running on GPUs. I can run model inference as shown below:
```cpp
tvm::runtime::PackedFunc f = mod.GetFunction("run");
f();  // do inference
```
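For context, this is roughly how each instance gets created (a simplified sketch; the library name `deploy.so` and the graph-executor factory `default` entry point are just how my local setup looks):

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>

tvm::runtime::Module MakeInstance(int device_id) {
  // Load the compiled library and instantiate a graph executor on the given GPU.
  tvm::runtime::Module lib = tvm::runtime::Module::LoadFromFile("deploy.so");
  DLDevice dev{kDLCUDA, device_id};  // kDLGPU on older TVM releases
  tvm::runtime::Module mod = lib.GetFunction("default")(dev);
  return mod;
}
```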
But I haven't found how to set a TVMStreamHandle for these model instances yet. Currently, all of them run on the default CUDA stream.
Is it possible to assign a separate stream to each instance to achieve asynchronous, parallel inference in TVM?
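For what it's worth, something like the hypothetical wrapper below is what I had in mind, using `TVMStreamCreate` / `TVMSetStream` from the C runtime API (`RunOnOwnStream` is just a name I made up; I'm not sure this is the intended usage):

```cpp
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/packed_func.h>

// Hypothetical per-instance wrapper: create a dedicated stream, make it the
// active stream on the device, run inference, then synchronize on that stream.
void RunOnOwnStream(tvm::runtime::PackedFunc run, int device_id) {
  TVMStreamHandle stream = nullptr;
  TVMStreamCreate(kDLCUDA, device_id, &stream);  // kDLGPU on older TVM releases
  TVMSetStream(kDLCUDA, device_id, stream);      // does this redirect the kernel launches issued by run()?
  run();                                         // inference for this model instance
  TVMSynchronize(kDLCUDA, device_id, stream);    // wait only on this instance's work
  TVMStreamFree(kDLCUDA, device_id, stream);
}
```

If that is not the right approach (for example, if streams have to be set per thread), a pointer to the recommended way of doing this from C++ would be much appreciated.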