In VTA, is it possible to run two inference tasks concurrently using Python's multithreading? I tried it and found that the two tasks execute serially.
Without explicitly configuring the thread pool, multiple backend runtime instances share the same thread pool, so their operators execute sequentially.
You can reference this example (CPU affinity setting of pipeline process when using config_threadpool - #2 by hjiang) to make the different inferences run in parallel.
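To illustrate the intended pattern, here is a minimal, self-contained Python sketch. It uses a `fake_inference` stand-in for a per-thread runtime call (the function name and the timing are hypothetical, not TVM API); in real code, each thread would hold its own runtime module and the thread-pool affinity would be configured via `config_threadpool` as described in the linked post. The sketch only shows that two blocking backend calls can overlap when each thread has its own worker resources:

```python
import threading
import time

def fake_inference(task_id, results):
    # Hypothetical stand-in for a per-thread inference call, e.g. mod.run()
    # on a runtime instance created in this thread. In real TVM code you
    # would also set CPU affinity for this thread's pool (config_threadpool)
    # so the two instances do not contend for the same worker threads.
    time.sleep(0.5)  # simulates a blocking backend call that releases the GIL
    results[task_id] = "done"

results = {}
start = time.time()
threads = [threading.Thread(target=fake_inference, args=(i, results))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# If the two tasks truly overlap, total wall time is close to one task's
# duration (~0.5 s) rather than the serial sum (~1.0 s).
print(len(results), round(elapsed, 1))
```

Without per-instance thread-pool configuration, the analogous TVM calls would serialize on the shared pool even though the Python threads themselves start concurrently.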
We also have the pipeline executor to handle the requirement of running multiple backends in parallel; please reference this tutorial (in progress: https://github.com/apache/tvm/pull/11557) when these backends have data dependencies.