Is it possible to run two inference models concurrently in VTA?

In VTA, is it possible to run two inference tasks concurrently using Python's multithreading? I tried it and found that the two tasks execute serially.

Without explicitly configuring the thread pool, the control flow of multiple backend runtime instances shares the same thread pool, so their operators execute sequentially. A minimal reproduction of what you observed might look like the sketch below.

You can refer to this example, CPU affinity setting of pipeline process when using config_threadpool - #2 by hjiang, which uses config_threadpool to pin each inference's worker threads to different CPU cores so the inferences run in parallel.

We also have the pipeline executor to handle running multiple backends in parallel when the backends have data dependencies; please refer to this tutorial (in progress: https://github.com/apache/tvm/pull/11557).
