In VTA, is it possible to run two inference tasks concurrently using Python's multithreading? I tried it and found that the two tasks execute serially.
Without explicitly configuring the thread pool, multiple backend runtime instances share the same thread pool, so their operators execute sequentially.
You can reference this example (CPU affinity setting of pipeline process when using config_threadpool - #2 by hjiang) to make the different inferences run in parallel.
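To illustrate the intended pattern, here is a minimal, self-contained Python sketch. It uses a `fake_inference` stand-in for a per-thread runtime call (the function name and the timing are hypothetical, not TVM API); in real code, each thread would hold its own runtime module and the thread-pool affinity would be configured via `config_threadpool` as described in the linked post. The sketch only shows that two blocking backend calls can overlap when each thread has its own worker resources:

```python
import threading
import time

def fake_inference(task_id, results):
    # Hypothetical stand-in for a per-thread inference call, e.g. mod.run()
    # on a runtime instance created in this thread. In real TVM code you
    # would also set CPU affinity for this thread's pool (config_threadpool)
    # so the two instances do not contend for the same worker threads.
    time.sleep(0.5)  # simulates a blocking backend call that releases the GIL
    results[task_id] = "done"

results = {}
start = time.time()
threads = [threading.Thread(target=fake_inference, args=(i, results))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# If the two tasks truly overlap, total wall time is close to one task's
# duration (~0.5 s) rather than the serial sum (~1.0 s).
print(len(results), round(elapsed, 1))
```

Without per-instance thread-pool configuration, the analogous TVM calls would serialize on the shared pool even though the Python threads themselves start concurrently.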
We also have the pipeline executor to handle the requirement of running multiple backends in parallel; please reference this tutorial (in progress: https://github.com/apache/tvm/pull/11557) when these backends have data dependencies.