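from tvm.contrib import graph_executor

module = graph_executor.GraphModule(lib["default"](ctx))

def thread_run():
    # Call the compiled module repeatedly from one Python thread.
    for i in range(repeats):
        module.run()

threads = []
for i in range(num_threads):
    # PropagatingThread is a custom thread class (definition not shown).
    threads.append(PropagatingThread(
        target=thread_run,
    ))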
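The snippet above uses PropagatingThread without showing its definition. A minimal sketch of what such a class might look like is given here, assuming (from the name only) that it is a threading.Thread subclass that re-raises a worker's exception when join() is called. The attribute names `exc` and `ret` are hypothetical and not taken from the original post.

```python
import threading


class PropagatingThread(threading.Thread):
    """Hypothetical sketch: a thread whose join() re-raises the target's exception."""

    def __init__(self, target, args=(), kwargs=None):
        super().__init__()
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self.exc = None  # exception raised by the target, if any
        self.ret = None  # return value of the target, if it finished

    def run(self):
        try:
            self.ret = self._fn(*self._fn_args, **self._fn_kwargs)
        except BaseException as e:
            # Remember the failure so the joining thread can see it.
            self.exc = e

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc is not None:
            raise self.exc
        return self.ret
```

With a class like this, a failure inside a worker surfaces in the main thread at join() instead of being printed and ignored.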
8 physical cores are occupied by the TVM thread pools.
Each module.run() takes 8 ms.
Since each thread in the 2-threaded run still occupies 4 physical cores, I expected the performance to be the same as in the single-threaded run. But each thread in the 2-threaded run actually reaches only 50% of the single-threaded performance, each thread in the 4-threaded run only 25%, and so on…
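To make the numbers concrete, here is a small arithmetic sketch. It assumes that "50% of single-threaded performance" means the per-run latency doubles, and it uses the 8 ms figure quoted above. The function names are made up for illustration.

```python
def per_run_latency_ms(single_ms, relative_perf):
    """Latency of one module.run() when a thread reaches `relative_perf` of baseline."""
    return single_ms / relative_perf


def aggregate_runs_per_s(single_ms, num_threads, relative_perf):
    """Total runs per second summed over all threads."""
    return num_threads * 1000.0 / per_run_latency_ms(single_ms, relative_perf)


# (num_threads, relative per-thread performance) as reported in the post
for n, perf in [(1, 1.0), (2, 0.5), (4, 0.25)]:
    print(n, per_run_latency_ms(8.0, perf), aggregate_runs_per_s(8.0, n, perf))
```

Under that assumption the aggregate throughput stays at 125 runs per second for 1, 2 and 4 threads, which is the pattern you would see if the runs were effectively executed one after another rather than in parallel.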
Any idea about the performance degradation in multi-threaded run?
Thanks. It doesn’t improve. With TVM_THREAD_POOL_SPIN_COUNT=0, performance in both the single-threaded and the multi-threaded run drops by 10-20%, and CPU utilization drops from nearly 100% to nearly 50%, which I think is reasonable.
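For anyone repeating this experiment from inside the script rather than from the shell, one possible way to set the variable is sketched below. Placing it before the TVM import is an assumption made to be safe, since the point at which the runtime reads the variable is not stated in this thread.

```python
import os

# Assumption: set the variable before `import tvm` so the runtime
# sees it no matter when its thread pool is created.
os.environ["TVM_THREAD_POOL_SPIN_COUNT"] = "0"

# import tvm  # the TVM import (and the rest of the script) would follow here
```

Exporting the variable in the shell before launching Python has the same effect and avoids any ordering question.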
Do you mean that with Python’s multiprocessing you get the ideal result, but with Python’s multithreading you get a bad result? Or do you mean something else?
By “multi-process” I mean running the Python script twice simultaneously, while “multi-thread” is implemented with threading.Thread inside the Python script.
Threads in Python aren’t actually executed concurrently due to the GIL. So the reason everything is slower is that you aren’t actually doing anything in parallel. Also, it could be that a lot of your time is actually spent in the Python interpreter instead of executing your model. You should try using the time evaluator (tvm.runtime — tvm 0.8.dev0 documentation) instead of your own Python loop.
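The time evaluator suggestion can be sketched roughly as follows. The helper name `time_module_run_ms` and the default `number`/`repeat` values are made up for illustration. The sketch assumes `module` is the GraphModule from the original snippet (which keeps the underlying runtime module in its `.module` attribute) and `ctx` is the same device.

```python
def time_module_run_ms(module, ctx, number=10, repeat=3):
    """Return the mean latency of module.run() for each repeat, in milliseconds."""
    # time_evaluator returns a callable that invokes "run" inside the
    # TVM runtime, so the Python loop overhead is not part of the timing.
    timer = module.module.time_evaluator("run", ctx, number=number, repeat=repeat)
    result = timer()
    # result.results holds one mean time (in seconds) per repeat.
    return [seconds * 1000.0 for seconds in result.results]
```

Usage would look like `print(time_module_run_ms(module, ctx))`. Comparing these numbers with the timings from the hand-written loop gives a rough idea of how much time is spent in the interpreter.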