- Device: Skylake 8163 with 48 physical cores
- Env Setting: TVM_BIND_THREADS=0 TVM_NUM_THREADS=4
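- Code Snippet:
    from tvm.contrib import graph_executor

    # A single GraphModule instance, shared by every Python thread.
    module = graph_executor.GraphModule(lib["default"](ctx))

    def thread_run():
        for i in range(repeats):
            module.run()

    # PropagatingThread is a threading.Thread subclass (definition not shown).
    threads = []
    for i in range(num_threads):
        threads.append(PropagatingThread(
            target=thread_run,
        ))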
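For anyone who wants to reproduce the per-thread timings, here is a minimal sketch of a measurement harness. It is not the exact code used for the numbers in this post: `measure_latency` and `run_fn` are hypothetical names, and plain `threading.Thread` stands in for `PropagatingThread`. It assumes `run_fn` is any zero-argument callable, such as `module.run`.

```python
import threading
import time


def measure_latency(run_fn, num_threads, repeats):
    """Return the mean seconds per run_fn() call, one entry per thread."""
    results = [None] * num_threads

    def worker(idx):
        # Time the whole loop, then average over the number of calls.
        start = time.perf_counter()
        for _ in range(repeats):
            run_fn()
        results[idx] = (time.perf_counter() - start) / repeats

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `measure_latency(module.run, 2, repeats)` would give one average latency per thread, which can then be compared against the `num_threads=1` case.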
- When num_threads=1
  4 physical cores are occupied by the TVM thread pool. Each module.run() takes 4 ms.
- When num_threads=2
  8 physical cores are occupied by the TVM thread pools. Each module.run() takes 8 ms.
Since each thread in the 2-threaded run still has 4 physical cores, I expected the per-run performance to match the single-threaded run. In practice, each thread in the 2-threaded run reaches only 50% of the single-threaded performance, each thread in the 4-threaded run only 25%, and so on.
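Put another way, the aggregate throughput does not change as threads are added. The short calculation below is only arithmetic on the figures reported here, not a diagnosis; `aggregate_throughput` is a hypothetical helper, and the 16 ms value for 4 threads is inferred from the "25%" figure rather than measured directly.

```python
def aggregate_throughput(num_threads, latency_ms):
    """Total module.run() calls completed per second across all threads."""
    return num_threads * 1000.0 / latency_ms


# Per-run latency in ms, keyed by num_threads.
# 4 ms and 8 ms are reported; 16 ms is inferred from the 25% figure.
observed = {1: 4.0, 2: 8.0, 4: 16.0}

for n, latency_ms in observed.items():
    print(n, aggregate_throughput(n, latency_ms))
# Each line prints 250.0 runs per second.
```

If the threads were scaling as expected, the total would grow to roughly 500 and 1000 runs per second for 2 and 4 threads. A flat 250 is what one would see if the module.run() calls were effectively running one at a time.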
Any idea what causes this performance degradation in multi-threaded runs?