When using the TVM v0.7 C++ API to do inference on an llvm-target CPU, I set TVM_NUM_THREADS=16 (the machine has 16 logical cores) and then ran a benchmark test that launches 2 std::thread, each running a loop of 1000 synchronous inference calls. All 16 CPUs sit at 100% usage.
But when I run the same case with TensorFlow, each of the 16 CPUs sits at only about 40%.
I also read that TVM_BIND_THREADS=1 sets the CPU affinity; however, it seems to have no effect whether I set it (TVM_BIND_THREADS=1) or unset it (TVM_BIND_THREADS=0).
How should I set/tune these parameters: TVM_NUM_THREADS, TVM_BIND_THREADS, and OMP_NUM_THREADS?
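In case it matters, I set the variables in the environment before launching the benchmark process, along these lines:

```shell
export TVM_NUM_THREADS=16
export TVM_BIND_THREADS=1
env | grep '^TVM_'
```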
I searched some similar topics but didn't find an answer to this problem, e.g.:
https://discuss.tvm.apache.org/t/setting-per-core-usage-explicitly-in-tvm/4538
Could you please shed some light here? Thanks.