Thank you very much for your reply.
As I said before, I followed this tutorial to deploy TVM: https://tvm.apache.org/docs/deploy/cpp_deploy.html. I first export the function built with tvm.build as a library, then load and call it from C++.
Following your suggestion, I set the CPU affinity as follows before calling tvm::runtime::Module::LoadFromFile:
tvm::runtime::threading::ThreadGroup::AffinityMode mode = static_cast<tvm::runtime::threading::ThreadGroup::AffinityMode>(static_cast<int>(-1));
tvm::runtime::ThreadPool::ThreadLocal()->UpdateWorkerConfiguration(mode, 4);
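To check which cores a thread is actually allowed to run on, the affinity can also be inspected at the OS level. Below is a minimal Linux/Android-only sketch, independent of TVM, assuming the sched_setaffinity/sched_getaffinity calls from <sched.h> are available; the helper names PinCurrentThread and CurrentAffinity are hypothetical. It pins the calling thread to a set of CPU indices and reads the resulting mask back.

```cpp
// Linux-only sketch (assumes glibc or Bionic; g++ defines _GNU_SOURCE by
// default, which exposes cpu_set_t and the CPU_* macros). Not TVM code.
#include <sched.h>
#include <vector>

// Hypothetical helper: restrict the calling thread to the given CPU indices.
// Returns false if the kernel rejects the mask (e.g. none of the CPUs exist).
bool PinCurrentThread(const std::vector<int>& cpus) {
  cpu_set_t set;
  CPU_ZERO(&set);
  for (int cpu : cpus) {
    if (cpu >= 0 && cpu < CPU_SETSIZE) CPU_SET(cpu, &set);
  }
  // pid 0 means "the calling thread" for sched_setaffinity on Linux.
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: list the CPU indices the calling thread may run on.
std::vector<int> CurrentAffinity() {
  cpu_set_t set;
  CPU_ZERO(&set);
  std::vector<int> cpus;
  if (sched_getaffinity(0, sizeof(set), &set) != 0) return cpus;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
  }
  return cpus;
}
```

For example, PinCurrentThread({7, 4, 5, 6}) would restrict the calling thread to the four fastest cores in the frequency list that follows, and CurrentAffinity() shows which mask the kernel actually accepted.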
The frequency (in kHz) of each of my CPU cores is shown below:
index: 7 freqs: 3130000
index: 4 freqs: 2544000
index: 5 freqs: 2544000
index: 6 freqs: 2544000
index: 0 freqs: 2045000
index: 1 freqs: 2045000
index: 2 freqs: 2045000
index: 3 freqs: 2045000
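One way to read this list is sketched below under my own assumptions (the CoreFreq, CoreGroups and GroupCores names are hypothetical, and this is not TVM's code): sort the cores fastest-first, then count the cores sharing the top frequency as "big" and those sharing the bottom frequency as "little". If I read TVM's threading_backend.h correctly, AffinityMode value 1 is kBig and -1 is kLittle, so the mode passed above may be worth double-checking against the TVM version in use.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A core's index and its reported frequency in kHz (as printed above).
struct CoreFreq {
  int index;
  int64_t freq_khz;
};

struct CoreGroups {
  std::vector<int> sorted;  // core indices, fastest first
  int big_count = 0;        // cores at the highest frequency
  int little_count = 0;     // cores at the lowest frequency (0 if all equal)
};

// Sort cores fastest-first (ties broken by lower index), then count how many
// cores share the highest and the lowest frequency.
CoreGroups GroupCores(std::vector<CoreFreq> cores) {
  std::sort(cores.begin(), cores.end(), [](const CoreFreq& a, const CoreFreq& b) {
    return a.freq_khz == b.freq_khz ? a.index < b.index : a.freq_khz > b.freq_khz;
  });
  CoreGroups g;
  if (cores.empty()) return g;
  const int64_t big = cores.front().freq_khz;
  const int64_t little = cores.back().freq_khz;
  for (const CoreFreq& c : cores) {
    g.sorted.push_back(c.index);
    if (c.freq_khz == big) ++g.big_count;
    if (big != little && c.freq_khz == little) ++g.little_count;
  }
  return g;
}
```

With the numbers above this gives the order 7, 4, 5, 6, 0, 1, 2, 3, with only core 7 counted as big and cores 0 to 3 as little; under this rule, cores 4 to 6 fall into neither group.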
Then I unset TVM_NUM_THREADS and ran the test many times. Compared with my earlier runs (TVM_NUM_THREADS=1), performance is indeed better. However, the execution time fluctuates considerably: for 256 * 256 * 256, the fastest run takes 1745 µs and the slowest takes 10971 µs.
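To characterize the fluctuation more precisely, a small timing helper like the following could report the median alongside the min and max after a few warm-up runs. This is a sketch with hypothetical names (TimeRuns, TimingStats); fn would wrap the call into the loaded TVM function, which is not shown here.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

struct TimingStats {
  double min_us;
  double median_us;  // middle sample (upper median for an even count)
  double max_us;
  std::size_t runs;
};

// Call `fn` `warmup` times untimed, then time `repeats` calls with a
// monotonic clock and summarize the samples in microseconds.
TimingStats TimeRuns(const std::function<void()>& fn, int warmup, int repeats) {
  for (int i = 0; i < warmup; ++i) fn();
  std::vector<double> samples;
  samples.reserve(repeats > 0 ? repeats : 0);
  for (int i = 0; i < repeats; ++i) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double, std::micro>(end - start).count());
  }
  TimingStats s{0.0, 0.0, 0.0, samples.size()};
  if (samples.empty()) return s;
  std::sort(samples.begin(), samples.end());
  s.min_us = samples.front();
  s.max_us = samples.back();
  s.median_us = samples[samples.size() / 2];
  return s;
}
```

A median close to the minimum with a far-off maximum would point to occasional slow outliers rather than uniformly slow runs.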