Strassen Algorithm for Dense

Thank you very much for your reply.

As I said before, I followed this tutorial to deploy TVM: https://tvm.apache.org/docs/deploy/cpp_deploy.html. I first export the function built by tvm.build as a library, then load and call it from C++.

Following your suggestion, I set the CPU affinity as follows before calling tvm::runtime::Module::LoadFromFile:

tvm::runtime::threading::ThreadGroup::AffinityMode mode = static_cast<tvm::runtime::threading::ThreadGroup::AffinityMode>(static_cast<int>(-1));
tvm::runtime::ThreadPool::ThreadLocal()->UpdateWorkerConfiguration(mode, 4);

The frequency of each of my CPU cores is shown below:

index: 7  freqs: 3130000
index: 4  freqs: 2544000
index: 5  freqs: 2544000
index: 6  freqs: 2544000
index: 0  freqs: 2045000
index: 1  freqs: 2045000
index: 2  freqs: 2045000
index: 3  freqs: 2045000
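
To make the list above easier to relate to the affinity setting, here is a small C++ sketch that picks the n lowest-frequency cores from it. This is only an illustration: Core, CoresFromPost and LowestFrequencyCores are hypothetical names, not TVM APIs, and I am assuming (not confirming) that a "little core" mode with 4 threads would end up on the 4 slowest cores.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One entry of the list above. `freq` is copied as printed in the post;
// the unit is not stated there.
struct Core {
  int index;
  long freq;
};

// The eight cores exactly as listed in the post.
inline std::vector<Core> CoresFromPost() {
  return {{7, 3130000}, {4, 2544000}, {5, 2544000}, {6, 2544000},
          {0, 2045000}, {1, 2045000}, {2, 2045000}, {3, 2045000}};
}

// Hypothetical helper: returns the indices of the `n` lowest-frequency
// cores, breaking ties by lower core index.
inline std::vector<int> LowestFrequencyCores(std::vector<Core> cores,
                                             std::size_t n) {
  std::sort(cores.begin(), cores.end(), [](const Core& a, const Core& b) {
    if (a.freq != b.freq) return a.freq < b.freq;
    return a.index < b.index;
  });
  n = std::min(n, cores.size());
  std::vector<int> picked;
  picked.reserve(n);
  for (std::size_t i = 0; i < n; ++i) picked.push_back(cores[i].index);
  return picked;
}
```

With the data from the post, LowestFrequencyCores(CoresFromPost(), 4) selects cores 0, 1, 2 and 3, i.e. the 2045000 cluster.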

Then I unset TVM_NUM_THREADS and ran the test many times. Compared with before (TVM_NUM_THREADS=1), performance is indeed better. However, the run time fluctuates considerably: for 256 * 256 * 256, the fastest run takes 1745 us and the slowest takes 10971 us.
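
For reference, one way to characterize this kind of spread is to discard a few warm-up calls and then report the minimum, median and maximum over many individually timed calls. Below is a minimal sketch of that idea. MeasureLatency and LatencyStats are hypothetical names of my own, not TVM APIs, and fn is a placeholder for whatever invokes the loaded function.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Summary of per-call latencies, in microseconds.
struct LatencyStats {
  double min_us;
  double median_us;
  double max_us;
};

// Hypothetical helper (not part of TVM): calls `fn` `warmup` times without
// timing, then times `repeats` further calls one by one.
inline LatencyStats MeasureLatency(const std::function<void()>& fn,
                                   std::size_t warmup, std::size_t repeats) {
  assert(fn && "fn must be callable");

  // Warm-up calls absorb one-off costs such as thread start-up and cold caches.
  for (std::size_t i = 0; i < warmup; ++i) fn();

  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::micro>(stop - start).count());
  }

  // No timed calls requested: nothing to summarize.
  if (samples.empty()) return {0.0, 0.0, 0.0};

  std::sort(samples.begin(), samples.end());
  // For an even number of samples this takes the upper of the two middle values.
  return {samples.front(), samples[samples.size() / 2], samples.back()};
}
```

If the median stays close to the minimum while only the maximum is large, the fluctuation comes from a few outlier calls rather than from the typical call.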