Strassen Algorithm for Dense

Thank you very much for your reply.

My hardware is an AArch64 CPU with 8 cores. I followed this tutorial to deploy TVM: https://tvm.apache.org/docs/deploy/cpp_deploy.html. The C++ thread that loads and runs the TVM library is bound to the 3 mid-frequency CPU cores, and TVM_NUM_THREADS is set to 1. One thing confuses me: the larger TVM_NUM_THREADS is, the worse the performance gets, which is why I set it to 1. I have not figured out why the optimal TVM_NUM_THREADS is not 3 or 8.
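For reference, here is a minimal sketch of roughly how such a setup could be done on Linux. It is not my exact code: the helper names (`MakeCpuSet`, `PinCurrentThread`, `SetTvmNumThreads`) are hypothetical, and the core ids 4, 5, 6 in the usage note are only an assumed example of where the mid-frequency cores might sit. As far as I understand, the TVM runtime reads TVM_NUM_THREADS when it first sets up its thread pool, so the variable has to be set before the first call into the runtime.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros and sched_setaffinity
#endif
#include <sched.h>

#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Build a Linux affinity mask from a list of core ids (hypothetical helper).
// Ids outside [0, CPU_SETSIZE) are ignored.
cpu_set_t MakeCpuSet(const std::vector<int>& cores) {
  cpu_set_t set;
  CPU_ZERO(&set);
  for (int c : cores) {
    if (c >= 0 && c < CPU_SETSIZE) CPU_SET(c, &set);
  }
  return set;
}

// Restrict the calling thread to the given cores. Returns true on success.
// Passing pid 0 to sched_setaffinity targets the calling thread.
bool PinCurrentThread(const std::vector<int>& cores) {
  cpu_set_t set = MakeCpuSet(cores);
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Set TVM_NUM_THREADS for this process, overwriting any existing value.
// Assumption: this is called before the first call into the TVM runtime.
bool SetTvmNumThreads(int n) {
  assert(n > 0);
  return setenv("TVM_NUM_THREADS", std::to_string(n).c_str(), 1) == 0;
}
```

In the inference thread this would be called as `SetTvmNumThreads(1); PinCurrentThread({4, 5, 6});` before loading the module, with the core ids replaced by whatever the mid-frequency cores are on the actual device.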

Based on your conclusion, I think it will not be easy to make TVM outperform MNN on my current hardware (ARM CPU, 8 cores, and I do not want to occupy all of them), which is a little frustrating. I will still make some further attempts, such as trying Ansor. If there is any progress, I will be happy to discuss it further with you.