Strassen Algorithm for Dense

I don’t think you should set TVM_NUM_THREADS on ARM, because of ARM’s big.LITTLE architecture. Instead, I think you should call runtime.config_thread_pool to do the core binding. Another thing: we shouldn’t let TVM worker threads run on CPUs with different frequencies (that is, one worker thread on a big core and another on a LITTLE core), since that will give worse performance.
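To make the core-binding idea concrete, here is a rough sketch, not a definitive recipe. It groups cores by their maximum frequency so that all workers can be kept on one cluster, then asks TVM to bind to the big cores. The helper names (`group_cores_by_max_freq`, `read_max_freqs`, `configure_tvm_for_big_cluster`) are hypothetical. I am also assuming the packed function is registered as `runtime.config_threadpool` and takes `(affinity_mode, nthreads)` with mode `1` meaning the big cluster, so please check that against your TVM version. When the target is a remote board, the same function would typically be fetched through the RPC session instead of the local registry.

```python
from collections import defaultdict
from pathlib import Path


def group_cores_by_max_freq(max_freqs):
    """Group core ids by max frequency (kHz), fastest cluster first."""
    clusters = defaultdict(list)
    for core, freq in max_freqs.items():
        clusters[freq].append(core)
    return [sorted(clusters[f]) for f in sorted(clusters, reverse=True)]


def read_max_freqs(sysfs_root="/sys/devices/system/cpu"):
    """Read cpuinfo_max_freq per core from Linux sysfs (empty if unavailable)."""
    freqs = {}
    for path in Path(sysfs_root).glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
        core = int(path.parent.parent.name[3:])  # "cpu4" -> 4
        freqs[core] = int(path.read_text().strip())
    return freqs


def configure_tvm_for_big_cluster(max_freqs):
    """Keep all TVM workers on the fastest cluster only (must run on the device)."""
    import tvm  # imported lazily so the helpers above work without TVM installed

    big_cores = group_cores_by_max_freq(max_freqs)[0]
    # Assumption: registered name and (mode, nthreads) signature; mode 1 = big cores.
    config_threadpool = tvm.get_global_func("runtime.config_threadpool")
    config_threadpool(1, len(big_cores))
    return big_cores
```

For example, on a hypothetical 4+2 board, `group_cores_by_max_freq({0: 1800000, 1: 1800000, 2: 1800000, 3: 1800000, 4: 2400000, 5: 2400000})` puts cores 4 and 5 in the first (big) group, so the thread pool would be configured with 2 threads rather than 6.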
