Thank you for replying!
Using `MKL_VERBOSE=1`, I found that `TVM_NUM_THREADS` does not affect the number of threads used by MKL.
So I set `MKL_NUM_THREADS` instead, which resolved the problems (fluctuation and slow inference).
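For reference, here is a minimal sketch of how I pin both thread pools before loading the compiled module (the thread counts and the module path are placeholders, not my exact setup):

```python
import os

# Thread-count variables must be set before MKL/TVM spin up their thread pools.
os.environ["MKL_NUM_THREADS"] = "8"   # threads for MKL kernels (the one that mattered)
os.environ["TVM_NUM_THREADS"] = "8"   # threads for TVM's own runtime
os.environ["MKL_VERBOSE"] = "1"       # log each MKL call and its thread count

import tvm
from tvm.contrib import graph_executor  # graph_runtime in older TVM versions

# "deploy_lib.so" is a placeholder for the compiled module.
lib = tvm.runtime.load_module("deploy_lib.so")
dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
```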
With and without `-libs=mkl`, the measured inference time is approximately the same.
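For context, I measure with something like TVM's `time_evaluator` (a sketch continuing from the loading code above; the input name, shape, and repeat counts are placeholders):

```python
import numpy as np

# Feed a dummy input so the benchmark runs on realistic data.
# "data" and its shape are placeholders for the real model's input.
module.set_input("data", np.random.rand(1, 512).astype("float32"))

# Benchmark the "run" function: 3 repeats of 10 runs each.
ftimer = module.module.time_evaluator("run", dev, number=10, repeat=3)
prof = ftimer()
print("mean inference time: %.3f ms" % (prof.mean * 1000))
```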
While searching for the reason, I found that TVM uses MKL to optimize only the dense layers, and that AutoTVM can also tune dense layers.
So, if MKL is not used, TVM's default tuning applies to all layers; if it is used, MKL handles the dense layers while TVM handles the rest. Is this right?
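To make the comparison concrete, here is a small self-contained sketch of the two build configurations I mean, with a one-dense-layer model standing in for my real network (this assumes TVM was built with `USE_MKL` enabled):

```python
import numpy as np
import tvm
from tvm import relay

# Tiny one-dense-layer model standing in for the real network.
data = relay.var("data", shape=(1, 512), dtype="float32")
weight = relay.var("weight", shape=(1024, 512), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([data, weight], relay.nn.dense(data, weight)))
params = {"weight": tvm.nd.array(np.random.rand(1024, 512).astype("float32"))}

# Case 1: plain llvm -- TVM's own (AutoTVM-tunable) schedule covers dense.
# Case 2: llvm -libs=mkl -- dense is dispatched to MKL's GEMM; other ops stay on TVM.
for target in ["llvm", "llvm -libs=mkl"]:
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```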
If I'm right, then since these two cases, AutoTVM (default) and AutoTVM + MKL (dense only), show similar performance, can I say that TVM's schedule primitives achieve performance comparable to MKL's?