Tuned model slower than the pure ONNX version

I followed the tutorial Compiling and Optimizing a Model with the Python Interface (AutoTVM) — tvm 0.8.dev0 documentation, and enlarged the search space (about 1500 trials per task). But the best tuned result is still slower than just running the pure ONNX Runtime version. Is there anything wrong? My CPU is an AMD EPYC 7K62 48-Core Processor, which supports AVX2, so I chose `llvm -mcpu=core-avx2` as my target. Everything else is as in the tutorial. Can someone help me?
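For reference, here is roughly what my setup looks like, following the tutorial's flow (the model path, input name/shape, and log file name below are placeholders for my actual ones):

```python
import onnx
import tvm
from tvm import relay, autotvm

# Load the ONNX model and convert it to Relay (input name/shape are placeholders).
onnx_model = onnx.load("resnet50-v2-7.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"data": (1, 3, 224, 224)})

# The target I used, as described above.
target = tvm.target.Target("llvm -mcpu=core-avx2")

# Tuning options, following the tutorial, but with ~1500 trials per task.
runner = autotvm.LocalRunner(number=10, repeat=1, timeout=10, min_repeat_ms=0)
tuning_option = {
    "tuner": "xgb",
    "trials": 1500,
    "early_stopping": None,
    "measure_option": autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
    ),
    "tuning_records": "resnet50-autotuning.json",
}

# Extract the tunable tasks from the model.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
```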

Try with -mcpu=bdver4? I think this will turn on more optimizations.

Or something more appropriate from this list: x86 Options (Using the GNU Compiler Collection (GCC))
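For example, something along these lines (just a guess at what might fit this CPU; worth double-checking against that list and what your LLVM build supports):

```python
# Alternative -mcpu values to try. The EPYC 7K62 is a Zen 2 (Rome) part,
# so znver2 may be the closer match if your LLVM is new enough.
target = tvm.target.Target("llvm -mcpu=bdver4")
target = tvm.target.Target("llvm -mcpu=znver2")
```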

You can also try upping the kernel trials even more. Though 1500 per task seems like a lot already.
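If it helps, this is roughly where that knob lives in the tutorial's flow (continuing the sketch in the question; the higher trial count is just an example, and the benchmark call at the end is one way to compare against an ONNX Runtime number measured the same way):

```python
from tvm.autotvm.tuner import XGBTuner
from tvm.contrib import graph_executor

tuning_option["trials"] = 3000  # example: double the per-task budget

for i, task in enumerate(tasks):
    tuner_obj = XGBTuner(task, loss_type="rank")
    n_trial = min(tuning_option["trials"], len(task.config_space))
    tuner_obj.tune(
        n_trial=n_trial,
        early_stopping=tuning_option["early_stopping"],
        measure_option=tuning_option["measure_option"],
        callbacks=[
            autotvm.callback.progress_bar(n_trial, prefix="[Task %2d/%2d] " % (i + 1, len(tasks))),
            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
        ],
    )

# Compile with the best schedules found during tuning and time the result.
with autotvm.apply_history_best(tuning_option["tuning_records"]):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
print(module.benchmark(dev, repeat=3, number=10))
```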

I’m one of the authors of that document, and what you’re seeing is an issue I’ve been thinking about a lot lately with respect to that tutorial and the goals of TVM. For models that are very well known, like ResNet-50, the models and runtimes have already been hand-tuned to run fairly efficiently, especially on common platforms like CPUs. As a result, it’s harder for TVM to deliver much more performance than the baseline.

That’s led me to think about whether there are other models we could use to demonstrate the potential for TVM to improve performance. I want to pick a model that is well understood and easy to visualize the inputs and outputs of, but that will also tune efficiently across a wide variety of hardware and platforms.

What I’m hoping is that the tutorial gives you the opportunity to apply its teachings to your own models of interest, but I’d also like to find some models to update the tutorials with that do a better job of demonstrating the potential of TVM optimization.