Hi @lemo, one way to explore the different executors is the relay.build_module.create_executor(kind) API, where kind can be "graph", "vm", or "debug".
For example, this tutorial loads an ONNX model, compiles it, and executes it in TVM. It uses the graph executor, but you can try the vm executor as well by simply replacing "graph" with "vm" in the create_executor call, then benchmark both and compare their performance.
Beyond the ability to switch executors, I’m trying to find out which benchmarks are accepted as interesting and/or representative. Is there a perf regression suite for example? Or a set of models that can be used as a stable baseline?
There are a few benchmark workloads in tvm.relay.testing that you can construct directly and benchmark in TVM: tvm.relay.testing — tvm 0.9.dev182+ge718f5a8a documentation. These are representative models such as MobileNet, ResNet, DenseNet, LSTM, and so on. For example, this tutorial shows how to auto-tune a ResNet for an NVIDIA GPU.