Title: Performance Drop After Compiling Model with TVM Compared to ONNX Runtime
Hi everyone,
I’m working on compiling an object detection model using TVM with auto-scheduler optimization. However, I’m observing slower inference performance than when running the original ONNX model with onnxruntime-gpu. Below are the details:
1. Compilation and Auto-Tuning
- Script: auto_tuning_v2.py (the tune-and-compile flow is sketched below)
- Tuning log: log.txt
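For context, here is a minimal sketch of a typical auto-scheduler tune-and-compile flow for an ONNX model. The input name/shape, trial budget, and output file names are placeholders for illustration and may differ from what auto_tuning_v2.py actually uses:

```python
import onnx
import tvm
from tvm import relay, auto_scheduler

# Load the ONNX model and convert it to Relay.
# Input name and shape below are placeholders.
onnx_model = onnx.load("model.onnx")
shape_dict = {"input": (1, 3, 640, 640)}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

target = tvm.target.Target("cuda")

# Extract tuning tasks and run the auto-scheduler, logging results to log.txt.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=20000,  # placeholder trial budget
    runner=auto_scheduler.LocalRunner(repeat=10),
    measure_callbacks=[auto_scheduler.RecordToFile("log.txt")],
)
tuner.tune(tune_option)

# Compile the model with the best schedules found during tuning.
with auto_scheduler.ApplyHistoryBest("log.txt"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
lib.export_library("compiled_model.so")
```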
2. Performance Measurements
- ONNX Runtime (GPU)
  - Script: onnx_performance.py
  - Mean inference time: 6.10 ms
- TVM Model (Compiled)
  - Script: performance.py
  - Mean inference time (before tuning): 8.83 ms
  - Mean inference time (after tuning): 7.16 ms
Even with TVM’s auto-scheduler optimization, the tuned inference time (7.16 ms) is still slower than the ONNX Runtime result of 6.10 ms.
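For context, a minimal sketch of how the two measurements can be taken; the input name/shape, file names, and repeat counts are placeholders and may differ from onnx_performance.py and performance.py:

```python
import time
import numpy as np
import onnxruntime as ort
import tvm
from tvm.contrib import graph_executor

data = np.random.rand(1, 3, 640, 640).astype("float32")  # placeholder input shape

# --- ONNX Runtime on GPU ---
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
input_name = sess.get_inputs()[0].name
for _ in range(10):                       # warm-up runs
    sess.run(None, {input_name: data})
times = []
for _ in range(100):
    start = time.perf_counter()
    sess.run(None, {input_name: data})    # run() blocks until outputs are ready
    times.append(time.perf_counter() - start)
print("ONNX Runtime mean: %.2f ms" % (1e3 * np.mean(times)))

# --- Compiled TVM module ---
dev = tvm.cuda(0)
lib = tvm.runtime.load_module("compiled_model.so")  # placeholder artifact name
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input(input_name, data)
# benchmark() handles warm-up and device synchronization internally.
print(module.benchmark(dev, repeat=10, number=100))
```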
3. Environment
- Docker Base Image: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
- CPU: Intel 12th Gen Core i9-12900K (24 CPU cores)
  - lscpu excerpt:
    - Architecture: x86_64
    - CPU op-mode(s): 32-bit, 64-bit
    - Address sizes: 46 bits physical, 48 bits virtual
- GPU: NVIDIA GeForce RTX 3090
  - Driver Version: 535.183.01
  - CUDA Version: 12.2
  - GPU Memory: 24 GB
Question
Is this performance difference (6.10 ms with ONNX Runtime vs. 7.16 ms with TVM after auto-tuning) expected, or might there be something wrong in my workflow? Any suggestions on how to further analyze and improve the performance of the compiled TVM model would be greatly appreciated.
If there’s any additional information or logs you’d like me to provide, feel free to let me know. Thank you in advance for your help!