Additional data: with the exact same ONNX models, I sometimes see slowdowns on other CPU/GPU combinations as well. For example, I see the following (elapsed time of the tuned model relative to the untuned one):
inceptionv3:
- autoscheduler is faster on Radeon VII (0.57x elapsed time) and slower on RTX 3070m (1.09x elapsed time)
- autotvm is slower on Radeon VII (1.12x) and RTX 3070m (1.37x)
resnet50:
- autoscheduler is slower on Radeon VII (2.26x) and RTX 3070m (1.04x)
- autotvm is slower on Radeon VII (3.51x) and RTX 3070m (1.25x)
vgg16:
- autoscheduler is slower on Radeon VII (4.19x) and RTX 3070m (1.92x)
- autotvm is slower on Radeon VII (1.47x) and RTX 3070m (1.08x)
In 5 of the 6 model/tuner combinations, tuning these ONNX files produces slower code on both GPUs than starting from the untuned ONNX files (11 of the 12 individual measurements are slower). How can I best investigate this further?
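
For reference, here is a small sketch that tallies the numbers listed above. It assumes each ratio is tuned elapsed time divided by untuned elapsed time, so anything above 1.0 is a slowdown. The dictionary layout and variable names are only for illustration and are not part of any TVM API.

```python
# Ratios copied from the list above.
# Assumption: ratio = tuned elapsed time / untuned elapsed time (> 1.0 means slower).
ratios = {
    ("inceptionv3", "autoscheduler"): {"Radeon VII": 0.57, "RTX 3070m": 1.09},
    ("inceptionv3", "autotvm"): {"Radeon VII": 1.12, "RTX 3070m": 1.37},
    ("resnet50", "autoscheduler"): {"Radeon VII": 2.26, "RTX 3070m": 1.04},
    ("resnet50", "autotvm"): {"Radeon VII": 3.51, "RTX 3070m": 1.25},
    ("vgg16", "autoscheduler"): {"Radeon VII": 4.19, "RTX 3070m": 1.92},
    ("vgg16", "autotvm"): {"Radeon VII": 1.47, "RTX 3070m": 1.08},
}

# Count individual (model, tuner, GPU) measurements that got slower.
total_measurements = sum(len(per_gpu) for per_gpu in ratios.values())
slower_measurements = sum(
    r > 1.0 for per_gpu in ratios.values() for r in per_gpu.values()
)

# Count (model, tuner) combinations that got slower on every GPU tested.
slower_on_all_gpus = sum(
    all(r > 1.0 for r in per_gpu.values()) for per_gpu in ratios.values()
)

print(f"{slower_measurements} of {total_measurements} measurements are slower")
print(f"{slower_on_all_gpus} of {len(ratios)} model/tuner combinations are slower on both GPUs")

# List the worst regressions first, since those are probably the ones to look at first.
flat = [(r, model, tuner, gpu)
        for (model, tuner), per_gpu in ratios.items()
        for gpu, r in per_gpu.items()]
for r, model, tuner, gpu in sorted(flat, reverse=True)[:3]:
    print(f"{model} / {tuner} / {gpu}: {r:.2f}x")
```

The only measurement that improved is inceptionv3 with autoscheduler on the Radeon VII; the largest regressions are all on the Radeon VII as well.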