Tips for troubleshooting tuning slowdowns?

Additional data. With the exact same ONNX models, I also sometimes see slowdowns on other CPU/GPU combinations. For example, I see the following elapsed times relative to the untuned models:

inceptionv3:

  • autoscheduler is faster on Radeon VII (0.57x elapsed time) and slower on RTX 3070m (1.09x elapsed time)
  • autotvm is slower on Radeon VII (1.12x) and RTX 3070m (1.37x)

resnet50:

  • autoscheduler is slower on Radeon VII (2.26x) and RTX 3070m (1.04x)
  • autotvm is slower on Radeon VII (3.51x) and RTX 3070m (1.25x)

vgg16:

  • autoscheduler is slower on Radeon VII (4.19x) and RTX 3070m (1.92x)
  • autotvm is slower on Radeon VII (1.47x) and RTX 3070m (1.08x)

Tuning these ONNX models produces slower code than the untuned models in 5 of 6 cases. How can I best investigate further?
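
To make the tally easy to check, here is a minimal sketch that tabulates the ratios listed above. It assumes each figure is tuned elapsed time divided by untuned elapsed time (so values above 1.0 are slowdowns) and that a "case" means one model/tuner pair. The variable and function names are just illustrative.

```python
# Elapsed-time ratios copied from the lists above.
# Assumption: ratio = tuned time / untuned time, so > 1.0 means tuning made it slower.
ratios = {
    ("inceptionv3", "autoscheduler"): {"Radeon VII": 0.57, "RTX 3070m": 1.09},
    ("inceptionv3", "autotvm"): {"Radeon VII": 1.12, "RTX 3070m": 1.37},
    ("resnet50", "autoscheduler"): {"Radeon VII": 2.26, "RTX 3070m": 1.04},
    ("resnet50", "autotvm"): {"Radeon VII": 3.51, "RTX 3070m": 1.25},
    ("vgg16", "autoscheduler"): {"Radeon VII": 4.19, "RTX 3070m": 1.92},
    ("vgg16", "autotvm"): {"Radeon VII": 1.47, "RTX 3070m": 1.08},
}


def slower_everywhere(per_gpu):
    """True if the tuned model is slower than untuned on every GPU listed."""
    return all(ratio > 1.0 for ratio in per_gpu.values())


# Model/tuner pairs where tuning hurt on both GPUs.
slow_pairs = [pair for pair, per_gpu in ratios.items() if slower_everywhere(per_gpu)]

# Single worst measurement, as (model, tuner, gpu, ratio).
worst = max(
    ((model, tuner, gpu, ratio)
     for (model, tuner), per_gpu in ratios.items()
     for gpu, ratio in per_gpu.items()),
    key=lambda row: row[3],
)

print(f"{len(slow_pairs)} of {len(ratios)} model/tuner pairs are slower on both GPUs")
print("worst case:", worst)
```

Sorting the same table by ratio may be a reasonable way to pick which model/tuner/GPU combination to dig into first.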