I’m trying to tune ResNet50 on an Adreno GPU running QNX using Ansor. Tuning succeeds, but the tuned model is nearly twice as slow as the untuned one. Inference times:

- Untuned ResNet50 fp32 ONNX model: 42 ms
- Tuned ResNet50 fp32 model: 80 ms

I tuned ResNet50 for 20,000 trials, and the model was tuned with Ansor in “NHWC” layout. How can I resolve this discrepancy? @srkreddy1238 @comaniac @tqchen