Performance drop after tuning resnet50 with Ansor

I’m trying to tune ResNet50 for an Adreno GPU on QNX using Ansor. Tuning completes successfully, but the tuned model runs much slower than the untuned one. Inference times: 42 ms for the untuned ResNet50 fp32 ONNX model versus 80 ms for the tuned ResNet50 fp32 model.

I tuned ResNet50 for 20000 trials, with the model in “NHWC” layout during Ansor tuning. How can I resolve this discrepancy? @srkreddy1238 @comaniac @tqchen
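One sanity check that may help narrow this down is to summarize the tuning log per task and see whether some tasks ended up with no valid measurements, or with a suspiciously high best cost. The sketch below uses only the Python standard library. It assumes the usual Ansor record layout (one JSON object per line, workload key at `rec["i"][0][0]`, measured costs in seconds at `rec["r"][0]`, error code at `rec["r"][1]`); that layout and the function name `summarize_ansor_log` are assumptions for illustration, so check them against your own log before relying on the output.

```python
import json
from collections import defaultdict


def summarize_ansor_log(path):
    """Return {workload_key: (best_mean_cost_s, valid_count, failed_count)}.

    Assumed record layout (verify against your own log file):
      rec["i"][0][0] -> workload key string
      rec["r"][0]    -> list of measured costs in seconds
      rec["r"][1]    -> error code, 0 meaning the measurement succeeded
    """
    best = {}
    valid = defaultdict(int)
    failed = defaultdict(int)

    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            key = rec["i"][0][0]
            costs, error_no = rec["r"][0], rec["r"][1]

            # Count failed or empty measurements separately.
            if error_no != 0 or not costs:
                failed[key] += 1
                continue

            valid[key] += 1
            mean_cost = sum(costs) / len(costs)
            if key not in best or mean_cost < best[key]:
                best[key] = mean_cost

    keys = set(valid) | set(failed)
    return {k: (best.get(k), valid[k], failed[k]) for k in keys}


def print_summary(path):
    """Print one line per task, slowest best-cost first."""
    summary = summarize_ansor_log(path)
    # Tasks with no valid record (best is None) sort to the top.
    ordered = sorted(
        summary.items(),
        key=lambda kv: float("inf") if kv[1][0] is None else kv[1][0],
        reverse=True,
    )
    for key, (best_cost, n_valid, n_failed) in ordered:
        shown = "no valid record" if best_cost is None else f"{best_cost * 1e3:.3f} ms"
        print(f"{shown:>16}  valid={n_valid:<6} failed={n_failed:<6} {key[:60]}")
```

Calling `print_summary` with the path to your tuning log lists the tasks that deserve a closer look. A task with only failed records would fall back to a default schedule at compile time, which could account for part of a slowdown.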

Are you using Ansor together with the CLML extensions? I read somewhere that CLML uses NCHW. I’m not completely sure about that, but I do know that the cpp_clml.py script in the tvm project (which converts a model into a complete CLML model with host code) converts the model to NCHW.
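If you want to test whether layout is the cause, one option is to convert the conv2d ops to NCHW before extracting tasks and tuning again. Here is a minimal sketch using Relay’s `ConvertLayout` pass. The helper name `convert_conv2d_to_nchw` is made up for illustration, and whether NCHW actually performs better on Adreno is an untested assumption here.

```python
def convert_conv2d_to_nchw(mod):
    """Return a copy of a Relay IRModule with nn.conv2d converted to NCHW.

    TVM is imported lazily so this file can be loaded on a machine
    without TVM installed; calling the function does require TVM.
    """
    import tvm
    from tvm import relay

    # "default" lets TVM choose the kernel layout that matches NCHW data.
    desired_layouts = {"nn.conv2d": ["NCHW", "default"]}

    seq = tvm.transform.Sequential(
        [
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(desired_layouts),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```

You would apply this to the module returned by the frontend importer, then extract tasks and tune from the converted module, so that the tuning log and the final build both see the same layout.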

For me, on top of the performance drop, the search process itself is very slow.

@srkreddy1238 @comaniac @lianminzheng