Recently, when I was running an int8-quantized DeepLab v3 model (imported from ONNX) with the OpenCL target, I noticed that the inference time was quite long (~7 s/image on an NVIDIA T4). I was using the FakeQuantizationToInteger pass to convert as many Relay ops to QNN ops as possible.
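For reference, this is roughly how I import and convert the model (the file name and input shape below are placeholders for my actual setup):

```python
import onnx
import tvm
from tvm import relay

# Placeholder file name and input shape; substitute your own.
onnx_model = onnx.load("deeplabv3_int8.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 513, 513)})

# Rewrite the quantize/dequantize (QDQ) subgraphs from the ONNX model
# into QNN ops wherever the pass can match them.
mod = tvm.transform.Sequential(
    [
        relay.transform.InferType(),
        relay.transform.FakeQuantizationToInteger(),
    ]
)(mod)
```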
I changed the ops specified for tuning from ‘nn.conv2d’ to ‘qnn.conv2d’, but autotvm.task.extract_from_program(…) then returned no tasks.
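This is a minimal sketch of the extraction call I am using (the target string matches my OpenCL setup; `mod` and `params` come from the import step above):

```python
from tvm import autotvm

target = tvm.target.Target("opencl")

# With ops=(relay.op.get("nn.conv2d"),) this returned tasks as expected;
# with qnn.conv2d the returned task list is empty.
tasks = autotvm.task.extract_from_program(
    mod["main"],
    target=target,
    params=params,
    ops=(relay.op.get("qnn.conv2d"),),
)
print(len(tasks))  # prints 0 for me
```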
I am trying to use AutoTVM to tune my resnet50 model, which was int8-quantized with TFLite. I specified qnn.conv2d in the tuning ops, but none of the conv layers are picked up during tuning. The tuning progress simply jumps to 50% and then to 100% completion, with only 2 tasks extracted in total, against the usual 22 for resnet.
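For context, a minimal sketch of my setup (the file name, input names, shapes, and the llvm target are placeholders for my actual configuration):

```python
import tflite
import tvm
from tvm import autotvm, relay

# Placeholder file/input names; my model is an int8-quantized resnet50.
buf = open("resnet50_int8.tflite", "rb").read()
tflite_model = tflite.Model.GetRootAsModel(buf, 0)
mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={"input": (1, 224, 224, 3)},
    dtype_dict={"input": "int8"},
)

tasks = autotvm.task.extract_from_program(
    mod["main"],
    target="llvm",  # placeholder target
    params=params,
    ops=(relay.op.get("qnn.conv2d"),),
)
print(len(tasks))  # only 2 tasks here, versus the usual 22 with nn.conv2d
```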
Have you found any resource with a fix for this? I have searched, but without success.