TVM int8 quantization slower than float on arm?

Platform: Pixel4 CPU

Network: ResNet18 on ImageNet from torchvision

Tuning: AutoTVM with XGBTuner for 1500 iterations

ResNet18 latency: 100ms

ResNet18 int8 quantized: 180ms

I posted the code here:

```
python -m baseline.tuning_main --quantize --target arm
```
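
For reference, the quantization step follows TVM's Relay post-training quantization, roughly along these lines (a simplified sketch; the `calibrate_mode` and `global_scale` values here are illustrative, see `tuning_main.py` for the exact settings):

```python
# Minimal sketch of TVM's post-training int8 quantization.
# calibrate_mode and global_scale are assumptions here;
# baseline/tuning_main.py may configure this differently.
import tvm
from tvm import relay
from tvm.relay import testing

# Stand-in workload; the real script presumably converts ResNet18
# from torchvision via relay.frontend.from_pytorch.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    mod = relay.quantize.quantize(mod, params=params)
```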

I also notice similar behavior on a Raspberry Pi (Arm chip). Have you tried AutoTVM instead of AutoScheduler?

Is this the target string you used? https://github.com/tigert1998/tvm-models-baseline/blob/main/baseline/tuning_main.py#L49

I'd try something like

```python
target = "llvm -device=arm_cpu -mtriple=aarch64-linux-android -mattr=+v8.2a,+dotprod"
```

and yes, use AutoTVM instead of AutoScheduler :slight_smile:
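
For concreteness, tuning with that target would look roughly like this (a sketch; the device key, RPC tracker address, and log file name are placeholders for your setup):

```python
# Rough sketch of AutoTVM tuning with the dotprod-enabled target.
# The device key ("pixel4"), tracker host/port, and log file name
# are placeholders; adjust them for your RPC setup.
import tvm
from tvm import autotvm, relay
from tvm.relay import testing

# Stand-in workload; substitute the quantized ResNet18 module.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

target = tvm.target.Target(
    "llvm -device=arm_cpu -mtriple=aarch64-linux-android -mattr=+v8.2a,+dotprod"
)

tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.RPCRunner("pixel4", host="127.0.0.1", port=9190),
)

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=1500,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("resnet18_int8.log")],
    )
```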

I would also explore changing the default layout. I have found that for some models, NHWC yields better results for int8 than NCHW.
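
A minimal sketch of what that would look like with Relay's ConvertLayout pass (the pass list and opt level are just one reasonable choice):

```python
# Sketch: convert conv2d data layout to NHWC before quantizing/tuning.
# "default" lets TVM pick the matching kernel layout; whether NHWC wins
# depends on the int8 schedules available for your target.
import tvm
from tvm import relay
from tvm.relay import testing

mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

seq = tvm.transform.Sequential([
    relay.transform.RemoveUnusedFunctions(),
    relay.transform.ConvertLayout({"nn.conv2d": ["NHWC", "default"]}),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
```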

Thank you so much! You are right!