Platform: Pixel 4 CPU
Network: ResNet18 on ImageNet from torchvision
Tuning: AutoTVM with XGBTuner for 1500 iterations
ResNet18 latency: 100ms
ResNet18 int8 quantized: 180ms
I posted the code here:
python -m baseline.tuning_main --quantize --target arm
I also notice similar behavior on a Raspberry Pi (Arm chip). Have you tried AutoTVM instead of AutoScheduler?
Is this the target string you used? https://github.com/tigert1998/tvm-models-baseline/blob/main/baseline/tuning_main.py#L49
I'd try something like
target = "llvm -device=arm_cpu -mtriple=aarch64-linux-android -mattr=+v8.2a,+dotprod"
and yes, use AutoTVM instead of AutoScheduler
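The extra `-mattr` flags in that target string are likely the key for int8: without `+dotprod`, the compiler cannot emit the Arm dot-product instructions that quantized conv kernels rely on, and int8 can end up slower than fp32 (which matches the 180ms vs 100ms numbers above). A plain-Python sketch for sanity-checking a target string before tuning (`parse_target_attrs` is a hypothetical helper for illustration, not a TVM API; in a real script you would just pass the string to `tvm.target.Target`):

```python
def parse_target_attrs(target: str) -> dict:
    """Split a TVM-style target string into its -key=value options."""
    opts = {}
    for token in target.split()[1:]:  # skip the leading "llvm"
        key, _, value = token.lstrip("-").partition("=")
        opts[key] = value
    return opts

target = "llvm -device=arm_cpu -mtriple=aarch64-linux-android -mattr=+v8.2a,+dotprod"
attrs = parse_target_attrs(target)

# If this assertion fails, int8 convolutions fall back to slow generic kernels.
assert "+dotprod" in attrs["mattr"].split(","), "dot-product extension not enabled"
print(attrs["mtriple"])  # aarch64-linux-android
```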
I would also explore changing the default layout. I have found that for some models NHWC yields better int8 results than NCHW.
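To illustrate what the layout switch means: NHWC and NCHW are just two axis orders for the same data, and in TVM the conversion is typically done at the Relay level with the `ConvertLayout` pass rather than by hand. A tiny plain-Python sketch of the underlying axis permutation (`nchw_to_nhwc` is a hypothetical helper, for illustration only):

```python
# NCHW -> NHWC is the axis permutation (0, 2, 3, 1): the channel axis
# moves innermost. In TVM itself you would use something like
# relay.transform.ConvertLayout({"nn.conv2d": ["NHWC", "default"]}).

def nchw_to_nhwc(t):
    """Transpose a nested-list tensor from NCHW to NHWC order."""
    n, c, h, w = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[i][k][j][l] for k in range(c)]  # channels become innermost
              for l in range(w)]
             for j in range(h)]
            for i in range(n)]

x = [[[[1, 2], [3, 4]],      # NCHW shape (1, 2, 2, 2): 2 channels of 2x2
      [[5, 6], [7, 8]]]]
y = nchw_to_nhwc(x)
assert y[0][0][0] == [1, 5]  # pixel (0, 0) now holds both channel values
```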
Thank you so much! You are right!