Auto-scheduler seems slower on int8

imorinaga · April 1, 2021, 6:57am

According to TLCBench, on CNNs such as ResNet-50 and MobileNet, models tuned with the auto-scheduler tend to be faster than those tuned with AutoTVM. Using the same settings, I tuned YOLOv3 on a Tesla T4. The results are below.

Precision	AutoTVM (nchw)	Auto-Scheduler (nchw)	Auto-Scheduler (nhwc)
fp32	17.23 ms	21.24 ms	18.67 ms
int8	8.54 ms	15.85 ms	15.92 ms
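Turning the table into ratios shows where the gap is. The short Python sketch below only does arithmetic on the latencies reported above. The dictionary and function names are illustrative and are not part of TVM.

```python
from typing import Tuple

# Latencies in ms from the table above:
# (AutoTVM, Auto-Scheduler nchw, Auto-Scheduler nhwc)
LATENCY_MS = {
    "fp32": (17.23, 21.24, 18.67),
    "int8": (8.54, 15.85, 15.92),
}
COLUMNS = ["AutoTVM", "Auto-Scheduler (nchw)", "Auto-Scheduler (nhwc)"]


def slowdown_vs_autotvm(precision: str) -> Tuple[float, float]:
    """Auto-scheduler latency divided by AutoTVM latency, for each layout."""
    autotvm, ansor_nchw, ansor_nhwc = LATENCY_MS[precision]
    return ansor_nchw / autotvm, ansor_nhwc / autotvm


def int8_speedup_over_fp32(column: int) -> float:
    """fp32 latency divided by int8 latency for one column of the table."""
    return LATENCY_MS["fp32"][column] / LATENCY_MS["int8"][column]


for precision in LATENCY_MS:
    nchw, nhwc = slowdown_vs_autotvm(precision)
    print(f"{precision}: auto-scheduler takes {nchw:.2f}x (nchw) / {nhwc:.2f}x (nhwc) the AutoTVM latency")

for column, name in enumerate(COLUMNS):
    print(f"{name}: int8 is {int8_speedup_over_fp32(column):.2f}x faster than fp32")
```

With int8, the auto-scheduler takes roughly 1.86x the AutoTVM latency. Its int8 build is also only about 1.17x to 1.34x faster than its own fp32 build, while AutoTVM's int8 build is about 2.0x faster.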

Is there a known reason for this, or an important int8 setting I'm missing for the auto-scheduler? Comparing debug_runtime results shows that AutoTVM is faster on almost every conv2d layer.
[Image: autotvm_int8_vs_autoscheduler_int8]

Experiment setting

Common
- model: from darknet (input: 416x416)
- opt_level: 3
- batch_size: 1
AutoTVM
- n_trial: 2000
- layout: nchw
- tuner: xgb
- early_stopping: 600
- number: 20
- repeat: 3
- timeout: 4
- min_repeat_ms: 150
AutoScheduler
- n_trial: 30000 (28 tasks)
- layout: nhwc
- repeat: 1
- min_repeat_ms: 200
- timeout: 20
- early_stopping: 2000
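The two tuners count trials differently, which matters when comparing these budgets. The sketch below rests on two assumptions. First, AutoTVM's n_trial is applied per task, as in the TVM tuning tutorials, which cap it at each task's config space size. Second, the auto-scheduler's n_trial is the total number of measurements shared by all 28 tasks, which is how `num_measure_trials` in `auto_scheduler.TuningOptions` behaves. The helper names are hypothetical.

```python
# Tuning budgets copied from the experiment setting above.
AUTOTVM_N_TRIAL = 2000      # assumed to be a per-task budget (tutorial-style loop over tasks)
ANSOR_N_TRIAL = 30000       # assumed to be the total budget shared by all tasks
ANSOR_NUM_TASKS = 28


def ansor_average_trials_per_task(total_trials: int, num_tasks: int) -> float:
    """Average trials per task when one total budget is split across all tasks.

    The auto-scheduler's task scheduler does not split the budget evenly,
    so this is only a mean, not a per-task guarantee.
    """
    return total_trials / num_tasks


def autotvm_trials_for_task(config_space_size: int, n_trial: int = AUTOTVM_N_TRIAL) -> int:
    """Upper bound on AutoTVM trials for one task (early stopping can end it sooner)."""
    return min(n_trial, config_space_size)


avg = ansor_average_trials_per_task(ANSOR_N_TRIAL, ANSOR_NUM_TASKS)
print(f"auto-scheduler: about {avg:.0f} trials per task on average")
print(f"AutoTVM: up to {AUTOTVM_N_TRIAL} trials per task")
```

If both assumptions hold, the auto-scheduler gets about 1071 trials per task on average. That is roughly half of AutoTVM's 2000-trial ceiling, although AutoTVM's early stopping at 600 may end many tasks well before the ceiling.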

FrozenGene · April 1, 2021, 10:30am

For int8, there are intrinsics such as DP4A that accelerate the computation. Currently, the auto-scheduler doesn't support them.
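DP4A is exposed in CUDA as the `__dp4a` intrinsic on compute capability 6.1 and newer, which includes the T4. In a single instruction, it computes a four-way int8 dot product and adds the result to a 32-bit accumulator. The pure-Python emulation below only illustrates those semantics. It is not TVM code, and `pack_int8x4`, `dp4a`, and `int8_dot` are hypothetical helper names. It assumes little-endian lane order and does not model 32-bit accumulator overflow.

```python
import struct
from typing import List, Sequence


def pack_int8x4(values: Sequence[int]) -> int:
    """Pack four signed int8 values into one signed 32-bit word (little-endian lanes)."""
    if len(values) != 4 or not all(-128 <= v <= 127 for v in values):
        raise ValueError("expected exactly four int8 values")
    return struct.unpack("<i", struct.pack("<4b", *values))[0]


def unpack_int8x4(word: int) -> List[int]:
    """Split a signed 32-bit word back into its four signed int8 lanes."""
    return list(struct.unpack("<4b", struct.pack("<i", word)))


def dp4a(a_word: int, b_word: int, acc: int) -> int:
    """Emulate DP4A: sum of four int8 x int8 products added to an accumulator.

    Overflow of the 32-bit accumulator is not modelled here.
    """
    a_lanes = unpack_int8x4(a_word)
    b_lanes = unpack_int8x4(b_word)
    return acc + sum(x * y for x, y in zip(a_lanes, b_lanes))


def int8_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """int8 dot product (length a multiple of 4) using one dp4a step per four lanes."""
    acc = 0
    for i in range(0, len(a), 4):
        acc = dp4a(pack_int8x4(a[i:i + 4]), pack_int8x4(b[i:i + 4]), acc)
    return acc


print(int8_dot([1, 2, 3, 4], [5, 6, 7, 8]))  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

A schedule that cannot emit this instruction must do the same work with separate multiply-adds. That is consistent with the small int8-over-fp32 gain in the auto-scheduler columns of the table.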