According to TLCBench, models tuned by the auto-scheduler tend to be faster than those tuned by AutoTVM on CNNs such as ResNet-50 and MobileNet. Using the same settings, I tuned YOLOv3 on a Tesla T4; the results are as follows.
| Precision | AutoTVM | Auto-Scheduler (NCHW) | Auto-Scheduler (NHWC) |
|---|---|---|---|
| fp32 | 17.23 ms | 21.24 ms | 18.67 ms |
| int8 | 8.54 ms | 15.85 ms | 15.92 ms |
Is there a likely reason for this, or am I missing an important setting for int8 with the auto-scheduler? Comparing debug_runtime results shows that AutoTVM is faster on almost all conv2d layers.
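To put the gap in numbers, here is a small sketch that only does arithmetic on the latencies from the table (no TVM involved). It computes how much int8 speeds up over fp32 for each tuner and how much slower each auto-scheduler layout is than AutoTVM at the same precision. The dictionary and function names are hypothetical, chosen for illustration.

```python
# Latencies (ms) for YOLOv3 on a Tesla T4, copied from the table above.
LATENCY_MS = {
    "fp32": {"AutoTVM": 17.23, "Auto-Scheduler (NCHW)": 21.24, "Auto-Scheduler (NHWC)": 18.67},
    "int8": {"AutoTVM": 8.54, "Auto-Scheduler (NCHW)": 15.85, "Auto-Scheduler (NHWC)": 15.92},
}


def int8_speedup(tuner: str) -> float:
    """fp32 latency divided by int8 latency for one tuner (>1 means int8 is faster)."""
    return LATENCY_MS["fp32"][tuner] / LATENCY_MS["int8"][tuner]


def slowdown_vs_autotvm(precision: str, tuner: str) -> float:
    """Latency of `tuner` relative to AutoTVM at the same precision (>1 means slower)."""
    return LATENCY_MS[precision][tuner] / LATENCY_MS[precision]["AutoTVM"]


if __name__ == "__main__":
    for tuner in LATENCY_MS["fp32"]:
        print(
            f"{tuner:24s} int8 speedup: {int8_speedup(tuner):.2f}x  "
            f"fp32 vs AutoTVM: {slowdown_vs_autotvm('fp32', tuner):.2f}x  "
            f"int8 vs AutoTVM: {slowdown_vs_autotvm('int8', tuner):.2f}x"
        )
```

With these numbers, int8 roughly halves latency under AutoTVM (about 2.02x), but gives only about 1.34x (NCHW) and 1.17x (NHWC) under the auto-scheduler, so the int8 auto-scheduler results end up about 1.86x slower than AutoTVM, versus about 1.08x to 1.23x slower in fp32.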
Experiment settings
- Common
  - model: from darknet (input: 416x416)
  - opt_level: 3
  - batch_size: 1
- AutoTVM
  - n_trial: 2000
  - layout: nchw
  - tuner: xgb
  - early_stopping: 600
  - number: 20
  - repeat: 3
  - timeout: 4
  - min_repeat_ms: 150
- Auto-Scheduler
  - n_trial: 30000 (28 tasks)
  - layout: nhwc
  - repeat: 1
  - min_repeat_ms: 200
  - timeout: 20
  - early_stopping: 2000
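One thing worth checking in these settings is the effective search budget per task. The sketch below records the listed values in plain dictionaries (the key names are hypothetical labels, not TVM parameter names) and computes the average auto-scheduler budget per task. It assumes, as in TVM's AutoTVM tutorial scripts, that AutoTVM's `n_trial` is a per-task cap, while the auto-scheduler's 30000 trials are a total shared by all 28 tasks. The auto-scheduler's task scheduler does not split trials evenly, so this is only an average.

```python
# Tuning settings copied from the list above; key names are illustrative only.
AUTOTVM = {"n_trial": 2000, "early_stopping": 600, "number": 20, "repeat": 3,
           "timeout_s": 4, "min_repeat_ms": 150}
AUTO_SCHEDULER = {"n_trial": 30000, "num_tasks": 28, "repeat": 1,
                  "min_repeat_ms": 200, "timeout_s": 20, "early_stopping": 2000}


def avg_trials_per_task(total_trials: int, num_tasks: int) -> float:
    """Average measurement budget per task if the total were split evenly."""
    return total_trials / num_tasks


avg = avg_trials_per_task(AUTO_SCHEDULER["n_trial"], AUTO_SCHEDULER["num_tasks"])

if __name__ == "__main__":
    print(f"Auto-scheduler: ~{avg:.0f} trials per task on average")
    # Assumption: AutoTVM's n_trial applies to each task separately.
    print(f"AutoTVM: up to {AUTOTVM['n_trial']} trials per task")
```

Under that assumption, the auto-scheduler averages about 1071 trials per task, roughly half of AutoTVM's per-task cap.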