Auto-scheduler seems slower on int8

According to TLCBench, on CNNs like resnet50 and mobilenet, models tuned by auto-scheduler tend to be faster than those tuned by AutoTVM. With the same settings, I tuned Yolo v3 on a Tesla T4, and the results are as follows.

        AutoTVM     Auto-Scheduler (nchw)   Auto-Scheduler (nhwc)
fp32    17.23 ms    21.24 ms                18.67 ms
int8    8.54 ms     15.85 ms                15.92 ms
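To make the gap easier to read, here is a small sketch that turns the latencies above into ratios. The numbers are copied from the table; the dictionary layout and the helper names (`int8_speedup`, `slowdown_vs_autotvm`) are only illustrative and are not part of TVM.

```python
# Latencies in milliseconds, copied from the table above.
latency_ms = {
    "fp32": {
        "AutoTVM": 17.23,
        "Auto-Scheduler (nchw)": 21.24,
        "Auto-Scheduler (nhwc)": 18.67,
    },
    "int8": {
        "AutoTVM": 8.54,
        "Auto-Scheduler (nchw)": 15.85,
        "Auto-Scheduler (nhwc)": 15.92,
    },
}


def int8_speedup(tuner):
    """How many times faster int8 is than fp32 for one tuner."""
    return latency_ms["fp32"][tuner] / latency_ms["int8"][tuner]


def slowdown_vs_autotvm(dtype, tuner):
    """Latency of `tuner` relative to AutoTVM for the same dtype."""
    return latency_ms[dtype][tuner] / latency_ms[dtype]["AutoTVM"]


if __name__ == "__main__":
    for tuner in latency_ms["fp32"]:
        print(f"{tuner}: int8 is {int8_speedup(tuner):.2f}x faster than fp32")
    for tuner in ("Auto-Scheduler (nchw)", "Auto-Scheduler (nhwc)"):
        ratio = slowdown_vs_autotvm("int8", tuner)
        print(f"{tuner}: int8 latency is {ratio:.2f}x that of AutoTVM")
```

Going from fp32 to int8 roughly halves the latency with AutoTVM (about 2.0x), while the auto-scheduler results only improve by about 1.3x (nchw) and 1.2x (nhwc).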

Is there a known reason for this, or an important setting I'm missing for int8 with auto-scheduler? Comparing the debug_runtime results shows that AutoTVM is faster on almost all conv2d layers.

[Figure: autotvm_int8_vs_autoscheduler_int8]

Experiment setting

  • Common
    • model: from darknet (input: 416x416)
    • opt_level: 3
    • batch_size: 1
  • AutoTVM
    • n_trial: 2000
    • layout: nchw
    • tuner: xgb
    • early_stopping: 600
    • number: 20
    • repeat: 3
    • timeout: 4
    • min_repeat_ms: 150
  • AutoScheduler
    • n_trial: 30000 (28 tasks)
    • layout: nhwc
    • repeat: 1
    • min_repeat_ms: 200
    • time_out: 20
    • early_stopping: 2000
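For reference, the settings above could map onto the TVM Python API roughly as follows. This is only a sketch, not the script that was actually used: the dictionary names and helper functions are hypothetical, the mapping of `time_out` to the `timeout` argument of `LocalRPCMeasureContext` is an assumption, and the model loading and task extraction are left out. TVM is imported lazily so the settings can be inspected without it installed.

```python
# Settings copied from the list above; the dict and function names are illustrative.
AUTOTVM_SETTINGS = {
    "n_trial": 2000,
    "early_stopping": 600,
    "number": 20,
    "repeat": 3,
    "timeout": 4,
    "min_repeat_ms": 150,
}
AUTO_SCHEDULER_SETTINGS = {
    "n_trial": 30000,
    "num_tasks": 28,
    "early_stopping": 2000,
    "repeat": 1,
    "min_repeat_ms": 200,
    "timeout": 20,
}


def autotvm_measure_option(cfg=AUTOTVM_SETTINGS):
    """Build the AutoTVM measurement option from the settings above."""
    from tvm import autotvm  # lazy import: only needed when actually tuning

    return autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(
            number=cfg["number"],
            repeat=cfg["repeat"],
            timeout=cfg["timeout"],
            min_repeat_ms=cfg["min_repeat_ms"],
        ),
    )


def auto_scheduler_tuning_options(log_file, cfg=AUTO_SCHEDULER_SETTINGS):
    """Build the auto-scheduler tuning options from the settings above."""
    from tvm import auto_scheduler  # lazy import: only needed when actually tuning

    # Assumption: "time_out" in the list corresponds to `timeout` here.
    measure_ctx = auto_scheduler.LocalRPCMeasureContext(
        repeat=cfg["repeat"],
        min_repeat_ms=cfg["min_repeat_ms"],
        timeout=cfg["timeout"],
    )
    options = auto_scheduler.TuningOptions(
        num_measure_trials=cfg["n_trial"],
        early_stopping=cfg["early_stopping"],
        runner=measure_ctx.runner,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    # The caller should keep measure_ctx alive for the duration of tuning.
    return measure_ctx, options


def average_trials_per_task(cfg=AUTO_SCHEDULER_SETTINGS):
    """Average trial budget per task if n_trial is shared across all tasks."""
    return cfg["n_trial"] / cfg["num_tasks"]
```

If the 30000 trials are the total budget shared across the 28 tasks, that works out to about 1071 trials per task on average, so the two n_trial values are not directly comparable when the AutoTVM figure is a per-task limit.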

For int8, there are hardware intrinsics such as DP4A that accelerate the computation. Currently, auto-scheduler doesn't support them.
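For anyone unfamiliar with DP4A: it takes two 32-bit words, each holding four packed 8-bit integers, computes their dot product, and adds the result to a 32-bit accumulator in a single instruction. Below is a pure-Python emulation of the signed variant, just to show the arithmetic. The helper names are made up for illustration, and this is not how CUDA or TVM implements it; in particular, the real instruction works on 32-bit registers, whereas Python integers never overflow.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one signed 32-bit word (little-endian)."""
    return struct.unpack("<i", struct.pack("<4b", *values))[0]


def unpack_int8x4(word):
    """Split a signed 32-bit word back into its four signed 8-bit lanes."""
    return struct.unpack("<4b", struct.pack("<i", word))


def dp4a(a_word, b_word, acc):
    """Emulate signed DP4A: acc + sum of the four lane-wise int8 products."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    return acc + sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a = pack_int8x4([1, -2, 3, -4])
    b = pack_int8x4([5, 6, -7, 8])
    # 1*5 + (-2)*6 + 3*(-7) + (-4)*8 = -60, plus the accumulator 10
    print(dp4a(a, b, 10))  # -50
```

An int8 conv2d kernel that uses this instruction performs four multiply-accumulates per operation, which is where a hand-written template can gain over a schedule that cannot emit it.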
