How to improve the auto-tune performance?

  1. I meant that you don’t have to spend tuning time on unnecessary parts.

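  2. If an op's latency is about 1 ms and number is set to 5, then one measurement takes 5 ms. If you set min_repeat_ms=1000, AutoTVM keeps increasing number until one repeat lasts at least 1000 ms, so the op ends up running roughly 1000 times per repeat (about 200 times the 5 runs you asked for). Remember, though, that this still takes about 1000 ms in total. The purpose of this setting is to guarantee measurement accuracy, because the relative error of timing a 1 ms op only a few times can be huge.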
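To make item 2 concrete, here is a simplified Python model of how min_repeat_ms raises the number of runs in one repeat. In AutoTVM these knobs are the number, repeat and min_repeat_ms arguments of autotvm.LocalRunner, but the function runs_per_repeat below is hypothetical: it jumps straight to the smallest sufficient run count, whereas TVM's time evaluator may grow number in steps, so the real count can land somewhat higher.

```python
import math


def runs_per_repeat(op_latency_ms: float, number: int, min_repeat_ms: float) -> int:
    """Hypothetical, simplified model of the min_repeat_ms adjustment.

    Keeps at least `number` runs, and raises the count until one repeat
    lasts at least `min_repeat_ms`. Not TVM's exact growth rule.
    """
    if min_repeat_ms <= 0:
        # No minimum duration requested: use `number` as given.
        return number
    needed = math.ceil(min_repeat_ms / op_latency_ms)
    return max(number, needed)


runs = runs_per_repeat(op_latency_ms=1.0, number=5, min_repeat_ms=1000)
print(runs)         # 1000 runs in one repeat
print(runs // 5)    # 200 times the 5 runs originally requested
print(runs * 1.0)   # total milliseconds spent on that repeat

# A slower 100 ms op only needs 10 runs to fill 1000 ms.
print(runs_per_repeat(op_latency_ms=100.0, number=5, min_repeat_ms=1000))
```

The takeaway is the same as in the text: a fast op is run many more times, but the wall-clock cost of one repeat stays pinned near min_repeat_ms.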

You can actually estimate the tuning time, by the way. By default AutoTVM compiles 8 configs in parallel. Assuming it takes 5 seconds to compile one config and about 1 second to measure one (e.g. min_repeat_ms=1000 with a single repeat), compiling and measuring a batch of 8 configs takes 5 + 8*1 = 13 seconds in total (the measurements cannot run in parallel because you only have one GPU). Then 4,000 trials take about (4000/8)*13 = 6,500 seconds, roughly 1.8 hours. This tuning time is typical for AutoTVM on GPUs.
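The back-of-the-envelope estimate above can be written as a small Python helper. estimate_tuning_seconds is a hypothetical function, and its model is an assumption: compiling one batch costs a single compile time (the configs build fully in parallel), measurements run one after another on a single GPU, and a final partial batch only pays for the configs it actually contains.

```python
def estimate_tuning_seconds(
    n_trials: int,
    n_parallel: int = 8,
    compile_s: float = 5.0,
    measure_s: float = 1.0,
) -> float:
    """Hypothetical estimate of total AutoTVM tuning time in seconds."""
    full_batches, leftover = divmod(n_trials, n_parallel)
    # Each full batch: one parallel compile, then n_parallel serial measurements.
    total = full_batches * (compile_s + n_parallel * measure_s)
    if leftover:
        # The last, partial batch still pays one compile time.
        total += compile_s + leftover * measure_s
    return total


secs = estimate_tuning_seconds(4000)
print(secs)                      # 6500.0
print(round(secs / 3600, 1))     # 1.8 (hours)
```

Plugging in your own numbers (for example a longer compile time, or measure_s=3.0 if you keep three repeats) shows quickly which term dominates: with one GPU, the serial measurement term grows with n_parallel, so it is usually the part worth trimming.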