Difference between autotuning and auto-scheduling

I went through the docs for autotuning and auto-scheduling, but as a beginner I am a bit confused. Sorry, I have some very basic doubts:

i) Is autotuning, where we extract the tasks and tune them, the same as AutoTVM? If so, since auto-scheduling (Ansor) is supposed to be an improvement over autotuning, do we not really need autotuning at all once we implement auto-scheduling?

ii) Assuming the above, is the only difference between autotuning (AutoTVM) and auto-scheduling that one needs a manual template and the other doesn’t, which consequently results in more efficient traversal of the config search space in the auto-scheduling case? I did not have to write any manual templates for the tasks in a standard ResNet-50, but I assume that is because all the templates are already in the repo.

iii) Are the tasks that autotuning and auto-scheduling tune, the optimizations they apply, etc. all the same? Is there any documentation regarding the optimizations applied by each of them?


@comaniac @Hzfengsy @tqchen Could one of you please help me with this?

Auto-tuning is a general topic, and there are three implementations in the TVM stack: AutoTVM, Ansor, and Meta-schedule. You are right that Ansor is an improvement over AutoTVM.

One significant difference is that Ansor does not need templates; this comes from its algorithm, not from our having embedded all templates in the repo. On the other hand, we do not need to write templates when using AutoTVM to tune standard models either, because templates for common operators are already in the repo.

The optimizations are not totally the same. Usually, Ansor works better than AutoTVM on non-tensorized workloads. However, we can’t apply both of them at once, since they are two different algorithms.
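For concreteness, here is a minimal sketch of the two entry points, assuming a Relay module mod with its params and a target are already in scope. It illustrates the workflow, not a complete script:

    from tvm import autotvm, auto_scheduler

    # AutoTVM: tasks are matched against templates shipped in the repo
    # (e.g. conv2d_nchw_winograd.cuda); you only write a template
    # yourself for custom operators.
    tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
    for task in tasks:
        tuner = autotvm.tuner.XGBTuner(task)
        tuner.tune(
            n_trial=2000,
            measure_option=autotvm.measure_option(
                builder=autotvm.LocalBuilder(),
                runner=autotvm.LocalRunner(),
            ),
            callbacks=[autotvm.callback.log_to_file("autotvm.log")],
        )

    # Ansor (auto-scheduling): no templates; the search space is derived
    # from the compute definition itself.
    tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tuner.tune(auto_scheduler.TuningOptions(
        num_measure_trials=2000,
        measure_callbacks=[auto_scheduler.RecordToFile("ansor.json")],
    ))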


Thanks a lot for the reply! Is there any documentation that specifies all the optimizations in Ansor or AutoTVM? Looking at the logs of the ConfigSpace, it looks to be only tiling and unrolling.

I’m afraid there is no documentation.

Thanks for the reply!

I have used AutoTVM tuning for ResNet-50 on a Jetson and already see an improvement of almost 85%!!

Once I had generated a log file with tuning results for all the tasks, just to analyse it, I tried to extract the best config for each tuned task via

    history_best_context = tvm.autotvm.apply_history_best(tunefile)  # tunefile is generated via autotuning
    for i, tsk in enumerate(reversed(tasks)):  # the extracted tasks that were tuned
        best_config = history_best_context.query(tsk.target, tsk.workload)
        print("\nBest config:")
        print(best_config)

For task number 2, “conv2d_nchw_winograd.cuda”, I see the output

[('tile_b', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 16, 16]), ('tile_x', [-1, 2, 8, 1]), ('tile_rc', [-1, 8]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,180142

Extracting the best record for the same tuning run via tvm.autotvm.record.pick_best(tunefile, best_records), again for the task “conv2d_nchw_winograd.cuda”, gives

{"input": ["cuda -keys=cuda,gpu -arch=sm_72 -max_num_threads=1024 -model=unknown -thread_warp_size=32", "conv2d_nchw_winograd.cuda", [["TENSOR", [1, 512, 7, 7], "float32"], ["TENSOR", [512, 512, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "float32"], {}], "config": {"index": 180142, "code_hash": null, "entity": [["tile_b", "sp", [-1, 1, 1, 1]], ["tile_y", "sp", [-1, 1, 16, 16]], ["tile_x", "sp", [-1, 2, 8, 1]], ["tile_rc", "sp", [-1, 8]], ["auto_unroll_max_step", "ot", 1500], ["unroll_explicit", "ot", 0]]}, "result": [[0.0003275632937062937], 0, 3.7285542488098145, 1656916672.9839602], "version": 0.2, "tvm_version": "0.9.dev0"}

Therefore the last item in the “best config” denotes the index. Is the index here the index into the ConfigSpace? The ConfigSpace length for this task comes out to 462000 and I ran 2000 trials, so was this config permutation 180142 randomly selected by AutoTVM via its algorithm?

Yes, the index is into the config space. There are several tuning algorithms you can select in AutoTVM, and random selection is one of them. The recommended one is XGBTuner, which uses simulated annealing to generate a set of schedule candidates and trains an XGBoost model to predict the top-N candidates for measurement.
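If you want to sanity-check this, here is a small sketch that maps the trailing index back to the ConfigSpace. It assumes tasks is the list you extracted and that tasks[2] is the winograd task; adjust the index to your setup:

    task = tasks[2]  # assumed to be conv2d_nchw_winograd.cuda here
    print(len(task.config_space))         # total number of candidate configs, 462000 in your case
    print(task.config_space.get(180142))  # the ConfigEntity behind that index

    # The tuning algorithm is chosen independently of the space:
    from tvm import autotvm
    tuner = autotvm.tuner.XGBTuner(task)      # cost-model guided (recommended)
    # tuner = autotvm.tuner.RandomTuner(task) # uniform random sampling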
