TVM Auto-tune: Reducing the number of tasks

I’m trying to auto-tune a convolutional network for NVIDIA GPUs but I would like to reduce the number of tasks from 24 to around 10. Is that possible or is this a set amount depending on the GPU model?

bash-4.2$ python3.6 tune_relay_cuda.py
Extract tasks...
Tuning...
[Task  1/24]  Current/Best:  525.06/ 598.87 GFLOPS | Progress: (680/1000) | 1103.78 s

Hi @582990,

Every operator of a convolutional network is implemented through one or more strategies (strategy = compute + schedule). Every strategy needs to be tuned and is represented as a “task”. If a strategy defines no knobs, tuning that strategy is effectively a no-op.

To reduce the number of tasks, you can look at tvm/relay/op/strategy/cuda.py and check whether any of the operators you are using “tries” multiple strategies. If so, you can register a single strategy for that operator, thereby reducing the number of tasks.
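A lighter-weight alternative is to tune only a subset of the extracted tasks: the task list returned by extraction is a plain Python list, so you can filter it by strategy name before passing it to the tuner (untuned tasks simply fall back to default schedules). A minimal self-contained sketch — the `Task` class and names below are stand-ins for the objects TVM's task extraction returns, which carry a `name` attribute in the real API:

```python
from dataclasses import dataclass

# Stand-in for the task objects returned by TVM's task extraction;
# real tasks expose a .name identifying the strategy (e.g. "conv2d_nchw.cuda").
@dataclass
class Task:
    name: str

def filter_tasks(tasks, keep):
    """Keep only the tasks whose strategy name starts with one of the given prefixes."""
    return [t for t in tasks if any(t.name.startswith(k) for k in keep)]

# Hypothetical extracted task list, trimmed to the strategies we care about.
tasks = [Task("conv2d_nchw.cuda"),
         Task("conv2d_nchw_winograd.cuda"),
         Task("dense_small_batch.gpu")]
tuned = filter_tasks(tasks, keep=["conv2d_nchw.cuda"])
print([t.name for t in tuned])  # only the direct conv2d strategy remains
```

This does not change which strategies the compiler considers; it only skips tuning the filtered-out tasks.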

However, if all your operators already use a single strategy, things get more complicated: you would have to force the compiler to select an “untuned” implementation for some layers (it would still appear as a task, but with no knobs to tune).

Hope this helps, Giuseppe
