I am tuning and compiling a model with a number of different targets (llvm, metal, opencl) and just came across this useful info about the number of tuning trials in this article:
"You can set it to a small number (e.g., 200) for a fast demonstrative run. In practice, we recommend setting it around `800 * len(tasks)`, which is typically enough for the search to converge."
My model is quite large - it has ~72 tasks, which means the suggested number of trials would be 57600.
A tuning run with 100 trials took 2.8 hours to complete, so if the time scales linearly with the number of trials, a run with 57600 trials would take about 67 days!
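For reference, here is the back-of-the-envelope arithmetic behind that estimate as a small Python sketch. It assumes tuning time scales linearly with the number of trials and that everything runs serially, which may not hold in practice. The function name `estimate_tuning_days` and its parameters are just made up for illustration; they are not part of any TVM API.

```python
def estimate_tuning_days(num_tasks, trials_per_task, measured_trials, measured_hours):
    """Linearly extrapolate total tuning time from one short measured run.

    Assumption: every trial costs about the same wall-clock time and
    nothing is parallelized.
    """
    total_trials = num_tasks * trials_per_task
    hours_per_trial = measured_hours / measured_trials
    total_days = total_trials * hours_per_trial / 24
    return total_trials, total_days


# Numbers from this post: ~72 tasks, 800 trials per task suggested,
# and a 100-trial run that took 2.8 hours.
total_trials, days = estimate_tuning_days(72, 800, 100, 2.8)
print(f"{total_trials} trials, roughly {days:.1f} days")
```

Changing `trials_per_task` makes it easy to see what a smaller budget would cost under the same linear assumption.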
I know I can parallelize this process, but I’m wondering whether the suggested number of trials is really correct. If so, is the run-time speedup gained by tuning predictable? Can I expect a certain performance gain from a given number of tuning trials?
I’m trying to pick a number that gets me the biggest bang for my buck. Thank you!