Hey everyone,
We have been working on a way to improve Ansor’s exploitation phase, and we have a PR ready (#16499). I would like to discuss the idea on the mailing list first, in case you have suggestions for improving it. The idea is as follows:
i. Run Ansor over a given end-to-end model that must be optimized;
ii. Select the best implementation of that model that Ansor finds;
iii. Use Droplet Search to exploit that candidate.
By combining Ansor with Droplet Search’s coordinate descent (which we take from AutoTVM), we can reduce the number of trials that Ansor explores while still obtaining higher-quality kernels. We have been able to demonstrate faster kernels with less search time on four architectures, including Nvidia A100, AMD x86, and ARM A64FX. A summary of these results is available in this manuscript: bennu paper
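To make the exploitation step more concrete, here is a minimal, self-contained sketch of coordinate descent over a discrete configuration space, started from a seed point that plays the role of Ansor's best candidate. It is not the code in the PR and it is not Droplet Search itself: the names `coordinate_descent` and `toy_cost`, the unit step size, and the quadratic cost are all assumptions made for illustration, and a real run would measure kernel latency on hardware instead of evaluating a formula.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[int, ...]


def coordinate_descent(
    seed: Sequence[int],
    sizes: Sequence[int],
    cost: Callable[[Point], float],
    max_trials: int = 100,
) -> Tuple[Point, float, int]:
    """Greedy coordinate descent on an integer grid (hypothetical helper).

    seed:  starting point, e.g. the best configuration found by a prior search
    sizes: sizes[i] is the number of valid values for coordinate i
    cost:  function to minimize (stands in for a measured running time)
    """
    best: Point = tuple(seed)
    best_cost = cost(best)
    trials = 1  # the seed itself counts as one evaluation
    improved = True
    while improved and trials < max_trials:
        improved = False
        for axis in range(len(best)):
            for step in (-1, 1):
                candidate = list(best)
                candidate[axis] += step
                # Skip neighbors that fall outside the search space.
                if not 0 <= candidate[axis] < sizes[axis]:
                    continue
                point = tuple(candidate)
                point_cost = cost(point)
                trials += 1
                # Move as soon as a strictly better neighbor is found.
                if point_cost < best_cost:
                    best, best_cost = point, point_cost
                    improved = True
                if trials >= max_trials:
                    return best, best_cost, trials
    return best, best_cost, trials


def toy_cost(point: Point) -> float:
    """Toy convex cost with its minimum at (3, 5); purely illustrative."""
    x, y = point
    return float((x - 3) ** 2 + (y - 5) ** 2)


best, best_cost, trials = coordinate_descent((0, 0), (8, 8), toy_cost)
print(best, best_cost, trials)
```

The point of the sketch is the shape of the search: it only evaluates neighbors of the current best point and stops when none of them improves, so the number of trials depends on how far the seed is from a local optimum. A better seed therefore means fewer trials spent in this phase.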
Figure 4 in the above manuscript contains a summary of the proposal. The proposed extension does not change the way Ansor is used: its original implementation can still be invoked via the same commands, without any modification. If users want to apply Droplet Search to the best model found by Ansor after a number of trials, all they need to do is run a second command. Section 2.4 of the PDF explains how to do it. In terms of implementation, the new patch modifies 307 lines of code in the TVM code base. Section 2.5 of the PDF explains the changes. I’ve also prepared a Docker container with the experiments here: repository, in case someone wants to reproduce our results.
Regards,
Michael Canesche, Gaurav Verma and Fernando Pereira