Kernel Launching and Dynamic Kernel Selection for LLMs with Relay

RyanTomich · September 9, 2024, 2:46am

Does anyone know of any examples of compiling a model from Hugging Face transformers to Relax, extracting the computational graph, and then launching a kernel for each node individually? I have looked through the Relax code a bit and didn’t see anything that would enable this level of fine-grained control.
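
For concreteness, the kind of control being asked about could be sketched roughly as below. This is plain Python with no TVM or Relax calls; the toy graph, the two matmul variants, `select_kernel`, `run`, and the 64-row threshold are all hypothetical stand-ins, not anything taken from the Relax codebase. It only illustrates the idea of walking a graph in topological order and choosing a kernel per node at run time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy graph: node name -> (op, names of input nodes).
GRAPH = {
    "x": ("input", []),
    "w": ("input", []),
    "matmul": ("matmul", ["x", "w"]),
    "relu": ("relu", ["matmul"]),
}


def matmul_small(a, b):
    # Naive triple loop; stands in for a kernel tuned for small shapes.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matmul_large(a, b):
    # Same result via a transposed operand; stands in for a second variant.
    b_t = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def relu(a):
    return [[max(0.0, v) for v in row] for row in a]


def select_kernel(op, args):
    # Dynamic selection: choose an implementation from the runtime shape.
    # The threshold of 64 rows is an arbitrary assumption for illustration.
    if op == "matmul":
        return matmul_large if len(args[0]) >= 64 else matmul_small
    if op == "relu":
        return relu
    raise KeyError(f"no kernel registered for op {op!r}")


def run(graph, feeds):
    # Map each node to its predecessors so inputs are computed first.
    deps = {name: set(inputs) for name, (_, inputs) in graph.items()}
    values = dict(feeds)
    trace = []  # records which kernel was launched for each node
    for name in TopologicalSorter(deps).static_order():
        op, input_names = graph[name]
        if op == "input":
            continue  # value is supplied through feeds
        args = [values[n] for n in input_names]
        kernel = select_kernel(op, args)
        values[name] = kernel(*args)
        trace.append((name, kernel.__name__))
    return values, trace


values, trace = run(GRAPH, {"x": [[1.0, -2.0]], "w": [[1.0], [1.0]]})
```

The question is whether Relax exposes something equivalent to the `run` loop: a way to iterate over the nodes of the compiled graph and decide which kernel to launch for each one.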