Does the TVM stack have RAMMER-like optimizations?

zep · October 30, 2024, 10:53pm

The RAMMER strategy comes from the paper "Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks". It decomposes operators into fine-grained rTasks so that inter-operator and intra-operator scheduling can be handled together, with the goal of higher GPU utilization and lower operator-scheduling overhead. I'm curious whether the TVM stack follows a similar philosophy:

If so, what’s the corresponding implementation in the codebase?
If not, is it possible to support it in TVM, or what makes it hard to support?
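To make the rTask idea concrete, here is a toy model (not Rammer's actual scheduler and not any TVM API). It assumes each operator splits into equal-duration tasks and compares two policies. The first runs one operator at a time, so every operator occupies all execution units in "waves". The second uses a greedy static list schedule at task granularity, which lets tasks of independent operators share the device. The `Op` class, the task counts, and the greedy policy are all hypothetical, and kernel-launch overhead is not modeled.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    num_tasks: int      # hypothetical: number of rTask-like pieces the op splits into
    task_time: float    # duration of one piece on a single execution unit
    deps: list = field(default_factory=list)  # names of producer ops


def op_by_op_makespan(ops, num_units):
    """Baseline: one operator at a time, each running in full waves over all units."""
    return sum(math.ceil(op.num_tasks / num_units) * op.task_time for op in ops)


def rtask_makespan(ops, num_units):
    """Greedy static schedule of individual tasks onto units.

    Tasks of independent operators may run concurrently; a task starts only
    after every task of its producer ops has finished. `ops` is assumed to be
    in topological order.
    """
    finish = {}                 # op name -> finish time of its last task
    units = [0.0] * num_units   # min-heap: time at which each unit becomes free
    heapq.heapify(units)
    for op in ops:
        ready = max((finish[d] for d in op.deps), default=0.0)
        last = 0.0
        for _ in range(op.num_tasks):
            free = heapq.heappop(units)   # earliest-available unit
            end = max(free, ready) + op.task_time
            heapq.heappush(units, end)
            last = max(last, end)
        finish[op.name] = last
    return max(finish.values(), default=0.0)


# Two independent branches that each under-fill the device, then a join op.
ops = [
    Op("branch_a", num_tasks=2, task_time=1.0),
    Op("branch_b", num_tasks=2, task_time=1.0),
    Op("join", num_tasks=4, task_time=1.0, deps=["branch_a", "branch_b"]),
]
print(op_by_op_makespan(ops, num_units=4))  # 3.0
print(rtask_makespan(ops, num_units=4))     # 2.0
```

With 4 units, the operator-at-a-time baseline leaves half the device idle during each branch. Co-scheduling the two branches' tasks fills those units, which is the kind of utilization gain the rTask decomposition targets.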

Thanks a lot for any comments!