The Rammer strategy comes from the paper "Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks". It decomposes operators into fine-grained rTasks so that inter- and intra-operator scheduling can be managed together, with the goal of achieving higher GPU utilization and lower operator scheduling overhead. I’m curious whether the TVM stack follows a similar philosophy:
- If so, what’s the corresponding implementation in the codebase?
- If not, would it be possible to support it in TVM, and what would make it hard to support?
Thanks a lot for any comments!