Does the TVM stack have a RAMMER-like optimization?

The RAMMER strategy comes from the paper "Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks". It decomposes operators into fine-grained rTasks so that inter- and intra-operator scheduling can be managed together, with the goal of higher GPU utilization and lower operator scheduling overhead. I’m curious whether the TVM stack follows a similar philosophy:

  • If so, where is the corresponding implementation in the codebase?
  • If not, is it possible to support it in TVM, and what would be the main difficulty in doing so?
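
To make the question concrete, here is a toy Python sketch of the effect I have in mind. It is only an illustration under simplifying assumptions (independent operators, every rTask takes one time step, a fixed number of execution units); the function names are hypothetical and it is neither Rammer's actual scheduler nor any TVM API.

```python
import math

def waves_operator_at_a_time(task_counts, num_units):
    """Each operator runs alone; an operator with n rTasks needs ceil(n / units) waves."""
    return sum(math.ceil(n / num_units) for n in task_counts)

def waves_co_scheduled(task_counts, num_units):
    """rTasks of independent operators are packed together onto the same units."""
    return math.ceil(sum(task_counts) / num_units)

def utilization(task_counts, num_units, waves):
    """Fraction of (unit, wave) slots that actually hold an rTask."""
    return sum(task_counts) / (num_units * waves)

# Hypothetical workload: three small independent operators, 8 execution units.
ops = [3, 2, 3]
units = 8

separate = waves_operator_at_a_time(ops, units)
packed = waves_co_scheduled(ops, units)
print("operator-at-a-time:", separate, "waves, utilization", utilization(ops, units, separate))
print("co-scheduled:      ", packed, "waves, utilization", utilization(ops, units, packed))
```

In this toy setting the small operators leave most units idle when launched one at a time, while packing their rTasks together fills the device in a single wave. My question is whether TVM has a mechanism that captures this kind of joint scheduling.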

Thanks a lot for any comments!