Over the past year, we have successfully modernized the foundational ffi module, which now benefits the broader community. This thread aims to start a discussion on modernizing the TIR components. Until now, we have relied on the user-defined schedule paradigm to transform code. While this approach remains useful in many domains, we are starting to see its limitations: user-defined schedules may not cover all possible optimizations, especially when programming the latest GPUs.
At the same time, we also recognize the strong value of the low-level TVMScript and TIR infrastructure. It serves as a foundational layer that enables writing kernels in Python, offers robust kernel code generation, and ships kernels via tvm-ffi. This value continues to grow for both downstream frameworks and R&D use cases. Given this state, we believe it is a good time to rethink how TIR is structured. Specifically, I think we are moving towards the following two layers:
- s-tir (schedulable TIR): This layer will contain the user-defined schedule and meta-schedule components. It will be decoupled from the core tensor-level IR and lower to it (a sketch of this flow follows the list).
- tir(next): We will evolve a new core abstraction that no longer relies on the schedule.
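To make the layering concrete, below is a minimal sketch of how the s-tir flow works today: a kernel is written in TVMScript, then transformed with schedule primitives. The kernel and schedule calls are illustrative only, not a proposal for any new API.

```python
import tvm
from tvm.script import tir as T

# A TVMScript kernel at the schedulable level: blocks carry the
# structure that schedule primitives operate on.
@T.prim_func
def add_one(A: T.Buffer((1024,), "float32"),
            B: T.Buffer((1024,), "float32")):
    for i in range(1024):
        with T.block("add"):
            vi = T.axis.spatial(1024, i)
            B[vi] = A[vi] + T.float32(1)

# User-defined schedule: split the loop and bind it to GPU threads.
sch = tvm.tir.Schedule(add_one)
blk = sch.get_block("add")
(i,) = sch.get_loops(blk)
io, ii = sch.split(i, factors=[None, 128])
sch.bind(io, "blockIdx.x")
sch.bind(ii, "threadIdx.x")
# sch.mod now holds the transformed, lower-level program.
```

Under the proposed split, the schedule machinery above would live in s-tir, and its output would lower into the new core layer.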
The high-level idea is that s-tir will continue to serve its current purpose and lower to the core layer. Decoupled from the schedule, the new core abstraction can stay lightweight and focus on representing low-level programs, with the following goals:
- G0: Enable all possible optimizations via low-level access
- G1: Python-first scripting, with rich support for kernel programming needs (e.g. general control flow, first-class GPU threads and scopes); see the sketch after this list
- G2: Robust code generation and connection to the broader ecosystem via tvm-ffi
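As a rough illustration of G1, here is what schedule-free kernel scripting could look like, written with today's low-level TVMScript constructs (T.env_thread / T.launch_thread). The exact surface syntax of tir(next) is an open question, so treat this as a sketch under current constructs rather than the final design:

```python
import tvm
from tvm.script import tir as T

# Low-level style: threads and control flow are expressed directly
# in the kernel, with no schedule pass applied afterwards.
@T.prim_func
def add_one_lowlevel(a: T.handle, b: T.handle):
    A = T.match_buffer(a, (1000,), "float32")
    B = T.match_buffer(b, (1000,), "float32")
    bx = T.env_thread("blockIdx.x")  # first-class GPU threads
    tx = T.env_thread("threadIdx.x")
    T.launch_thread(bx, 8)
    T.launch_thread(tx, 128)
    # General control flow: guard the tail of a non-multiple extent.
    if bx * 128 + tx < 1000:
        B[bx * 128 + tx] = A[bx * 128 + tx] + T.float32(1)
```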
This will serve the upcoming need to support the latest GPUs. This post aims to align the community around this direction. We can start some refactoring in the new year to enable this modernization. Hopefully, we can continue to support existing users while also making the codebase more useful to the broader ML systems community.