I was at the C4ML workshop, and I would like to share some of my thoughts. MLIR is itself a meta-way of defining IRs, in the words of the folks there, "XML for IRs". In particular, for Google there will be dialects like MLIR-XLA, MLIR-TFLite, and MLIR-TFGraph. Concrete compiler solutions still need to be built for each of these dialect layers, and they can be very different due to differences in the semantics of the operators. In short, MLIR itself offers a direction for infrastructure rather than a solution to the problems we are facing. It will be really interesting to see how things move and how the TVM community can learn from and work with MLIR.
I agree that a principled compiler treatment of deep learning optimization is the way to go. We also need to bring in novel solutions beyond traditional compiler techniques and put machine learning at the center. The TVM community has been pioneering the direction of deep learning compilation, and I hope we can continue to do so by learning from and working with MLIR.
Here are the things we already do that share the same philosophy with MLIR:
- Extensible operator definitions and pass infrastructure (in Relay); see the sketch after this list
- Layers of IR and co-optimization when possible
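To make the first point concrete, here is a minimal sketch of Relay's composable pass infrastructure. It assumes a recent TVM build; the exact module paths (`tvm.transform`, `relay.transform`) have moved around between releases, so treat this as illustrative rather than version-exact:

```python
import tvm
from tvm import relay

# A tiny Relay program: a dense layer followed by ReLU.
x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(32, 64), dtype="float32")
y = relay.nn.relu(relay.nn.dense(x, w))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Compose extensible passes, in the same spirit as MLIR's pass manager.
seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FoldConstant(),
    relay.transform.FuseOps(fuse_opt_level=2),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)  # the fused, type-annotated Relay IR
```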
Besides that, here are two things we should learn from MLIR and can move toward in the coming year:
- Unify Relay and the tensor expression optimization layer to bring a unified TVM IR that works across layers of optimization (the sketch after this list shows the two layers as they stand today).
- Make sure that the TVM stack and its IR interoperate with MLIR dialects.
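For context on the unification point, the tensor expression layer is today a separate IR below Relay, with its own declare-then-schedule workflow. A minimal sketch (again assuming the current Python API, where this lives under `tvm.te`):

```python
import tvm
from tvm import te

# Tensor expression level: declare the computation symbolically.
n = te.var("n")
A = te.placeholder((n,), name="A", dtype="float32")
B = te.placeholder((n,), name="B", dtype="float32")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# Schedule it, then lower to the loop-level IR that a unified
# TVM IR would let us co-optimize together with Relay programs.
s = te.create_schedule(C.op)
print(tvm.lower(s, [A, B, C], simple_mode=True))
```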
In the meantime, we can collectively work together to build a principled stack that automatically optimizes models across CPUs, GPUs, and specialized accelerators. Directions like Relay, pass improvements, formalization of symbolic integer analysis, and the hardware stack will contribute a lot to that goal. I hope we as a community can strive to provide an open, full-stack solution that reflects the MLIR principles, works with MLIR, and enables the future of deep learning compilation.
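On the symbolic integer analysis point, TVM already exposes building blocks worth formalizing. Here is a hedged sketch using the `tvm.arith.Analyzer` API; the method names below match current TVM but may shift as the formalization proceeds:

```python
import tvm
from tvm import tir

ana = tvm.arith.Analyzer()
i = tir.Var("i", "int32")
# Tell the analyzer that i ranges over [0, 16).
ana.bind(i, tvm.ir.Range(0, 16))

# Bound and simplification queries over symbolic integers.
print(ana.const_int_bound(i * 2 + 1))      # integer bounds of the expression
print(ana.simplify(tir.floormod(i, 16)))   # should simplify to i given the range
```

Making this kind of analysis rigorous is what lets the compiler prove loop bounds and buffer accesses safe across all the IR layers above.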