I was at the C4ML workshop, and I would like to share some of my thoughts. MLIR is itself a meta-way of defining IRs, in the words of the folks there, "XML for IRs". In particular, for Google there will be dialects like MLIR-XLA, MLIR-TFLite, and MLIR-TFGraph. Concrete compiler solutions still need to be built for each of these dialect layers, and they can be very different due to differences in the semantics of the operators. In short, MLIR itself offers a direction for infrastructure rather than a solution to the problems we are facing. It will be really interesting to see how things move and how the TVM community can learn from and work with MLIR.
I agree that a principled compiler treatment of deep learning optimization is the way to go. We also need to bring in novel solutions beyond traditional compiler techniques and put machine learning at the center. The TVM community has been pioneering the direction of deep learning compilation, and I hope we can continue to do so by learning from and working with MLIR.
Here are the things we already do that share the same philosophy with MLIR:
- Extensible operator definitions and pass infrastructure (in Relay); see the sketch after this list
- Layers of IR and co-optimization when possible
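To make the first point concrete, here is a minimal sketch of Relay's composable pass infrastructure. It assumes a recent TVM build; the exact module paths (`tvm.transform`, `relay.transform`) have moved around between releases, so treat this as illustrative rather than version-exact:

```python
import tvm
from tvm import relay

# A tiny Relay program: a dense layer followed by ReLU.
x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(32, 64), dtype="float32")
y = relay.nn.relu(relay.nn.dense(x, w))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Compose extensible passes, in the same spirit as MLIR's pass manager.
seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FoldConstant(),
    relay.transform.FuseOps(fuse_opt_level=2),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)  # the fused, type-annotated Relay IR
```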
Besides that, here are two things we should learn from MLIR and can move toward in the coming year:
- Unify Relay and the tensor expression optimization layer to bring a unified TVM IR that works across layers of optimization (the sketch after this list shows the two layers as they stand today).
- Make sure that the TVM stack and its IR interoperate with MLIR dialects.
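For context on the unification point, the tensor expression layer is today a separate IR below Relay, with its own declare-then-schedule workflow. A minimal sketch (again assuming the current Python API, where this lives under `tvm.te`):

```python
import tvm
from tvm import te

# Tensor expression level: declare the computation symbolically.
n = te.var("n")
A = te.placeholder((n,), name="A", dtype="float32")
B = te.placeholder((n,), name="B", dtype="float32")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# Schedule it, then lower to the loop-level IR that a unified
# TVM IR would let us co-optimize together with Relay programs.
s = te.create_schedule(C.op)
print(tvm.lower(s, [A, B, C], simple_mode=True))
```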
In the meantime, we can collectively work together to build a principled stack that automatically optimizes models across CPUs, GPUs, and specialized accelerators. Directions like Relay, pass improvements, formalization of symbolic integer analysis, and the hardware stack will contribute a lot to that goal. I hope we as a community can strive to provide an open, full-stack solution that reflects the MLIR principles, works with MLIR, and enables the future of deep learning compilation.
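On the symbolic integer analysis point, TVM already exposes building blocks worth formalizing. Here is a hedged sketch using the `tvm.arith.Analyzer` API; the method names below match current TVM but may shift as the formalization proceeds:

```python
import tvm
from tvm import tir

ana = tvm.arith.Analyzer()
i = tir.Var("i", "int32")
# Tell the analyzer that i ranges over [0, 16).
ana.bind(i, tvm.ir.Range(0, 16))

# Bound and simplification queries over symbolic integers.
print(ana.const_int_bound(i * 2 + 1))      # integer bounds of the expression
print(ana.simplify(tir.floormod(i, 16)))   # should simplify to i given the range
```

Making this kind of analysis rigorous is what lets the compiler prove loop bounds and buffer accesses safe across all the IR layers above.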