TIR-level CSE enabled by default

We’ve just merged a PR, https://github.com/apache/tvm/pull/9482, that enables the TIR-level common subexpression elimination (CSE) pass by default.

Since the IR that a backend compiler (LLVM, NVCC, SPIR-V drivers) sees can become very different, this might disrupt your existing workflow. If you see a performance regression, for example, you can easily disable the new pass by adding disabled_pass=["tir.CommonSubexprElimTIR"] to the PassContext, as in:

    with tvm.transform.PassContext(opt_level=3, disabled_pass=["tir.CommonSubexprElimTIR"]):
        func = tvm.build(...)  # or relay.build(...)

Personally, I don’t expect perf regressions for C-source based backends (CUDA etc.). I’d watch out for perf regressions in LLVM, but the more interesting possibility is a perf improvement: since an optimization at a higher-level IR can generally take advantage of more information, we might be able to CSE more expressions than LLVM does. Either way, please let us know what you see.
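For anyone less familiar with what CSE does to an expression, here is a toy sketch in plain Python. It is not the TVM pass and does not use any TVM API: expressions are nested tuples, and the helper names (`subexprs`, `cse`) and the `cse_var_N` naming are purely illustrative assumptions. It only shows the basic idea of hoisting a repeated subexpression into a temporary.

```python
from collections import Counter

def subexprs(expr):
    """Yield every non-leaf subexpression of a nested-tuple expression."""
    if isinstance(expr, tuple):
        yield expr
        for arg in expr[1:]:
            yield from subexprs(arg)

def cse(expr):
    """Return (bindings, rewritten_expr) with repeated subexpressions hoisted."""
    counts = Counter(subexprs(expr))
    names = {}     # subexpression -> temporary name
    bindings = []  # (temporary name, definition), in dependency order

    def rewrite(e):
        if not isinstance(e, tuple):
            return e  # leaf: a variable name or constant
        if e in names:
            return names[e]  # already hoisted, reuse the temporary
        new = (e[0],) + tuple(rewrite(a) for a in e[1:])
        if counts[e] > 1:
            name = f"cse_var_{len(names) + 1}"
            names[e] = name
            bindings.append((name, new))
            return name
        return new

    return bindings, rewrite(expr)

# (x + y) * (x + y) + z
expr = ("add", ("mul", ("add", "x", "y"), ("add", "x", "y")), "z")
bindings, body = cse(expr)
print(bindings)  # [('cse_var_1', ('add', 'x', 'y'))]
print(body)      # ('add', ('mul', 'cse_var_1', 'cse_var_1'), 'z')
```

A real pass also has to reason about side effects, scoping, and whether hoisting is actually profitable, none of which this sketch attempts.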

We are also interested in hearing about good use cases you might have for TIR-level CSE. The ones I’m aware of are CSE across host and device, and CSE for “lean” NPU compilers (the ones that do not rely on traditional compiler infrastructure).

CC @spectrometerHBH, who previously observed some issues with CUDA codegen. This might be of interest to you.