Task of Relay in the TVM stack


I have a general question about the overall design of the TVM stack, because it is a bit unclear to me. As far as I understand, the following steps are performed:

  1. The Relay framework converts a deep learning model into the Relay IR
  2. Some optimizations take place on this Relay IR (e.g. constant folding). Question: are DSL-level optimizations such as operator fusion already performed here, or does that happen later?
  3. The optimized Relay IR is transformed to the TVM IR, where all the operators are optimized (based on the hardware)
  4. TVM compiles the optimized IR to machine code, which can then be executed on the hardware.

Did I understand everything correctly?

What makes me unsure is that I was reading this document: https://arxiv.org/pdf/1904.08368.pdf

In section 3.3 (page 6) the Relay framework is described, and it says that the Relay process consists of three steps (frontend, compiler and backend). The frontend and compiler steps sound plausible to me, but the Relay backend is described as being responsible for producing the machine-specific code. I thought Relay itself was only responsible for creating the IR and optimizing it, not for generating the machine-specific code. I thought TVM was responsible for that.

Can you please enlighten me?

BR Josef

Hi @joschi2804 ,

Your understanding matches mine. TVM uses Relay as a graph-level IR to perform the graph-level optimizations (e.g., layer fusion).
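
To make "graph-level optimization" a bit more concrete, here is a minimal toy sketch in plain Python. It is not TVM's API: the `Const`, `Var` and `Call` classes and both pass functions are hypothetical names made up for illustration. It only shows the idea of rewriting a graph of operator calls, first by folding constants and then by merging a chain of elementwise operators into one fused node.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Toy graph-level IR (hypothetical, NOT TVM's Relay classes).
@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    op: str                      # operator name, e.g. "add" or "relu"
    args: Tuple["Expr", ...]     # inputs of the operator


Expr = Union[Const, Var, Call]

# Scalar semantics of the toy operators, used only for folding.
OPS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "relu": lambda a: max(a, 0.0),
}


def fold_constants(e: Expr) -> Expr:
    """Evaluate every call whose arguments are all constants."""
    if isinstance(e, Call):
        args = tuple(fold_constants(a) for a in e.args)
        if all(isinstance(a, Const) for a in args):
            return Const(OPS[e.op](*(a.value for a in args)))
        return Call(e.op, args)
    return e


def fuse_elementwise(e: Expr) -> Expr:
    """Merge a call with the call that produces its first input."""
    if not isinstance(e, Call):
        return e
    args = tuple(fuse_elementwise(a) for a in e.args)
    head = args[0]
    if isinstance(head, Call):
        inner = head.op[len("fused_"):] if head.op.startswith("fused_") else head.op
        # One node now stands for both operators (one kernel later on).
        return Call("fused_" + inner + "_" + e.op, head.args + args[1:])
    return Call(e.op, args)


# relu(add(x, multiply(2, 3)))
graph = Call("relu", (Call("add", (Var("x"),
             Call("multiply", (Const(2.0), Const(3.0))))),))
optimized = fuse_elementwise(fold_constants(graph))
print(optimized)
```

After both passes the graph consists of a single `fused_add_relu` node with inputs `x` and the folded constant. Both rewrites work purely on the operator graph, before anything is known about loops or hardware.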

It is easier to understand and remember if you know that Relay basically represents the layers of your model, while Tensor IR (TIR) represents the tensors and the operations that the layers perform on them. The TIR step is basically the backend step in the Relay paper… at least in my understanding.
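
As a rough illustration of the difference between the two levels, here is a small plain-Python sketch (again not TVM code; the function names and the weight layout of `(units, in_features)` are assumptions for this example). At the graph level a dense layer is one opaque operator node; at the tensor level the same layer is an explicit loop nest over tensor indices, which is the form that can be scheduled for a specific hardware target.

```python
def dense_graph_level(x_name, w_name):
    """Graph-level view: the layer is a single operator node."""
    return ("dense", x_name, w_name)


def dense_loop_level(x, w):
    """Tensor-level view: the same layer as explicit loops.

    x has shape (n, k); w has shape (m, k) (assumed layout).
    Computes out[i][j] = sum over r of x[i][r] * w[j][r].
    """
    n, k = len(x), len(x[0])
    m = len(w)
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):          # batch dimension
        for j in range(m):      # output units
            for r in range(k):  # reduction axis
                out[i][j] += x[i][r] * w[j][r]
    return out


node = dense_graph_level("x", "w")
result = dense_loop_level([[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]])
print(node)
print(result)
```

Loop-level transformations such as tiling, reordering or vectorizing the three loops only make sense in the second form, which is why hardware-specific tuning happens after lowering.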