I have a general question about the overall design of the TVM stack, because it's a bit unclear to me. As far as I understand, the following steps are performed:
- The Relay framework converts a deep learning model into the Relay IR
- On this Relay IR, some optimizations take place (e.g. constant folding). Question: are graph-level optimizations like operator fusion already performed here, or does that happen later?
- The optimized Relay IR is lowered to the TVM IR, where the individual operators are optimized for the target hardware
- TVM compiles the optimized IR to machine code, which can then be executed on the hardware.
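To make the graph-level part of these steps concrete, here is a minimal toy sketch in plain Python (not TVM's actual IR or API, just an illustration) of what a pass like constant folding does on a graph IR: subtrees whose inputs are all constants get evaluated at compile time, before any hardware-specific lowering.

```python
from dataclasses import dataclass

# A toy expression IR: variables, constants, and an add node.
@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: float

@dataclass
class Add:
    left: object
    right: object

def fold_constants(expr):
    """Recursively replace constant-only Add nodes with a single Const."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            # Both operands are known at compile time: evaluate now.
            return Const(left.value + right.value)
        return Add(left, right)
    return expr  # Var and Const are leaves

# (x + (2 + 3)) folds to (x + 5); x is unknown, so it stays symbolic.
expr = Add(Var("x"), Add(Const(2.0), Const(3.0)))
folded = fold_constants(expr)
print(folded)  # Add(left=Var(name='x'), right=Const(value=5.0))
```

The point of the sketch is only that such passes operate on the graph representation and are independent of the target hardware.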
Did I understand everything correctly?
What makes me unsure is this paper I was reading: https://arxiv.org/pdf/1904.08368.pdf
In section 3.3 (page 6), the Relay framework is described, and it says that the Relay process consists of three steps (frontend, compiler, and backend). The frontend and compiler steps sound plausible to me, but the Relay backend is described as being responsible for producing the machine-specific code. I thought Relay itself was only responsible for creating and optimizing the IR, not for generating the machine-specific code; I thought TVM was responsible for that.
Can you please enlighten me?