In the recent TVM paper “Relax: Composable Abstractions for End-to-End Dynamic Machine Learning” (Section 5.1), the authors describe LLM support through the Relax IR. A notable claim states: “Importantly, Relax compiles models only once for arbitrary batch sizes and sequence lengths.” This is elaborated further: “More importantly, cross-level abstractions enable us to use compiler-optimized matrix-vector multiplication tensor programs at batch size 1, while being able to apply partial library lowering to leverage operator libraries for other batch sizes.”
Does this imply that the implementation works roughly as follows (see the sketch after this list):
- During compilation, a single dynamic-shaped tensor program is lowered into multiple binary artifacts, each optimized for different shape parameters, and
- At runtime, the virtual machine employs a dispatch mechanism that selects the appropriate precompiled binary based on the concrete input shapes?
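To make the question concrete, here is a minimal Python sketch of the kind of dispatch I have in mind. Everything in it is hypothetical: the function and kernel names are invented for illustration and are not actual Relax VM internals or TVM APIs.

```python
import numpy as np

def matmul_dispatch(x: np.ndarray, weight: np.ndarray, kernels: dict):
    """Hypothetical runtime dispatch over precompiled artifacts.

    `kernels` maps names to callables standing in for shape-specialized
    compiled binaries; the keys are made up for this example.
    """
    batch_size = x.shape[0]
    if batch_size == 1:
        # Compiler-optimized matrix-vector (GEMV) tensor program for batch size 1
        return kernels["gemv_batch1"](x, weight)
    # Partial library lowering: hand other batch sizes to an operator library
    # (e.g. a vendor GEMM such as cuBLAS)
    return kernels["library_gemm"](x, weight)
```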