Frameworks and state-of-the-art models are moving more and more toward dynamism, where the shapes of tensors in a model are computed at runtime, either from the shapes of inputs or from the values of inputs.
There are a number of efforts underway in TVM to better support dynamic models, including the TensorFlow importer and the Relay VM. In order to align the various frontends, we’d like to find a unified approach to dynamism in the core Relay ops.
Two possible approaches are A0, which involves merging dynamic and static ops, and A1, which separates them:
A0. Demonstrated by @lixiaoquan’s work on Symbolic Reshape (https://github.com/apache/incubator-tvm/pull/5429), one approach is to make certain op attributes Optional, increase the argument count, and add logic to a number of passes to use the static attribute if it is defined, or the new dynamic input otherwise.
A1. Another approach is to introduce a new dynamic namespace containing dynamic versions of these ops, keep the two versions separate in passes, and eventually make the dynamic ops the default.
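For concreteness, here is a minimal sketch of how the two approaches might look from the Python frontend, using reshape as the example. The Expr-valued `newshape` argument in A0 and the `relay.dyn` namespace in A1 are illustrative assumptions based on this proposal, not settled API:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 4, 2), dtype="float32")
# The target shape is only known at runtime, e.g. computed from another tensor.
shape = relay.var("shape", shape=(3,), dtype="int64")

# A0: a single reshape op. The static `newshape` attribute becomes Optional,
# and a shape tensor is passed as an extra input when the attribute is absent.
z0 = relay.reshape(x, shape)

# A1: a separate op in a hypothetical `dyn` namespace, leaving the static
# relay.reshape untouched.
z1 = relay.dyn.reshape(x, shape)
```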
This RFC seeks to spark discussion on the advantages and disadvantages of these two approaches.
As a starting point, we see:

A0.
Pros.
- Operators with the same semantics are defined in one place, which avoids potential fragmentation.
Cons.
- Defining, understanding, and optimizing these operators is potentially more complicated; a number of passes would need to be reworked to respect the potential dynamism.

A1.
Pros.
- There is a clear boundary between dynamic and static ops.
- Passes are easier to reason about.
Cons.
- Operators can become fragmented over time.
- More changes to APIs are required.
Either approach will involve changes to frontend APIs and the Relay IR. To limit the impact on runtimes, we’d like to propose two features around dynamic shapes:
- A compile-time check to ensure we only run fully static models with the Graph Runtime. This will help prevent opaque memory-allocation errors in the Graph Runtime.
- A pass that converts dynamic ops to static ops via a mixture of rules that replace certain outputs with constants, combined with constant folding. Many models that use dynamic ops may actually be static; for example, a model might calculate the shape of a statically shaped tensor and then use that calculated shape to run a dynamic reshape. This pass would allow dynamic importers, like ONNX and TF, to simply export dynamic graphs while still getting the performance benefits of static Relay models with the Graph Runtime. A sketch of both features follows this list.
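As a rough illustration of both proposed features, here is a sketch in the Relay Python API. The pass name `DynamicToStatic` and the `assert_static` helper are hypothetical placeholders for what this RFC proposes, and the check is deliberately simplistic:

```python
import tvm
from tvm import relay

# A graph that is dynamic in form but static in fact: the target shape
# comes from shape_of on a tensor whose shape is fully known.
x = relay.var("x", shape=(2, 4), dtype="float32")
y = relay.var("y", shape=(8,), dtype="float32")
f = relay.Function([x, y], relay.reshape(x, relay.shape_of(y)))
mod = tvm.IRModule.from_expr(f)

# Proposed pass (hypothetical name): rewrite provably static dynamic ops
# into their static equivalents, then fold the resulting constants.
mod = relay.transform.DynamicToStatic()(mod)
mod = relay.transform.FoldConstant()(mod)

# Sketch of the proposed compile-time check: reject modules that still
# contain tensors with dynamic (Any) dimensions before using the Graph Runtime.
def assert_static(mod):
    mod = relay.transform.InferType()(mod)
    for param in mod["main"].params:
        ty = param.checked_type
        assert not any(isinstance(dim, tvm.tir.Any) for dim in ty.shape), \
            "the Graph Runtime requires fully static shapes"

assert_static(mod)
```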
Performance and optimization are important considerations for dynamic shapes, but they are mostly outside the scope of this RFC. Most kernel-tuning and compilation methods in TVM assume static input shapes. As we move forward with more and more dynamic operations and models, the question of how we generate efficient code for multiple input shapes will become more pressing, so thoughts on that are appreciated.