Frameworks and state-of-the-art models are moving more and more toward dynamism, where the shapes of tensors in a model are computed at runtime, either from the shapes of inputs or from the values of inputs.
There are a number of efforts underway in TVM to better support dynamic models, including the TensorFlow importer and the Relay VM. In order to align the various frontends, we’d like to find a unified approach to dynamism in the core Relay ops.
Two possible approaches are A0, which involves merging dynamic and static ops, and A1, which separates them:
A0. Demonstrated by @lixiaoquan’s work on Symbolic Reshape (https://github.com/apache/incubator-tvm/pull/5429), one approach is to make certain op attributes Optional, increase the argument count, and add logic to a number of passes to use the static attribute if it is defined, or the new dynamic input otherwise.
A1. Another approach is to introduce a new dynamic namespace containing dynamic versions of these ops, keep the two versions separate in passes, and eventually make the dynamic ops the default.
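For concreteness, here is a minimal sketch of how the two approaches might look from the Python frontend, using reshape as the example. The Expr-valued `newshape` argument in A0 and the `relay.dyn` namespace in A1 are illustrative assumptions based on this proposal, not settled API:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 4, 2), dtype="float32")
# The target shape is only known at runtime, e.g. computed from another tensor.
shape = relay.var("shape", shape=(3,), dtype="int64")

# A0: a single reshape op. The static `newshape` attribute becomes Optional,
# and a shape tensor is passed as an extra input when the attribute is absent.
z0 = relay.reshape(x, shape)

# A1: a separate op in a hypothetical `dyn` namespace, leaving the static
# relay.reshape untouched.
z1 = relay.dyn.reshape(x, shape)
```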
This RFC seeks to spark discussion on the advantages and disadvantages of these two approaches.
As a starting point, we see:

A0.
Pros.
- Operators with the same semantics are defined in one place, which avoids potential fragmentation.
Cons.
- Defining, understanding, and optimizing these operators is potentially more complicated; a number of passes would need to be reworked to respect the potential dynamism.

A1.
Pros.
- There is a clear boundary between dynamic and static ops.
- Passes are easier to reason about.
Cons.
- Operators can become fragmented over time.
- More changes to APIs are required.
Either approach will involve changes to frontend APIs and the Relay IR. To limit the impact on runtimes, we’d like to propose two features around dynamic shapes:
- A compile-time check to ensure we only run fully static models with the Graph Runtime. This will help prevent opaque memory-allocation errors in the Graph Runtime.
- A pass that converts dynamic ops to static ops via a mixture of rules that replace certain outputs with constants, combined with constant folding. Many models that use dynamic ops may actually be static; for example, a model might calculate the shape of a statically shaped tensor and then use that calculated shape to run a dynamic reshape. This pass would allow dynamic importers, like ONNX and TF, to simply export dynamic graphs while still getting the performance benefits of static Relay models with the Graph Runtime. A sketch of both features follows this list.
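As a rough illustration of both proposed features, here is a sketch in the Relay Python API. The pass name `DynamicToStatic` and the `assert_static` helper are hypothetical placeholders for what this RFC proposes, and the check is deliberately simplistic:

```python
import tvm
from tvm import relay

# A graph that is dynamic in form but static in fact: the target shape
# comes from shape_of on a tensor whose shape is fully known.
x = relay.var("x", shape=(2, 4), dtype="float32")
y = relay.var("y", shape=(8,), dtype="float32")
f = relay.Function([x, y], relay.reshape(x, relay.shape_of(y)))
mod = tvm.IRModule.from_expr(f)

# Proposed pass (hypothetical name): rewrite provably static dynamic ops
# into their static equivalents, then fold the resulting constants.
mod = relay.transform.DynamicToStatic()(mod)
mod = relay.transform.FoldConstant()(mod)

# Sketch of the proposed compile-time check: reject modules that still
# contain tensors with dynamic (Any) dimensions before using the Graph Runtime.
def assert_static(mod):
    mod = relay.transform.InferType()(mod)
    for param in mod["main"].params:
        ty = param.checked_type
        assert not any(isinstance(dim, tvm.tir.Any) for dim in ty.shape), \
            "the Graph Runtime requires fully static shapes"

assert_static(mod)
```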
Performance and optimization are important considerations for dynamic shapes, but they are mostly outside the scope of this RFC. Most kernel-tuning and compilation methods in TVM assume static input shapes. As we move forward with more and more dynamic operations and models, the question of how we generate efficient code for multiple input shapes will become more pressing, so thoughts on that are appreciated.