Relax IR, fields that require SeqExpr

Lunderberg · April 9, 2024, 2:06pm

The Relax IR requires the FunctionNode::body, IfNode::true_branch, and IfNode::false_branch to be instances of relax::SeqExpr. If these Relax requirements are violated, correctly-implemented transformations may raise an exception (e.g. from Downcast in Downcast<SeqExpr>(func->body)->blocks), or even segfault (e.g. when .as returns a nullptr in func->body.as<SeqExprNode>()->blocks). Debugging these failure modes is also difficult, as even the TVMScript printer relies on the body of the function being a SeqExprNode. As such, I wanted to experiment with ways to improve the developer experience around these requirements.

I have a draft PR open at [QoL][Relax] Use SeqExpr in IR types when SeqExpr is required by Lunderberg · Pull Request #16859 · apache/tvm · GitHub that would update the C++ type of FunctionNode::body, IfNode::true_branch, and IfNode::false_branch to be relax::SeqExpr instead of relax::Expr. This does not impact any well-formed Relax IR, and allows this type of ill-formed Relax IR type to be checked at compile-time. A large number of checks applied during TVM runtime can now be removed, as they duplicate the new compile-time check.

To maintain backwards compatibility, this PR would also add a new constructor to relax::SeqExpr, which accepts a single Expr body argument. This constructor provides either an additional reference to the same underlying relax::SeqExprNode, if body already contains a relax::SeqExprNode, and otherwise wraps the body in a relax::SeqExpr. For implementations that previously produced well-formed Relax IR, this change has no effect. For implementations that previously produced ill-formed Relax IR, this change results in the equivalent well-formed Relax IR.

Alternate implementations considered:

Perform the backwards-compatibility wrapping within the relax::Function and relax::If constructors. While this would provide the intended conversion when these constructors are used, Relax transforms make frequent use of copy-on-write (e.g. func.CopyOnWrite()->body = new_body), which does not use the constructor. Maintaining backwards compatibility for this usage requires the implicit conversion constructor that was chosen for this PR.
Remove the Relax IR requirement for these expressions to be SeqExpr. While this would make Relax more internally consistent, such a change would break backwards compatibility that relies on SeqExpr being present. While the callsites within TVM could be updated to resolve this breakage, callsites outside of TVM (e.g. MLC-LLM) could not. Exposing the special case within the C++ type, as done in this PR, maintains backwards compatibility.

Since this does impact the Relax IR, the PR is currently marked as a draft, to allow time for discussion. I think this is a useful change to make, as it moves the well-formed check from TVM’s execution time to the C++ compile-time.

sanirudh · April 10, 2024, 3:17pm

Thanks for the detailed explanation. Just wanted ask some basic questions (might be naive ones, so please bear with me).

I think given the requirements for SeqExpr nodes, this is a really helpful change.

For IfNodes, AFAIK, dataflow blocks cannot contain IfNodes, but, can the true and false branches contain separate dataflow blocks nested within a binding block? Is that the reason for this requirement. This question is to understand why SeqExpr is needed since it contains Array<BindingBlock> and Expr as attributes.

If that’s allowed, then I think keeping the requirement of SeqExpr for the branch body makes sense and this change also makes perfect sense, but if not, just for the IfNode could we remove that restriction. I understand the backwards compatibility issue, but considering that IfNodes are not that common in models(might be wrong about this) and that relax is still in early stages and not yet officially released, it might be good to simplify some of these requirements?

Lunderberg · April 16, 2024, 8:24pm

For IfNodes, AFAIK, dataflow blocks cannot contain IfNodes, but, can the true and false branches contain separate dataflow blocks nested within a binding block?

The branches of the relax::If node can contain either DataflowBlock or non-dataflow BindingBlock, yes. This is useful when the two branches of a relax::If contain different implementations (e.g. an optimized version with static shapes in one branch, or a generic version with dynamic shapes in the other). The restriction is that DataflowBlock cannot contain relax::If, but relax::If can still contain a DataflowBlock.

Is that the reason for this requirement.

My understanding is that the requirement is intended to make it simpler to write transformation passes, by restricting the form of the Relax IR.

I understand the backwards compatibility issue, but considering that IfNodes are not that common in models(might be wrong about this) and that relax is still in early stages and not yet officially released, it might be good to simplify some of these requirements?

I’m generally all for simplifying the requirements, and my personal preference would be the remove the requirement of a SeqExpr from both relax::If and relax::Function. However, since Relax has been merged to TVM main, I’d lean toward the backwards compatible option.

sanirudh · April 17, 2024, 8:58am

Thanks a lot for the reply. I understand the reasoning for backwards compatibility because unity is merged into main. I think in this case, it makes a lot of sense.

Just as a general thought, maybe we should, as a community, try and define rules for when breaking backward compatibility is okay so that in the long term, we don’t have to keep maintaining legacy decisions and adjust a little easily to newer requirements in the field. The move from relay to relax has taken a lot of effort for everyone, so keeping the possibility of breaking backwards compatibility (when absolutely needed and based on possibly predefined conditions) could possibly help avoid having to do a similar move again in the future.

On a unrelated note: The use case related to using the branches of relax IfNode to implement different optimized versions of a model is really interesting, I’ve never thought about it, so thanks for that.

Lunderberg · April 18, 2024, 3:29pm

Just as a general thought, maybe we should, as a community, try and define rules for when breaking backward compatibility is okay so that in the long term, we don’t have to keep maintaining legacy decisions and adjust a little easily to newer requirements in the field.

I agree, and am thinking that would be a good topic for the next community meeting. Forbidding any breakage in backwards compatibility removes the opportunity for incremental improvement. The only way to make resolve limitations in an initial design is to make an entirely new design (e.g. Relax → Relax), which is a painful transition of a different kind.

On a unrelated note: The use case related to using the branches of relax IfNode to implement different optimized versions of a model is really interesting, I’ve never thought about it, so thanks for that.

Thank you! It’s something that I’d like to incorporate into optimization pipelines more generally. I have a PR (link) that would introduce a relax.transform.CheckForSpecialCase, which would generate static versions of a dynamic relax function, and provide the appropriate IfNode in order to delegate to those static versions. (Though, it’s been on the back-burner for a while, and I still need to resolve a few CI issues with it.)

Lunderberg · April 19, 2024, 2:30pm

Thank you for the discussion, and I am now marking [QoL][Relax] Use SeqExpr in IR types when SeqExpr is required by Lunderberg · Pull Request #16859 · apache/tvm · GitHub as ready for review. This PR implements the changes discussed here. The backwards-compatibility handling worked smoothly, with breakage only occurring in cases that expected ill-formed Relax IR. (For example, the relax::FunctionPattern only matching against ill-formed functions whose body was not a SeqExpr.)