Problem with FuseOps (and embedded constants in TIR)

Long story short, after https://github.com/apache/tvm/pull/8509 we see a lot of floating-point models crash during compilation for Hexagon. On Hexagon (downstream) we use the mixed-precision pass to convert float32 data to float16. Embedded constants in TIR do not support float16, so compilation aborts.

Longer story…

The relay pass FuseOps is the one that can extract constants out of expressions, and replace them with parameters. Constants that are not parameters cannot have type float16, because that type is not supported by current TIR code (for embedded constants). If this extraction doesn’t happen, compilation will abort if a float16 constant is found in TIR.
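To make the extraction step concrete, here is a minimal sketch of what "replacing constants with parameters" means. It is a toy model in plain Python, not TVM's implementation: the tuple-based expression encoding and the `lift_constants` helper are hypothetical and exist only for illustration.

```python
def lift_constants(expr, params=None):
    """Replace ("const", value) leaves with ("param", name) and collect the values."""
    if params is None:
        params = {}
    kind = expr[0]
    if kind == "const":
        # Give the constant a fresh parameter name and remember its value.
        name = f"p{len(params)}"
        params[name] = expr[1]
        return ("param", name), params
    if kind == "call":
        _, op, args = expr
        new_args = []
        for arg in args:
            new_arg, params = lift_constants(arg, params)
            new_args.append(new_arg)
        return ("call", op, new_args), params
    # Variables and existing parameters pass through unchanged.
    return expr, params


# x + 1.5, with the constant embedded in the expression
expr = ("call", "add", [("var", "x"), ("const", 1.5)])
lifted, params = lift_constants(expr)
```

After lifting, the expression body no longer contains the constant; its value travels separately in `params`. That is why, when extraction happens, a float16 constant never has to be embedded in TIR.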

After commit b5f1dabce4 (PR8509), FuseOps will no longer extract constants if the build target has link_params set to true. This can be, at least in theory, overridden by pass context flag relay.FuseOps.link_params, but there is an issue.

The problem is as follows:

  • The FoldConstant pass runs, and it creates a fresh PassContext, without any config flags in it. It also uses the “cpu” target instead of the actual one for CompilationConfig.
  • During execution, FoldConstant will invoke relay interpreter’s function Eval, which initiates a series of relay passes, including FuseOps.
  • FuseOps gets link_params value from IRModule's attribute Executor, which is still consistent with the actual target. If the original target has link_params = 1, it will take effect regardless of any flags added to PassContext at the relay.Build time.
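The interaction described in the three steps above can be sketched with a toy model. This is not TVM code: the `PassContext` and `Module` classes and both functions are hypothetical stand-ins, and the rule "a context flag wins if present, otherwise use the executor attribute" is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PassContext:
    """Toy stand-in for a pass context: just a dict of config flags."""
    config: dict = field(default_factory=dict)


@dataclass
class Module:
    """Toy stand-in for an IRModule carrying an executor attribute."""
    executor_link_params: bool


def fuse_ops_extracts_constants(mod, ctx):
    # Assumed rule: the context flag overrides the executor attribute.
    link_params = ctx.config.get(
        "relay.FuseOps.link_params", mod.executor_link_params
    )
    # Constants are extracted only when link_params is off.
    return not link_params


def fold_constant(mod, outer_ctx):
    # The nested evaluation starts from a fresh context, so flags set on
    # outer_ctx never reach the nested FuseOps call.
    fresh_ctx = PassContext()
    return fuse_ops_extracts_constants(mod, fresh_ctx)


mod = Module(executor_link_params=True)
outer = PassContext(config={"relay.FuseOps.link_params": False})

direct = fuse_ops_extracts_constants(mod, outer)  # flag is honored
nested = fold_constant(mod, outer)                # flag is lost
```

In this sketch `direct` is `True` (constants are extracted because the override is visible), while `nested` is `False`: the fresh context drops the override, and the executor attribute on the module still says link_params is on.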

It seems like FuseOps should be getting settings consistent with the CPU target that FoldConstant created, instead of using Executor from IRModule. This difference may have further consequences if passes executed during FoldConstant consult the executor for more information.

Any thoughts?

Agree with your last sentence – FoldConstant should be CPU only and not carry forward any target-specific flags. (Ideally it would do all that more directly instead of piggy-backing on the interpreter, but that’s a bigger issue.)

I wonder why TIR constants don’t support fp16? Is it because of the need for C codegen? @manupa-arm

I’m not sure, but I guess it is because C++ doesn’t have native fp16 type support?

Hi @kparzysz ,

Sorry to hear that there was downstream failure because of #8509.

I am also wondering how this is true, because of the following (note that we just used what was supported in TVM via the LinkedParam node):

The above are called, respectively, in the following locations:

We need to fix this if it is not happening. @kparzysz, would you be able to file an issue? attn: @dmitriy-arm @Mousius

I think we need to move the link-params to be a property of the TIR backend (i.e. target).

However, the above seems to be a workaround for the issue. The real issue, I suspect, is that the Hexagon backend expects a LinkedParams node if --link-params is used; however, there is no unit test (a simple codegen one) to assert this, and such a test should have been broken by #8509.

We were hitting this assertion:

We do have a unit testcase that checks the LLVM IR to see if the _linked_param function was generated, but I’m not sure if it runs in CI right now, since it requires Hexagon target to be enabled. This is actually work in progress, so these tests should start running soon.

@kparzysz .

As mentioned in the PR, the above reference is about scalar constants, which are not subject to link-params. (Correct me if I am wrong – @dmitriy-arm.)

#8509 is about non-scalar constants.

One option is to adjust the Hexagon backend to handle AllocateConst nodes instead of LinkedParams nodes.

Let’s continue the discussion on the PR (#8509) to avoid duplicate conversation. Do you agree, @kparzysz?

Sure, I replied in the PR.