Long story short, after https://github.com/apache/tvm/pull/8509 we see a lot of floating point models crash during compilation for Hexagon. On Hexagon (downstream) we use the mixed-precision pass to convert float32 data to float16. TIR does not support float16 values as embedded constants, so compilation aborts.
Longer story…
The relay pass FuseOps is the one that can extract constants out of expressions, and replace them with parameters. Constants that are not parameters cannot have type float16, because that type is not supported by current TIR code (for embedded constants). If this extraction doesn’t happen, compilation will abort if a float16 constant is found in TIR.
After commit b5f1dabce4 (PR8509), FuseOps will no longer extract constants if the build target has link_params set to true. This can be, at least in theory, overridden by pass context flag relay.FuseOps.link_params, but there is an issue.
The problem is as follows:
The FoldConstant pass runs, and it creates a fresh PassContext, without any config flags in it. It also uses the “cpu” target instead of the actual one for CompilationConfig.
During execution, FoldConstant will invoke relay interpreter’s function Eval, which initiates a series of relay passes, including FuseOps.
FuseOps gets link_params value from IRModule's attribute Executor, which is still consistent with the actual target. If the original target has link_params = 1, it will take effect regardless of any flags added to PassContext at the relay.Build time.
It seems like FuseOps should be getting settings consistent with the CPU target that FoldConstant created, instead of using Executor from IRModule. This difference may have further consequences if passes executed during FoldConstant consult the executor for more information.
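One way to picture that suggestion, again as a plain-Python sketch rather than TVM code: before running the nested passes, constant folding would work on a copy of the module whose executor attribute matches the CPU target. `CPU_EXECUTOR`, `module_for_folding` and the dictionary layout are hypothetical names chosen for illustration, and the executor names are placeholders.

```python
import copy

# Hypothetical executor settings matching the "cpu" target used for folding.
CPU_EXECUTOR = {"name": "graph", "link_params": False}


def fuse_ops_extracts_constants(module):
    """Model FuseOps: constants are extracted unless link_params is set."""
    return not module["executor"]["link_params"]


def module_for_folding(module):
    """Return a copy of the module with CPU-consistent executor settings."""
    folded = copy.deepcopy(module)
    folded["executor"] = dict(CPU_EXECUTOR)
    return folded


original = {"executor": {"name": "aot", "link_params": True}}
inner = module_for_folding(original)

print(fuse_ops_extracts_constants(original))
print(fuse_ops_extracts_constants(inner))
```

The original module keeps its executor settings for the real build; only the copy used during folding is changed.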
Agree with your last sentence: FoldConstant should be CPU only and not carry forward any target-specific flags. (Ideally it would do all that more directly instead of piggy-backing on the interpreter, but that’s a bigger issue.)
Sorry to hear that there was downstream failure because of #8509.
I am also wondering how this is true, because of the following (note that we just used what was supported in TVM via LinkedParam node) :
The above are called respectively in the following locations :
We need to fix this if it is not happening. @kparzysz, would you be able to file an issue?
attn: @dmitriy-arm @Mousius
I think we need to move the link-params to be a property of the TIR backend (i.e. target).
However, the above seems to be a workaround for the issue. The real issue, I suspect, is that the Hexagon backend expects a LinkedParams node if --link-params is used, but there is no unit test (a simple codegen one) to assert this. Such a test should have been broken by #8509.
We do have a unit testcase that checks the LLVM IR to see if the _linked_param function was generated, but I’m not sure if it runs in CI right now, since it requires Hexagon target to be enabled. This is actually work in progress, so these tests should start running soon.