Thanks for the quick response! Here’s my core problem: I am trying to avoid writing a ton of patterns for the same basic computation.
As an example, I am using a HuggingFace Transformer exported to ONNX. From the ONNX perspective, the start of the transformer looks like
MatMul -> Add. However, after importing into TVM, the ONNX frontend does a bunch of data mutation. This is to account for broadcasting and the fact that TVM does matrix multiplication as
(m, k) x (n, k) (with the second operand transposed), whereas ONNX does matrix multiplication as
(m, k) x (k, n). As a result, the Relay expression becomes
Reshape -> Reshape -> Transpose -> MatMul -> Reshape -> Add.
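To make the layout difference concrete, here is a small numpy sketch (numpy standing in for the actual Relay ops) showing why the frontend has to insert a Transpose on the weight: TVM's dense computes against a (n, k) weight, so the ONNX (k, n) weight must be flipped first.

```python
import numpy as np

m, k, n = 2, 3, 4
a = np.random.rand(m, k)
w = np.random.rand(k, n)

# ONNX MatMul multiplies (m, k) x (k, n) directly.
onnx_result = a @ w

# TVM's dense expects the weight as (n, k), i.e. pre-transposed,
# and computes a @ weight.T internally -- which is why the frontend
# inserts a Transpose on the weight before the MatMul.
w_tvm = w.T                  # shape (n, k), what the frontend feeds dense
tvm_style = a @ w_tvm.T      # dense(a, w_tvm): same result as ONNX

assert np.allclose(onnx_result, tvm_style)
```

The two computations are numerically identical; only the operand layout (and hence the surrounding Reshape/Transpose ops) differs.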
Different frontends may handle the reshapes / transposes differently, but they will all perform the core computation of
MatMul -> Add. To avoid writing a separate pattern for each variation, I would like an option to treat these reshape and transpose operators as always matching and simply skip over them. In that case, the core
MatMul -> Add pattern will always match, and any operators in between will be merged into the composite function.
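As a rough illustration of the matching behavior I have in mind (plain Python, not the actual TVM pattern-language API), the matcher would drop a configurable set of "transparent" ops before comparing against the backbone pattern:

```python
# Hypothetical sketch: match a MatMul -> Add backbone while skipping
# over reshape/transpose ops, which carry no core computation here.
TRANSPARENT = {"reshape", "transpose"}

def matches_with_skips(ops, pattern):
    """Return True if `pattern` matches `ops` once the
    transparent (reshape/transpose) operators are dropped."""
    core = [op for op in ops if op not in TRANSPARENT]
    return core == pattern

# The Relay sequence the ONNX frontend actually produces:
relay_ops = ["reshape", "reshape", "transpose", "matmul", "reshape", "add"]
assert matches_with_skips(relay_ops, ["matmul", "add"])
```

The skipped operators would then be absorbed into the composite function along with the backbone, so downstream consumers see a single MatMul -> Add region.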
A custom transformer implementation, such as Nvidia’s FasterTransformer, won’t care about these reshapes anyway.