Hi, first of all, I hope everyone stays healthy during this pandemic.
I have a question regarding operator fusion.
Until today, I’ve assumed fusion happens after AutoTVM’s operator tuning, since fusion decisions seem backend-dependent (for example, an optimized low-level library may provide fused operators) and the fusion process could then take the tuned performance of each operator into account.
Plus, when extracting tunable tasks with AutoTVM, I’ve never seen a fused operator.
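For reference, this is roughly how I extract the tasks (a minimal sketch; `mod` and `params` are assumed to come from one of the Relay frontend importers, and the target and op list are just placeholders):

```python
from tvm import autotvm, relay

# mod and params are assumed to come from a Relay frontend importer
tasks = autotvm.task.extract_from_program(
    mod["main"], target="llvm", params=params,
    ops=(relay.op.get("nn.conv2d"),))

for task in tasks:
    # every task I get is a single op such as conv2d, never a fused one
    print(task.name, task.args)
```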
However, I realized the documentation says otherwise.
Based on my understanding, the Relay pipeline finishes before AutoTVM runs, so it sounds like fusion occurs before operator tuning.
Yet it feels more natural to me to perform operator fusion at a later stage of the TVM pipeline, for the reasons stated above.
Can anyone clarify at which stage of the TVM pipeline fusion is performed?
If fusion actually happens before AutoTVM, what is the reasoning behind that design?
And in that case, can AutoTVM also auto-tune the fused operators?
Op fusion happens later in the pipeline. AutoTVM extracts tuning tasks from the graph before fusion, so it only looks at individual ops (conv, dense, etc.).
Our fusion rules are not hardware-dependent (for now): both the CPU and GPU backends get the same fused operators. We only fuse cheap ops into convolution, dense, etc., under the assumption that a tuned convolution schedule is also optimal when it is fused with other cheap ops. That allows AutoTVM tuning and fusion to be done independently.
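To make that concrete, here is a small sketch (the shapes and ops are just illustrative) of running the FuseOps pass directly and watching a cheap relu get grouped into the conv2d’s primitive function:

```python
import tvm
from tvm import relay

# a tiny graph: conv2d followed by a cheap elementwise op (relu)
data = relay.var("data", shape=(1, 3, 224, 224))
weight = relay.var("weight", shape=(16, 3, 3, 3))
conv = relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1))
out = relay.nn.relu(conv)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# run type inference and then the fusion pass;
# the relu ends up inside the same primitive function as the conv2d
mod = relay.transform.InferType()(mod)
mod = relay.transform.FuseOps(fuse_opt_level=2)(mod)
print(mod)
```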
I should clarify that AutoTVM tuning is assumed to be done ahead of time, so when we run relay.build(...), AutoTVM will not run. TVM will look up the tuned parameters (stored in a log file) during codegen.
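As a rough sketch (the log file name and target are placeholders), applying the pre-tuned records during build looks like this:

```python
from tvm import autotvm, relay

# "tuning.log" is a placeholder for the tuning records produced earlier
with autotvm.apply_history_best("tuning.log"):
    # AutoTVM itself does not run here; codegen just looks up the best
    # schedule found during the offline tuning step
    lib = relay.build(mod, target="llvm", params=params)
```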
Maybe a good way to put it is that op fusion (FuseOps) is a graph-level transformation, but the actual codegen happens node by node, where each node corresponds to, say, a fused convolution. So “later” may not be the right word. After the graph-level transformations we do a lot of “tensor-level” optimizations.