Hi, first of all, I hope everyone stays healthy during this pandemic.
I have a question regarding operator fusion.
Until today, I’ve assumed fusion happens after AutoTVM’s operator tuning, since fusion decisions seem backend-dependent (for example, an optimized low-level library may provide fused operators) and the fusion process could then take the tuned performance of each operator into account.
Plus, when extracting tunable tasks with AutoTVM, I’ve never seen a fused operator.
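For reference, this is roughly how I extract the tasks (a minimal sketch; `mod` and `params` are assumed to come from one of the Relay frontend importers, and the target and op list are just placeholders):

```python
from tvm import autotvm, relay

# mod and params are assumed to come from a Relay frontend importer
tasks = autotvm.task.extract_from_program(
    mod["main"], target="llvm", params=params,
    ops=(relay.op.get("nn.conv2d"),))

for task in tasks:
    # every task I get is a single op such as conv2d, never a fused one
    print(task.name, task.args)
```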
However, I realized the documentation says otherwise.
Based on my understanding, the Relay pipeline finishes before AutoTVM runs, so it sounds like fusion occurs before operator tuning.
Yet it feels more natural to me to perform operator fusion at a later stage of the TVM pipeline, for the reasons stated above.
Can anyone clarify at which stage of the TVM pipeline fusion is performed?
If fusion actually happens before AutoTVM, what is the reasoning behind that design?
And in that case, can AutoTVM also auto-tune the fused operators?
Op fusion happens later in the pipeline. AutoTVM extracts tuning tasks from the graph before fusion, so it only looks at individual ops (conv, dense, etc.).
Our fusion rules are not hardware-dependent (for now): both the CPU and GPU backends get the same fused operators. We only fuse cheap ops into convolution, dense, etc., under the assumption that a tuned convolution schedule is also optimal when it is fused with other cheap ops. That allows AutoTVM tuning and fusion to be done independently.
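To make that concrete, here is a small sketch (the shapes and ops are just illustrative) of running the FuseOps pass directly and watching a cheap relu get grouped into the conv2d’s primitive function:

```python
import tvm
from tvm import relay

# a tiny graph: conv2d followed by a cheap elementwise op (relu)
data = relay.var("data", shape=(1, 3, 224, 224))
weight = relay.var("weight", shape=(16, 3, 3, 3))
conv = relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1))
out = relay.nn.relu(conv)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# run type inference and then the fusion pass;
# the relu ends up inside the same primitive function as the conv2d
mod = relay.transform.InferType()(mod)
mod = relay.transform.FuseOps(fuse_opt_level=2)(mod)
print(mod)
```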
I should clarify that AutoTVM tuning is assumed to be done ahead of time, so when we run relay.build(...), AutoTVM will not run. TVM will look up the tuned parameters (stored in a log file) during codegen.
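As a rough sketch (the log file name and target are placeholders), applying the pre-tuned records during build looks like this:

```python
from tvm import autotvm, relay

# "tuning.log" is a placeholder for the tuning records produced earlier
with autotvm.apply_history_best("tuning.log"):
    # AutoTVM itself does not run here; codegen just looks up the best
    # schedule found during the offline tuning step
    lib = relay.build(mod, target="llvm", params=params)
```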
Maybe a good way to put it is that op fusion (FuseOps) is a graph-level transformation, but the actual codegen happens node by node, where each node corresponds to, say, a fused convolution. So “later” may not be the right word. After the graph-level transformations we do a lot of “tensor-level” optimizations.