I’m having some trouble understanding operator-fusion-related concepts. Things became even more confusing after attempting to write TOPI definitions for a new backend (trying to override just the dense schedule for now), as the implementation there is not what I expected.
- Does operator fusion simply combine multiple Relay IR nodes into one node?
At this point my understanding is that we are left with a new operator, “fused_dense_add_relu” or similar, which is then used later on in TOPI.
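To make that first question concrete, here is roughly the mental model I have (a toy sketch in plain Python, no TVM — the names and the list-of-ops “graph” are entirely made up, not the real FuseOps implementation): elementwise/broadcast ops get merged into the preceding anchor op, and the fused node’s name is the concatenation of its members.

```python
# Toy illustration only -- NOT TVM's actual FuseOps pass.
# A "graph" here is just a list of op names in execution order;
# fusable (elementwise/broadcast) ops are merged into the
# preceding anchor op.

FUSABLE = {"add", "relu"}  # elementwise ops that can follow an anchor

def fuse(ops):
    """Collapse runs of anchor + fusable ops into single fused nodes."""
    groups = []
    for op in ops:
        if groups and op in FUSABLE:
            groups[-1] = groups[-1] + "_" + op  # merge into previous group
        else:
            groups.append(op)
    # name fused groups the way Relay appears to: fused_<members>
    return ["fused_" + g if "_" in g else g for g in groups]

print(fuse(["dense", "add", "relu"]))  # ['fused_dense_add_relu']
```

Is this the right picture, i.e. is the fused node really just one new composite operator from TOPI’s point of view?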
- What is the difference between operator fusion and inlining in TOPI schedules? I read somewhere on the forum that there is no concept of fused schedules, what are the implications of this?
Printing op.name during traverse_inline never shows a single fused op, as I would expect from the first question. When scheduling from Relay I see dense -> T_add -> T_relu printed in order; when scheduling from TOPI, “relu” is never printed and the op is just reported as “elemwise”.
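For reference, this is how I currently picture traverse_inline behaving (again a toy plain-Python model with invented names, not the real TVM utility, and the visit order in my toy may differ from what TVM actually prints): it walks the compute DAG from the output, recurses through elementwise/broadcast ops so they can be inlined, and only fires the scheduling callback on the anchor op — which would explain seeing dense, T_add, and T_relu all visited but never one fused op.

```python
# Toy model of a traverse_inline-style walk -- NOT the real TVM code.
# Each op has a name, a tag, and input ops; elementwise/broadcast ops
# are recursed through (to be inlined), and the callback fires only
# on the anchor op.

class Op:
    def __init__(self, name, tag, inputs=()):
        self.name, self.tag, self.inputs = name, tag, list(inputs)

def traverse(op, callback, visited=None, log=None):
    visited = set() if visited is None else visited
    log = [] if log is None else log
    if id(op) in visited:
        return log
    visited.add(id(op))
    log.append(op.name)                      # every op gets visited...
    if op.tag in ("elemwise", "broadcast"):  # ...but elementwise ops
        for inp in op.inputs:                # are walked through, not
            traverse(inp, callback, visited, log)  # scheduled directly
    else:
        callback(op)                         # schedule the anchor here
    return log

dense = Op("dense", "dense")
bias  = Op("T_add", "broadcast", [dense])
relu  = Op("T_relu", "elemwise", [bias])

anchors = []
order = traverse(relu, anchors.append)
print(order)                      # ['T_relu', 'T_add', 'dense']
print([a.name for a in anchors])  # ['dense']
```

Is that roughly accurate, and is the “elemwise” I see from TOPI just the tag standing in for the op name?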
- How does Relay know which operators it can fuse for a given target backend? What about for heterogeneous execution?
- There is currently no way to tensorize composite ops (isn’t this operator fusion?). This means I can’t pattern-match inner loops containing gemm/gemv plus (bias + activation). Using compute_at doesn’t merge the produce blocks created by dense and (bias + activation). Inlining merges the bias + activation operations as I would expect (i.e. turns them into a single produce block). The workaround, as I understand it, is to run an initial traverse_inline pass which just detects whether a bias exists and which activation type is used. This info is then passed into the tensorization intrinsic, where I can use it later during CodeGen. The intrinsic then just matches the high-level gemm or gemv op during a second traverse_inline pass, and all other elemwise or broadcast ops are ignored. What is the recommended way of scheduling fused ops when the hardware doesn’t run the fused ops in a sequence?
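In case it helps, here is a toy sketch of the two-pass workaround I described (plain Python, every name invented — not real TVM API): pass 1 scans the op chain and records whether a bias add and which activation trail the gemm; pass 2 “tensorizes” only the gemm itself, carrying that epilogue info into the intrinsic while ignoring the elemwise/broadcast ops.

```python
# Toy sketch of the two-pass workaround -- all names are invented,
# this is not TVM API. Pass 1 detects the epilogue (bias/activation)
# trailing the gemm; pass 2 matches only the high-level gemm/gemv and
# hands the epilogue info to the tensorize intrinsic.

def detect_epilogue(ops):
    """Pass 1: record (has_bias, activation) trailing the gemm."""
    has_bias = "bias_add" in ops
    activation = next((op for op in ops if op in ("relu", "sigmoid")), None)
    return {"has_bias": has_bias, "activation": activation}

def tensorize(ops, epilogue):
    """Pass 2: match only gemm/gemv; skip elemwise/broadcast ops."""
    plan = []
    for op in ops:
        if op in ("gemm", "gemv"):
            # the intrinsic receives the epilogue so codegen can emit
            # one hardware call covering gemm + bias + activation
            plan.append(("tensorize", op, epilogue))
        # bias_add / relu are intentionally ignored here
    return plan

ops = ["gemm", "bias_add", "relu"]
print(tensorize(ops, detect_epilogue(ops)))
# [('tensorize', 'gemm', {'has_bias': True, 'activation': 'relu'})]
```

Is there a cleaner supported way to achieve this, especially when the hardware executes the epilogue as part of the same instruction rather than as a sequence of fused ops?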