Operator fusion confusion

I’m having some trouble understanding concepts related to operator fusion. Things became even more confusing after I tried to write TOPI definitions for a new backend (for now I’m only trying to override the dense schedule), because the implementation there is not what I expected.

  1. Does operator fusion simply combine multiple Relay IR nodes into one node?

My current understanding is that we would be left with a new operator, “fused_dense_add_relu” or similar, which is then used later on in TOPI.
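As far as I understand TVM, fusion does not register a new operator. The FuseOps pass groups calls and wraps each group in a Relay function marked with a `Primitive` attribute; that function is then lowered as one kernel, with a name along the lines of `fused_nn_dense_add_nn_relu`. The pure-Python toy below (`Call`, `Function`, `wrap_group` and `kernel_name` are hypothetical, not TVM APIs, and the naming scheme is an assumption) shows the key point: at the graph level there is one node, but its body still contains the original dense, add and relu calls, which is why TOPI later sees them as separate compute stages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Call:
    op: str           # name of a registered op, e.g. "nn.dense"
    args: List[str]   # argument names, kept as plain strings for simplicity


@dataclass
class Function:
    params: List[str]
    body: List[Call]                              # the original calls, unchanged
    attrs: Dict[str, int] = field(default_factory=dict)


def wrap_group(params: List[str], calls: List[Call]) -> Function:
    """Toy model: fusion wraps a group of calls in a function marked
    'Primitive' instead of creating a new registered operator."""
    return Function(params, list(calls), {"Primitive": 1})


def kernel_name(fn: Function) -> str:
    """Derive a kernel name from the member ops (hypothetical naming scheme)."""
    return "fused_" + "_".join(c.op.replace(".", "_") for c in fn.body)


# dense -> add -> relu, as in the question
group = [
    Call("nn.dense", ["x", "w"]),
    Call("add", ["%0", "b"]),
    Call("nn.relu", ["%1"]),
]
fused = wrap_group(["x", "w", "b"], group)
print(kernel_name(fused))          # fused_nn_dense_add_nn_relu
print([c.op for c in fused.body])  # ['nn.dense', 'add', 'nn.relu']
```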

  2. What is the difference between operator fusion and inlining in TOPI schedules? I read somewhere on the forum that there is no concept of fused schedules; what are the implications of this?

Printing op.name during traverse_inline never shows the name of a single fused op, which is what I’d expect based on my first question. When scheduling from Relay, it prints dense -> T_add -> T_relu, in that order. When scheduling from TOPI, however, it does not print “relu” and just says “elemwise”.
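The printed order is what a post-order walk produces. As far as I can tell from the TOPI helper, traverse_inline recurses into the producers of injective stages before invoking the callback on a stage, and it inlines injective stages that are not outputs. The simplified model below is pure Python, not TVM code (`ToyOp`, `is_injective` and `build_dense_add_relu` are hypothetical stand-ins), and it reproduces the dense -> T_add -> T_relu order. It also illustrates the "no fused schedule" point: fusion only decides which stages end up in one kernel, and the schedule still decides what to do with each stage, for example inlining the elementwise ones.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(eq=False)
class ToyOp:
    name: str
    tag: str                                   # "placeholder", "dense", "broadcast", "elemwise", ...
    inputs: List["ToyOp"] = field(default_factory=list)


def is_injective(tag: str) -> bool:
    # Simplified stand-in for TOPI's tag check.
    return tag in ("elemwise", "broadcast", "injective")


def traverse_inline(final_op: ToyOp, outputs: Set[str],
                    callback: Callable[[ToyOp], None]) -> List[str]:
    """Post-order walk: inline injective stages that are not outputs,
    recurse into their producers, then call `callback` on each visited stage.
    Returns the names of the stages that would be inlined."""
    visited: Set[int] = set()
    inlined: List[str] = []

    def _traverse(op: ToyOp) -> None:
        if id(op) in visited:
            return
        visited.add(id(op))
        if is_injective(op.tag):
            if op.name not in outputs:
                inlined.append(op.name)        # stands in for s[op].compute_inline()
            for producer in op.inputs:
                if producer.tag != "placeholder":
                    _traverse(producer)
        callback(op)                           # producers are reported before consumers

    _traverse(final_op)
    return inlined


def build_dense_add_relu() -> ToyOp:
    data, weight, bias = (ToyOp(n, "placeholder") for n in ("data", "weight", "bias"))
    dense = ToyOp("dense", "dense", [data, weight])
    add = ToyOp("T_add", "broadcast", [dense, bias])
    return ToyOp("T_relu", "elemwise", [add])


seen: List[str] = []
inlined = traverse_inline(build_dense_add_relu(), {"T_relu"}, lambda op: seen.append(op.name))
print(seen)     # ['dense', 'T_add', 'T_relu']
print(inlined)  # ['T_add']
```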

  3. How does Relay know which operators it can fuse for a target backend? What about heterogeneous execution?

  4. There is currently no way to tensorize composition ops (isn’t this operator fusion?). This means I can’t pattern-match inner loops of the form gemm/gemv + (bias + activation). Using compute_at doesn’t merge the produce blocks created by dense and by (bias + activation), whereas inlining merges the bias + activation operations as I would expect (i.e. turns them into a single produce block). The workaround, as I understand it, is to run an initial traverse_inline pass that only detects whether a bias exists and which activation type is used. This information is then passed into the tensorization intrinsic, where I can use it later during CodeGen. During a second traverse_inline pass, the tensorization intrinsic matches only the high-level gemm or gemv op, and all other elemwise or broadcast ops are ignored. What is the recommended way to schedule fused ops when the hardware doesn’t run the fused ops in sequence?
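The two-pass workaround described in question 4 could look roughly like this. It is a pure-Python toy rather than TVM code: `ToyOp`, `EpilogueInfo`, `detect_epilogue` and `intrinsic_name` are hypothetical, the fused instruction names such as `gemm_bias_relu` are invented for illustration, and identifying the bias and activation by op-name suffix is an assumption made to keep it short. In a real schedule, pass 1 would run over the fused function's stages and its result would parameterize the tensor intrinsic you declare.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ToyOp:
    name: str
    tag: str                                   # "placeholder", "dense", "broadcast", "elemwise"
    inputs: List["ToyOp"] = field(default_factory=list)


@dataclass
class EpilogueInfo:
    has_bias: bool = False
    activation: Optional[str] = None


def detect_epilogue(final_op: ToyOp) -> EpilogueInfo:
    """Pass 1: walk from the output back to the gemm/gemv stage and record
    which elementwise/broadcast ops follow it (matched by name suffix, an assumption)."""
    info = EpilogueInfo()
    op = final_op
    while op.tag in ("elemwise", "broadcast"):
        if op.name.endswith("add"):
            info.has_bias = True
        elif op.name.endswith("relu"):
            info.activation = "relu"
        producers = [t for t in op.inputs if t.tag != "placeholder"]
        if not producers:
            break
        op = producers[0]                      # follow the gemm/gemv side
    return info


def intrinsic_name(base: str, info: EpilogueInfo) -> str:
    """Pass 2 (sketch): the tensorize intrinsic matches only the gemm/gemv loop;
    the flags from pass 1 select a hypothetical fused hardware instruction for codegen."""
    parts = [base]
    if info.has_bias:
        parts.append("bias")
    if info.activation:
        parts.append(info.activation)
    return "_".join(parts)


data, weight, bias = (ToyOp(n, "placeholder") for n in ("data", "weight", "bias"))
dense = ToyOp("dense", "dense", [data, weight])
add = ToyOp("T_add", "broadcast", [dense, bias])
relu = ToyOp("T_relu", "elemwise", [add])

print(intrinsic_name("gemm", detect_epilogue(relu)))  # gemm_bias_relu
print(intrinsic_name("gemm", detect_epilogue(add)))   # gemm_bias
```

Keeping the epilogue detection separate from the intrinsic means the intrinsic itself only ever has to match the plain gemm/gemv loop nest.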


For question 3: fusion groups start with conv2d or conv2d_transpose, and the pass simply tries to fuse the elementwise operations that follow them. An “annotation.stop_fusion” marks the end of a fusion group; in most of the cases I have seen, it is added after a “cast”.
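The rule described above can be approximated as follows. In TVM each Relay op is registered with a fusion pattern (elementwise, broadcast, injective, reduction, out-elementwise-fusable such as conv2d or dense, or opaque), and as far as I know FuseOps forms groups largely from those patterns rather than from the target. The toy below is a heavily simplified, pure-Python version that only handles a straight-line chain and treats stop_fusion as a pure boundary marker; the real pass works on a dataflow graph with post-dominator analysis. `fuse_linear_chain` and `CAN_ABSORB` are hypothetical names.

```python
from typing import List, Tuple

# Pattern levels; the values follow TVM's Relay OpPattern enum as I recall it
# (ELEMWISE=0 ... OPAQUE=8), but treat them as illustrative here.
ELEMWISE, BROADCAST, INJECTIVE, COMM_REDUCE, OUT_ELEMWISE_FUSABLE, OPAQUE = 0, 1, 2, 3, 4, 8
STOP_FUSION = "annotation.stop_fusion"

# Group anchors that may absorb trailing elementwise/broadcast ops (simplified rule).
CAN_ABSORB = (ELEMWISE, BROADCAST, INJECTIVE, OUT_ELEMWISE_FUSABLE)


def fuse_linear_chain(chain: List[Tuple[str, int]]) -> List[List[str]]:
    """Toy fusion over a straight-line chain of (op_name, pattern) pairs."""
    groups: List[List[str]] = []
    anchor = None  # pattern of the op that opened the current group
    for name, pattern in chain:
        if name == STOP_FUSION:
            anchor = None  # boundary marker: the next op opens a new group
            continue
        if anchor in CAN_ABSORB and pattern in (ELEMWISE, BROADCAST):
            groups[-1].append(name)
        else:
            groups.append([name])
            anchor = pattern
    return groups


chain = [
    ("nn.conv2d", OUT_ELEMWISE_FUSABLE), ("add", BROADCAST), ("nn.relu", ELEMWISE),
    ("cast", ELEMWISE), (STOP_FUSION, OPAQUE),
    ("nn.conv2d", OUT_ELEMWISE_FUSABLE), ("add", BROADCAST),
]
print(fuse_linear_chain(chain))
# [['nn.conv2d', 'add', 'nn.relu', 'cast'], ['nn.conv2d', 'add']]
```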

Is there a place to configure which ops can be fused for an accelerator, or is the only option to write Relay IR passes that do this?

Hi, years later: have you figured out how this works? I would also like to know how Relay op fusion works together with compute and schedule.