Operator fusion confusion

I’m having some trouble understanding operator-fusion-related concepts. Things got even more confusing after attempting to write TOPI definitions for a new backend (trying to override just the dense schedule for now), as the implementation there is not what I expected.

  1. Does operator fusion simply combine multiple Relay IR nodes into one node?

At this point my understanding is that we would be left with a new operator, “fused_dense_add_relu” or similar, which we would then use later in TOPI.
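For what it’s worth, my current reading (based on the TVM versions I’ve poked at, so a sketch rather than gospel) is that FuseOps does not create a new named operator at the Relay level: it groups the calls into a local function marked Primitive, and a name like “fused_nn_dense_add_nn_relu” only shows up when that function is lowered. Something like this makes the grouping visible:

```python
# Minimal sketch: build dense -> add -> relu and run FuseOps to inspect
# the grouping. Exact pass behavior may differ across TVM versions.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(128, 64), dtype="float32")
b = relay.var("b", shape=(128,), dtype="float32")
y = relay.nn.relu(relay.add(relay.nn.dense(x, w), b))
mod = tvm.IRModule.from_expr(relay.Function([x, w, b], y))

mod = relay.transform.InferType()(mod)            # FuseOps needs type info
mod = relay.transform.FuseOps(fuse_opt_level=2)(mod)
print(mod)  # dense/add/relu now sit inside one function tagged Primitive=1
```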

  2. What is the difference between operator fusion and inlining in TOPI schedules? I read somewhere on the forum that there is no concept of fused schedules; what are the implications of this?

Printing op.name during traverse_inline never shows the name of a single fused op, as I’d expect from the first question. When scheduling from Relay I see dense -> T_add -> T_relu printed in order; when scheduling directly from TOPI, however, it never prints “relu” and just says “elemwise”.
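As far as I can tell, that matches the standard TOPI pattern: there is no fused schedule object; traverse_inline compute_inline()s the injective (elemwise/broadcast) stages into the anchor op’s schedule and hands you each op in turn. A sketch of that pattern, assuming the newer tvm.topi.utils import path (older releases use topi.util):

```python
import tvm
from tvm import te
from tvm.topi.utils import traverse_inline

def schedule_dense(outs):
    outs = [outs] if isinstance(outs, te.tensor.Tensor) else outs
    s = te.create_schedule([x.op for x in outs])

    def _callback(op):
        # By the time we're called for the anchor, traverse_inline has
        # already inlined the downstream injective stages (T_add, T_relu,
        # ...), so only the anchor op needs real scheduling work.
        if op.tag == "dense":
            dense = op.output(0)
            # real split/reorder/vectorize/tensorize would go here
            s[dense].parallel(s[dense].op.axis[0])
            # (real schedules also handle the output stage when the
            # anchor is not itself the output, e.g. via compute_at)

    traverse_inline(s, outs[0].op, _callback)
    return s
```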

  3. How does Relay know which operators it can fuse for a target backend? What about for heterogeneous execution?
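My understanding here (hedged; check your version’s include/tvm/relay/op_attr_types.h) is that this is not per-backend at all: each Relay op registers a TOpPattern attribute (kElemWise, kBroadcast, kInjective, kCommReduce, kOutEWiseFusable, kOpaque), FuseOps groups ops purely from those patterns, and the fused group is then lowered per target. For heterogeneous execution, I believe fusion does not cross device boundaries, since device copies/annotations act as barriers. You can inspect the registered patterns like this:

```python
# Sketch: print the fusion pattern each op registered. The integers mirror
# the TOpPattern enum (0=kElemWise, 1=kBroadcast, 2=kInjective,
# 3=kCommReduce, 4=kOutEWiseFusable, 7=kTuple, 8=kOpaque).
from tvm import relay

for name in ["nn.dense", "add", "nn.relu"]:
    print(name, relay.op.get(name).get_attr("TOpPattern"))
# expected (version-dependent): nn.dense=4, add=1, nn.relu=0
```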

  4. There is currently no method of tensorizing composite ops (isn’t this operator fusion?). This means I can’t pattern-match inner loops for gemm/gemv + (bias + activation). Using compute_at doesn’t merge the produce blocks created by dense and by (bias + activation). Inlining merges the bias + activation operations as I would expect (i.e. turns them into a single produce block). The workaround, as I understand it, is to have an initial traverse_inline pass that just detects whether a bias exists and which activation type follows. This info is then passed into the tensorization intrinsic, where I can use it later during codegen. The tensorization intrinsic just matches the high-level gemm or gemv op during a second traverse_inline pass, and all other elemwise or broadcast ops are ignored (see the sketch below). What is the recommended way of scheduling fused ops when hardware doesn’t run the fused ops in a sequence?
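To make the workaround in 4 concrete, here is roughly how I picture the first detection pass. detect_epilogue is a hypothetical helper, not a TVM API, and the tag/name checks would need adjusting to whatever your compute definitions actually emit:

```python
from tvm.topi.utils import traverse_inline

def detect_epilogue(s, out_op):
    """First traverse_inline pass: record (bias present, activation kind)
    so the gemm/gemv tensorize intrinsic can reproduce them in codegen."""
    info = {"has_bias": False, "activation": None}

    def _cb(op):
        if op.tag == "broadcast":            # e.g. a T_add bias broadcast
            info["has_bias"] = True
        elif op.tag == "elemwise":           # e.g. relu; the tag alone hides
            if "relu" in op.name:            # the op, so this name check is
                info["activation"] = "relu"  # fragile and illustrative only

    traverse_inline(s, out_op, _cb)
    return info
```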

For question 3: fused groups start with an anchor op such as conv2d or conv2d_transpose, and the pass tries to fuse the elementwise operations that follow it. An “annotation.stop_fusion” node marks the end of a fused group; in most cases I’ve seen, it is added after “cast”.
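For reference, stop_fusion is an ordinary Relay annotation you can also insert yourself; a tiny sketch:

```python
from tvm import relay

x = relay.var("x", shape=(1, 8), dtype="float32")
y = relay.nn.relu(x)
y = relay.annotation.stop_fusion(y)  # barrier: ops below start a new group
y = relay.cast(y, "int8")
func = relay.Function([x], y)
```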

Is there anywhere to configure which ops can be fused for an accelerator, or is the only option to write Relay IR passes that do this?
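One knob I know of (with the caveat that it is global, not per-target) is re-registering an op’s fusion pattern: marking an op kOpaque stops anything from fusing into it. Per-accelerator control beyond that seems to mean custom Relay passes or BYOC-style graph partitioning.

```python
from tvm.relay.op import OpPattern, register_pattern

# Re-register nn.dense as opaque so FuseOps won't fuse elemwise ops into it.
# A level above the default (10) is needed to override the built-in
# registration; note this changes behavior for every target, not just yours.
register_pattern("nn.dense", OpPattern.OPAQUE, level=11)
```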

Hi, years later: have you figured out how this works? I also want to know how Relay op fusion interacts with compute and schedule.