[BYOC] multi-layer subgraphs

Hello,

I’ve got a new question about the BYOC flow. In my current implementation, “annotation.stop_fusion” instructions are added to the partitioned Relay description of the network, usually separating individual network nodes.

Is there a way to disable this behaviour, so that multi-layer subgraphs can be passed to the custom backend?

Could you be more specific, maybe by posting a snippet?

def @main(%input_1: Tensor[(1, 224, 224, 3), float32]) -> Tensor[(1, 1000), float32] {
  %69 = nn.pad(%input_1, pad_width=[[0, 0], [0, 1], [0, 1], [0, 0]]) /* ty=Tensor[(1, 225, 225, 3), float32] */;
  %70 = multiply(%69, 16f /* ty=float32 */) /* ty=Tensor[(1, 225, 225, 3), float32] */;
  %71 = round(%70) /* ty=Tensor[(1, 225, 225, 3), float32] */;
  %72 = clip(%71, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 225, 225, 3), float32] */;
  %73 = cast(%72, dtype="int8") /* ty=Tensor[(1, 225, 225, 3), int8] */;
  %74 = @tinyai_0(%73) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %75 = add(%74, 64 /* ty=int32 */) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %76 = right_shift(%75, 7 /* ty=int32 */) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %77 = clip(%76, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %78 = multiply(%77, meta[relay.Constant][9] /* ty=Tensor[(32), int32] */ /* ty=Tensor[(32), int32] */) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %79 = add(%78, meta[relay.Constant][10] /* ty=Tensor[(32), int32] */ /* ty=Tensor[(32), int32] */) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %80 = clip(%79, a_min=0f, a_max=192f) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %81 = @tinyai_2(%80) /* ty=Tensor[(1, 112, 112, 32), int8] */;
  %82 = annotation.stop_fusion(%81) /* ty=Tensor[(1, 112, 112, 32), int8] */;
  %83 = @tinyai_3(%82) /* ty=Tensor[(1, 112, 112, 32), int32] */;
  %84 = add(%83, 2 /* ty=int32 */) /* ty=Tensor[(1, 112, 112, 32), int32] */;

For my BYOC backend, I am using TVM’s quantization, followed by composite pattern matching, annotation, merging, and partitioning. The Relay description above, which shows part of the partitioned MobileNetV1 (converted from a Keras model), contains a couple of annotation.stop_fusion statements that separate subgraphs of my custom backend (as between %81 and %83).

It seems like they are introduced in the annotation step, and they usually separate the individual layers of the model.
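For reference, the pass sequence I described looks roughly like this (a sketch: the nn.conv2d pattern and the "tinyai" names are placeholders for the real patterns my backend matches):

import tvm
from tvm import relay
from tvm.relay import transform
from tvm.relay.dataflow_pattern import is_op, wildcard

# Placeholder composite pattern; the real table matches the quantized
# blocks that the custom backend supports.
def conv_pattern():
    return is_op("nn.conv2d")(wildcard(), wildcard())

pattern_table = [("tinyai.conv2d", conv_pattern())]

def partition_for_tinyai(mod, params):
    mod = relay.quantize.quantize(mod, params=params)   # TVM quantization
    mod = transform.MergeComposite(pattern_table)(mod)  # composite matching
    mod = transform.AnnotateTarget("tinyai")(mod)       # annotation
    mod = transform.MergeCompilerRegions()(mod)         # merging
    mod = transform.PartitionGraph()(mod)               # partitioning
    return mod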

So you are saying that %82 = annotation.stop_fusion(%81) is preventing you from merging @tinyai_2 and @tinyai_3 into a single composite?

Do you first annotate and then composite merge?


First composite, then annotate.

Aren’t they introduced by the quantization phase?

You could add annotation.stop_fusion to the composite pattern and deal with it there.
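For example, something along these lines (a sketch; the nn.conv2d pattern stands in for whatever your composite actually matches). Since annotation.stop_fusion is an ordinary single-input Relay op, the pattern language can match it, and making it optional lets one pattern cover blocks with and without the marker:

from tvm.relay.dataflow_pattern import is_op, wildcard

def conv_with_stop_fusion_pattern():
    conv = is_op("nn.conv2d")(wildcard(), wildcard())
    # Optionally absorb a trailing annotation.stop_fusion into the composite,
    # so it no longer sits between two "tinyai" subgraphs.
    return conv.optional(lambda x: is_op("annotation.stop_fusion")(x))

pattern_table = [("tinyai.conv2d", conv_with_stop_fusion_pattern())]

The stop_fusion call then ends up inside the composite function, where the codegen can simply treat it as an identity.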


I’m not particularly familiar with annotation.stop_fusion, beyond the fact that it seems to be introduced to block the FuseOps pass. My naive solution here would be to register annotation.stop_fusion as a supported operator for your codegen; you can then remove or ignore it later.
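A minimal sketch of that idea, assuming your codegen is registered under the name "tinyai" (note that the exact checker signature varies across TVM versions; older versions pass (attrs, args) instead of the call expression):

import tvm

# Declare annotation.stop_fusion as supported by the "tinyai" codegen, so
# AnnotateTarget keeps it inside the compiler region instead of letting it
# act as a boundary between subgraphs.
@tvm.ir.register_op_attr("annotation.stop_fusion", "target.tinyai")
def _stop_fusion_supported(expr):
    return True

Since stop_fusion is just an identity on its single argument, the codegen can lower it as a pass-through.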
