Keep nn.bias_add after ConvertLayout

Hi,

I am stuck on an issue: after I apply ConvertLayout, my nn.bias_add is decomposed into nn.expand_dims and add. I then run a FoldConstant pass to remove all the layout_transform ops that are left lying around, which also folds away the aforementioned nn.expand_dims.
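
For reference, this is roughly the pass sequence I am running (a sketch; the desired_layouts mapping for qnn.conv2d is my assumption based on the NHWC/OHWI layouts in the dump below, and mod is the IRModule from the frontend importer):

  import tvm
  from tvm import relay

  # Convert qnn.conv2d to NHWC data / OHWI kernel layout, then fold the
  # layout_transform ops that ConvertLayout inserts on the constant weights.
  desired_layouts = {"qnn.conv2d": ["NHWC", "OHWI"]}
  seq = tvm.transform.Sequential(
      [
          relay.transform.ConvertLayout(desired_layouts),
          relay.transform.FoldConstant(),
      ]
  )
  with tvm.transform.PassContext(opt_level=3):
      mod = seq(mod)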

Before:

  %0 = qnn.quantize(%serving_default_input_1:0, 0.00215048f /* ty=float32 */, -128 /* ty=int32 */, out_dtype="int8") /* ty=Tensor[(1, 24, 32, 1), int8] */;
  %1 = qnn.conv2d(%0, meta[relay.Constant][0] /* ty=Tensor[(3, 3, 1, 16), int8] */, -128 /* ty=int32 */, 0 /* ty=int32 */, 0.00215048f /* ty=float32 */, meta[relay.Constant][1] /* ty=Tensor[(16), float32] */, padding=[1, 1, 1, 1], channels=16, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 24, 32, 16), int32] */;
  %2 = nn.bias_add(%1, meta[relay.Constant][2] /* ty=Tensor[(16), int32] */, axis=3) /* ty=Tensor[(1, 24, 32, 16), int32] */;
  %3 = qnn.requantize(%2, meta[relay.Constant][3] /* ty=Tensor[(16), float32] */, 0 /* ty=int32 */, 0.0156257f /* ty=float32 */, -128 /* ty=int32 */, axis=3, out_dtype="int8") /* ty=Tensor[(1, 24, 32, 16), int8] */;
  %4 = clip(%3, a_min=-128f, a_max=127f) /* ty=Tensor[(1, 24, 32, 16), int8] */;
  %5 = nn.max_pool2d(%4, pool_size=[2, 2], strides=[2, 2], padding=[0, 0, 0, 0], layout="NHWC") /* ty=Tensor[(1, 12, 16, 16), int8] */;

After:

  %0 = qnn.quantize(%serving_default_input_1:0, 0.00215048f /* ty=float32 */, -128 /* ty=int32 */, out_dtype="int8") /* ty=Tensor[(1, 24, 32, 1), int8] */;
  %1 = qnn.conv2d(%0, meta[relay.Constant][0] /* ty=Tensor[(16, 3, 3, 1), int8] */, -128 /* ty=int32 */, 0 /* ty=int32 */, 0.00215048f /* ty=float32 */, meta[relay.Constant][1] /* ty=Tensor[(16), float32] */, padding=[1, 1, 1, 1], channels=16, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="OHWI", out_dtype="int32") /* ty=Tensor[(1, 24, 32, 16), int32] */;
  %2 = add(%1, meta[relay.Constant][2] /* ty=Tensor[(1, 1, 1, 16), int32] */) /* ty=Tensor[(1, 24, 32, 16), int32] */;
  %3 = qnn.requantize(%2, meta[relay.Constant][3] /* ty=Tensor[(16), float32] */, 0 /* ty=int32 */, 0.0156257f /* ty=float32 */, -128 /* ty=int32 */, axis=3, out_dtype="int8") /* ty=Tensor[(1, 24, 32, 16), int8] */;
  %4 = clip(%3, a_min=-128f, a_max=127f) /* ty=Tensor[(1, 24, 32, 16), int8] */;
  %5 = nn.max_pool2d(%4, pool_size=[2, 2], strides=[2, 2], padding=[0, 0, 0, 0], layout="NHWC") /* ty=Tensor[(1, 12, 16, 16), int8] */;

How do I keep the nn.bias_add?

I know there are ways to fix this up afterwards, but I would rather it did not happen in the first place.

Just disable the relay.transform.CanonicalizeOps() pass — that is the pass that decomposes nn.bias_add into expand_dims + add.
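
For example, assuming you go through relay.build, you can disable it in the PassContext (a sketch; target and params are placeholders for whatever you already use, and the pass name string is my assumption from the registered pass name):

  import tvm
  from tvm import relay

  # Disable CanonicalizeOps so nn.bias_add is not decomposed into
  # expand_dims + add during the build.
  with tvm.transform.PassContext(opt_level=3, disabled_pass=["CanonicalizeOps"]):
      lib = relay.build(mod, target=target, params=params)

If you are instead driving the passes yourself with a tvm.transform.Sequential, simply leave relay.transform.CanonicalizeOps() out of the list.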
