Hello, I want to import a model from PyTorch into TVM and port it to my architecture, which only supports the NHWC data layout. I'm using the ConvertLayout transform to convert qnn.conv2d to NHWC. However, the pass inserts another layout_transform right after qnn.conv2d, converting back to NCHW for the nn.bias_add operation. Is there a way to get rid of this layout_transform? I expect all my data to stay in NHWC, since I want to fuse bias_add and conv2d into a single operation.
The IR looks like this:
fn (%layer1_input: Tensor[(1, 16, 16, 16), float32] /* ty=Tensor[(1, 16, 16, 16), float32] */, %c1_weight: Tensor[(16, 16, 3, 3), float32] /* ty=Tensor[(16, 16, 3, 3), float32] */, %c1_bias: Tensor[(16), float32] /* ty=Tensor[(16), float32] */) -> Tensor[(1, 16, 14, 14), uint8] {
  %0 = qnn.quantize(%layer1_input, 0.00786161f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8", axis=1) /* ty=Tensor[(1, 16, 16, 16), uint8] */;
  %1 = qnn.quantize(%c1_weight, meta[relay.Constant][0] /* ty=Tensor[(16), float32] */, 0 /* ty=int32 */, out_dtype="int8", axis=0) /* ty=Tensor[(16, 16, 3, 3), int8] */;
  %2 = layout_transform(%0, src_layout="NCHW", dst_layout="NHWC") /* ty=Tensor[(1, 16, 16, 16), uint8] */;
  %3 = layout_transform(%1, src_layout="OIHW", dst_layout="HWIO") /* ty=Tensor[(3, 3, 16, 16), int8] */;
  %4 = qnn.conv2d(%2, %3, 0 /* ty=int32 */, 0 /* ty=int32 */, 0.00786161f /* ty=float32 */, meta[relay.Constant][0] /* ty=Tensor[(16), float32] */, padding=[0, 0, 0, 0], channels=16, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 14, 14, 16), int32] */;
  %5 = layout_transform(%4, src_layout="NHWC", dst_layout="NCHW") /* ty=Tensor[(1, 16, 14, 14), int32] */;
  %6 = qnn.quantize(%c1_bias, meta[relay.Constant][1] /* ty=Tensor[(16), float32] */, 0 /* ty=int32 */, out_dtype="int32", axis=0) /* ty=Tensor[(16), int32] */;
  %7 = nn.bias_add(%5, %6) /* ty=Tensor[(1, 16, 14, 14), int32] */;
  %8 = qnn.requantize(%7, meta[relay.Constant][2] /* ty=Tensor[(16), float32] */, 0 /* ty=int32 */, 0.0175786f /* ty=float32 */, 64 /* ty=int32 */, axis=1, out_dtype="int32") /* ty=Tensor[(1, 16, 14, 14), int32] */;
  %9 = clip(%8, a_min=0f, a_max=255f) /* ty=Tensor[(1, 16, 14, 14), int32] */;
  cast(%9, dtype="uint8") /* ty=Tensor[(1, 16, 14, 14), uint8] */
} /* ty=fn (Tensor[(1, 16, 16, 16), float32], Tensor[(16, 16, 3, 3), float32], Tensor[(16), float32]) -> Tensor[(1, 16, 14, 14), uint8] */