Hi @comaniac, I looked into your example and ran a similar, simple experiment.

My example network, once imported into Relay, looks like this:

```
#[version = "0.0.5"]
def @main(%input.1: Tensor[(1, 1, 32, 16), float32], %conv.0.bias: Tensor[(1), float32], %conv.0.weight: Tensor[(1, 1, 3, 3), float32], %fc.0.weight: Tensor[(30, 14), float32]) {
  %0 = reshape(%input.1, newshape=[1, 1, -1, 16]);
  %1 = nn.conv2d(%0, %conv.0.weight, padding=[0, 0, 0, 0], kernel_size=[3, 3]);
  %2 = nn.bias_add(%1, %conv.0.bias);
  %3 = nn.relu(%2);
  %4 = reshape(%3, newshape=[-1, 14]);
  %5 = transpose(%fc.0.weight, axes=[1, 0]);
  %6 = transpose(%5, axes=[1, 0]);
  %7 = nn.dense(%4, %6, units=None);
  nn.relu(%7)
}
```
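To make the shapes in this network explicit, here is a small sketch that recomputes them by hand. It assumes stride 1 and no dilation (the `nn.conv2d` call sets neither) and uses the usual output-size formula for a convolution; `conv2d_out_size` is only an illustrative helper, not part of TVM.

```python
def conv2d_out_size(size, kernel, pad=0, stride=1):
    # Output extent of one spatial axis of a 2-D convolution (no dilation).
    return (size + 2 * pad - kernel) // stride + 1

# Input is NCHW (1, 1, 32, 16); the kernel is OIHW (1, 1, 3, 3) with padding 0.
n, c, h, w = 1, 1, 32, 16
conv_shape = (n, 1, conv2d_out_size(h, 3), conv2d_out_size(w, 3))

# reshape(newshape=[-1, 14]): the last axis stays 14, everything else folds into rows.
rows = (conv_shape[0] * conv_shape[1] * conv_shape[2] * conv_shape[3]) // 14
flat_shape = (rows, 14)

# nn.dense multiplies (rows, 14) by the transpose of the (30, 14) weight.
dense_shape = (rows, 30)

print(conv_shape, flat_shape, dense_shape)  # (1, 1, 30, 14) (30, 14) (30, 30)
```

This is why `%fc.0.weight` has 14 input features: the 3x3 valid convolution shrinks the width from 16 to 14.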

I then applied the layout conversion pass as follows:

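```
import tvm
from tvm import relay

# `mod` is the IRModule shown above.
# For ops with a kernel input, each list is [data_layout, kernel_layout].
desired_layouts = {'nn.dense': ['NHWC', 'OHWI'],
                   'nn.conv2d': ['NCHW', 'OHWI']}
seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts),
                                relay.transform.FoldConstant()])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
```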

The result is:

```
#[version = "0.0.5"]
def @main(%input.1: Tensor[(1, 1, 32, 16), float32], %conv.0.bias: Tensor[(1), float32], %conv.0.weight: Tensor[(1, 1, 3, 3), float32], %fc.0.weight: Tensor[(30, 14), float32]) -> Tensor[(30, 30), float32] {
  %0 = reshape(%input.1, newshape=[1, 1, -1, 16]) /* ty=Tensor[(1, 1, 32, 16), float32] */;
  %1 = layout_transform(%conv.0.weight, src_layout="OIHW", dst_layout="OHWI") /* ty=Tensor[(1, 3, 3, 1), float32] */;
  %2 = nn.conv2d(%0, %1, padding=[0, 0, 0, 0], kernel_size=[3, 3], kernel_layout="OHWI") /* ty=Tensor[(1, 1, 30, 14), float32] */;
  %3 = expand_dims(%conv.0.bias, axis=1, num_newaxis=2) /* ty=Tensor[(1, 1, 1), float32] */;
  %4 = add(%2, %3) /* ty=Tensor[(1, 1, 30, 14), float32] */;
  %5 = nn.relu(%4) /* ty=Tensor[(1, 1, 30, 14), float32] */;
  %6 = reshape(%5, newshape=[-1, 14]) /* ty=Tensor[(30, 14), float32] */;
  %7 = transpose(%fc.0.weight, axes=[1, 0]) /* ty=Tensor[(14, 30), float32] */;
  %8 = transpose(%7, axes=[1, 0]) /* ty=Tensor[(30, 14), float32] */;
  %9 = nn.dense(%6, %8, units=None) /* ty=Tensor[(30, 30), float32] */;
  nn.relu(%9) /* ty=Tensor[(30, 30), float32] */
}
```

The kernel layout fed into `nn.conv2d` is converted as expected (a `layout_transform` from `OIHW` to `OHWI` is inserted and the conv now has `kernel_layout="OHWI"`), but `nn.dense` is left unchanged.
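For a layout with no split axes, the `OIHW` to `OHWI` conversion in the converted IR behaves like a plain axis permutation of the weight. The NumPy sketch below reproduces the `(1, 1, 3, 3)` to `(1, 3, 3, 1)` shape change; `layout_permutation` is a hypothetical helper, not a TVM API, and its equivalence to `layout_transform` is assumed only for such simple, unsplit layouts.

```python
import numpy as np

def layout_permutation(src, dst):
    # Hypothetical helper: for layouts that only reorder the same axes
    # (no split axes such as "OIHW16o"), look up each dst axis in src.
    assert sorted(src) == sorted(dst), "layouts must name the same axes"
    return [src.index(axis) for axis in dst]

# Stand-in for %conv.0.weight, stored as OIHW.
weight_oihw = np.arange(9, dtype="float32").reshape(1, 1, 3, 3)

perm = layout_permutation("OIHW", "OHWI")
weight_ohwi = np.transpose(weight_oihw, perm)

print(perm, weight_ohwi.shape)  # [0, 2, 3, 1] (1, 3, 3, 1)
```

Element `[o, i, h, w]` of the original ends up at `[o, h, w, i]`, which is exactly what the new `kernel_layout="OHWI"` tells `nn.conv2d` to expect.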

This may be a naive question, but what do I need to add in Relay so that the layout conversion pass can also change the kernel layout of `nn.dense`?
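For context on what a "kernel layout" even means for dense: `nn.dense` computes `Y = X W^T` with the weight stored as `(units, in_features)`, here `(30, 14)`, and the call in the IR carries no kernel-layout attribute (only `units`). The NumPy sketch below is illustrative only; `weight_kn` is a hypothetical pre-transposed `(in_features, units)` storage, not a layout TVM defines. It also checks that the back-to-back `transpose` pair applied to `%fc.0.weight` is an identity.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 14)).astype("float32")          # data after the reshape: (batch, in_features)
weight_nk = rng.standard_normal((30, 14)).astype("float32")  # like %fc.0.weight: (units, in_features)

# The two transposes with axes=[1, 0] applied to the dense weight cancel out.
assert np.array_equal(weight_nk.T.T, weight_nk)

# nn.dense semantics: Y = X @ W^T, with W stored as (units, in_features).
y_dense = x @ weight_nk.T

# Hypothetical alternative storage: the same weights laid out as (in_features, units).
weight_kn = np.ascontiguousarray(weight_nk.T)
y_alt = x @ weight_kn

print(y_dense.shape, np.allclose(y_dense, y_alt))  # (30, 30) True
```

Both storages give the same result; they differ only in how the weight sits in memory, which is what a backend-specific kernel layout would control.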

I see there are several conv2d implementations with different layout formats, but only one for nn.dense, and it does not use the kernel layout I want. Since I’m using BYOC, from what you described above, it seems those strategy-related implementations don’t affect me anyway. So where, and what, should I change to allow a kernel layout change for nn.dense? Thank you.