TVM deals with these in the Relay IR directly. For example, the IR with NCHW16c and NCHW4c may look like:
%1 = nn.conv2d(...) // output layout: NCHW16c
%2 = layout_transform(%1, "NCHW4c") // output layout: NCHW4c
...
When compiling the above IR, layout_transform is just an operator like conv2d, so %1 and %2 are individual tensors. As a result, the runtime only needs to execute the compiled graph/bytecode and doesn't have to worry about layout transforms.
Weights can be handled in the same way, but for model inference, where the weights are already constants, we usually simplify/fold the layout transform away:
def @main(%data) {
%1 = layout_transform(%const[0], "target_layout"); // %const[0] is the weights
%2 = nn.conv2d(%data, %1);
...
}
becomes:
def @main(%data) {
%1 = nn.conv2d(%data, %const[0]); // %const[0] is the weights in target_layout.
...
}
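The rewrite above is just constant folding: applying the transform to the constant weights once ahead of time gives the same result as applying it on every call. A small NumPy sketch of that equivalence, with toy stand-ins for layout_transform and nn.conv2d (neither is the real TVM code):

```python
import numpy as np

def layout_transform(t, c=4):
    # toy stand-in for relay's layout_transform: block the second axis by c
    n, ch, h, w = t.shape
    return t.reshape(n, ch // c, c, h, w).transpose(0, 1, 3, 4, 2)

def conv2d(data, weight):
    # toy stand-in for nn.conv2d; any deterministic op suffices to show the fold
    return data.sum() * weight.sum()

weights = np.arange(8 * 8 * 3 * 3, dtype=np.float64).reshape(8, 8, 3, 3)
data = np.ones((1, 8, 5, 5))

# before folding: the transform runs inside @main on every call
def main_unfolded(d):
    w = layout_transform(weights)   # %1 = layout_transform(%const[0], ...)
    return conv2d(d, w)             # %2 = nn.conv2d(%data, %1)

# after folding: the transform is applied once at compile time
folded_weights = layout_transform(weights)
def main_folded(d):
    return conv2d(d, folded_weights)  # nn.conv2d(%data, %const[0])

assert main_unfolded(data) == main_folded(data)
```

Because the folded program carries no layout_transform node, inference pays the transform cost zero times instead of once per call.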