Where does the layout transform data copy/move happen?

TVM handles layout transforms directly in the Relay IR. For example, IR that mixes NCHW16c and NCHW4c may look like:

%1 = nn.conv2d(...) // output layout: NCHW16c
%2 = layout_transform(%1, "NCHW4c") // output layout: NCHW4c
...

When compiling the above IR, layout_transform is just an operator like conv2d, so %1 and %2 are separate tensors. As a result, the runtime only needs to execute the compiled graph/bytecode and doesn't have to handle layout transforms itself.
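To make the "separate tensors" point concrete, here is a minimal pure-Python sketch of what an NCHW16c to NCHW4c transform does to the data. It is not TVM's generated kernel; the helper names `offset` and `layout_transform` and the flat-list buffer are assumptions made for illustration only.

```python
def offset(n, ch, h, w, dims, block):
    """Flat offset of logical element (n, ch, h, w) in an NCHW{block}c buffer."""
    _, C, H, W = dims
    outer, inner = divmod(ch, block)  # channel block index, position inside block
    return (((n * (C // block) + outer) * H + h) * W + w) * block + inner


def layout_transform(src, dims, src_block, dst_block):
    """Copy src (NCHW{src_block}c) into a new buffer laid out as NCHW{dst_block}c."""
    N, C, H, W = dims
    dst = [None] * len(src)  # a new buffer: the output is its own tensor
    for n in range(N):
        for ch in range(C):
            for h in range(H):
                for w in range(W):
                    dst[offset(n, ch, h, w, dims, dst_block)] = \
                        src[offset(n, ch, h, w, dims, src_block)]
    return dst


# Logical shape N=1, C=16, H=1, W=2. Each element records its own (channel, w).
dims = (1, 16, 1, 2)
src = [(i % 16, i // 16) for i in range(32)]  # NCHW16c: offset = w * 16 + channel
dst = layout_transform(src, dims, 16, 4)      # NCHW4c: channels grouped in fours
```

The copy happens inside the operator, which is why nothing outside the compiled graph needs to know about it.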

Weights can be handled the same way, but for model inference, where the weights are already constants, we usually simplify (fold) the layout transform at compile time:

def @main(%data) {
  %1 = layout_transform(%const[0], "target_layout"); // %const[0] is the weights
  %2 = nn.conv2d(%data, %1);
  ...
}

becomes:

def @main(%data) {
  %1 = nn.conv2d(%data, %const[0]); // %const[0] is the weights in target_layout.
  ...
}
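The rewrite above can be sketched as a small constant-folding pass over a toy expression tree. This is only an illustration of the idea, not TVM's implementation: the `Const`, `Var`, and `Call` classes, the `fold_layout_transform` function, and the `transform_data` callback are all hypothetical names, and the tuple reversal stands in for the real data repacking.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Const:
    value: Any
    layout: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple[Any, ...]
    attrs: Tuple[Any, ...] = ()


def fold_layout_transform(expr, transform_data):
    """Replace layout_transform(Const) with an already-transformed Const."""
    if not isinstance(expr, Call):
        return expr  # variables and constants are left as-is
    args = tuple(fold_layout_transform(a, transform_data) for a in expr.args)
    if expr.op == "layout_transform" and isinstance(args[0], Const):
        (dst_layout,) = expr.attrs
        # Do the data movement once, at compile time.
        new_value = transform_data(args[0].value, args[0].layout, dst_layout)
        return Const(new_value, dst_layout)
    return Call(expr.op, args, expr.attrs)


# Stand-in for real repacking: reverse the tuple so the change is visible.
def fake_transform(value, src_layout, dst_layout):
    return tuple(reversed(value))


weights = Const(value=(1, 2, 3, 4), layout="OIHW")
before = Call("nn.conv2d",
              (Var("data"), Call("layout_transform", (weights,), ("target_layout",))))
after = fold_layout_transform(before, fake_transform)
```

After folding, the conv2d call takes the constant directly, so no layout_transform remains to execute at inference time. A transform applied to `%data` would be left in place, since its input is not known at compile time.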