Where does layout Transform Data Copy/Move happen?

I have been trying to study how TVM performs layout transformation at runtime (e.g., NCHW16c → NCHW4c). Where in the source code is the required copy or move of the data tensor handled? And where is the same handled for the weights tensor?

Is it in the CopyDataFromTo function of class CPUDeviceAPI in src/runtime/cpu_device_api.cc?

TVM deals with these in the Relay IR directly. For example, the IR with NCHW16c and NCHW4c may look like:

%1 = nn.conv2d(...) // output layout: NCHW16c
%2 = layout_transform(%1, "NCHW4c") // output layout: NCHW4c
...

When compiling the above IR, layout_transform is just an operator like conv2d, so %1 and %2 are separate tensors. As a result, the runtime only needs to execute the compiled graph/bytecode and doesn't have to worry about layout transforms at all.
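
To see this end to end, here is a minimal sketch using the TVM Python API (the shapes are made up for illustration): it builds a graph containing only a layout_transform from NCHW16c to NCHW4c and runs it. The data movement happens inside the compiled layout_transform kernel itself, not in CopyDataFromTo of the device API.

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# NCHW16c tensor with 4 outer channel blocks of 16 -> 64 channels total.
data = relay.var("data", shape=(1, 4, 8, 8, 16), dtype="float32")
# Re-block the channel axis: NCHW16c -> NCHW4c.
out = relay.layout_transform(data, "NCHW16c", "NCHW4c")
mod = tvm.IRModule.from_expr(relay.Function([data], out))

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

dev = tvm.cpu()
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("data", np.random.rand(1, 4, 8, 8, 16).astype("float32"))
rt.run()
print(rt.get_output(0).shape)  # (1, 16, 8, 8, 4): 16 channel blocks of 4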

Weights can be handled in the same way, but for model inference, where the weights are already constants, we usually simplify/fold the layout transform away:

def @main(%data) {
  %1 = layout_transform(%const[0], "target_layout"); // %const[0] is the weights
  %2 = nn.conv2d(%data, %1);
  ...
}

becomes:

def @main(%data) {
  %1 = nn.conv2d(%data, %const[0]); // %const[0] is the weights in target_layout.
  ...
}
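
As a small illustration of that folding step (a sketch using the Relay Python API; the concrete OIHW/HWIO layouts are chosen just for the example), FoldConstant evaluates the layout_transform on the constant weights at compile time, so the resulting module has conv2d consuming a constant that is already in the target layout, matching the "becomes" IR above:

import numpy as np
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 224, 224, 3), dtype="float32")    # NHWC input
weight = relay.const(np.random.rand(16, 3, 3, 3).astype("float32"))  # constant weights in OIHW
# Explicit layout_transform on the constant weights, like %1 above.
weight_hwio = relay.layout_transform(weight, "OIHW", "HWIO")
conv = relay.nn.conv2d(data, weight_hwio, kernel_size=(3, 3), channels=16,
                       data_layout="NHWC", kernel_layout="HWIO")
mod = tvm.IRModule.from_expr(relay.Function([data], conv))

# Fold the transform into the constant: conv2d now takes a HWIO constant directly.
mod = relay.transform.FoldConstant()(mod)
print(mod)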