I have been trying to study how TVM does layout transformation during runtime (e.g. NHWC16c → NHWC4c, etc.). Where in the source code is the required data copy or move of the data tensor handled?
Also, where is the same for the weights tensor handled?
Is it in the `CopyDataFromTo` function of class `CPUDeviceAPI` in `src/runtime/cpu_device_api.cc`?
TVM deals with these in the Relay IR directly. For example, the IR with NCHW16c and NCHW4c may look like:
%1 = nn.conv2d(...) // output layout: NCHW16c
%2 = layout_transform(%1, "NCHW4c") // output layout: NCHW4c
...
When compiling the above IR, `layout_transform` is just an operator like `conv2d`, so `%1` and `%2` are individual tensors. As a result, the runtime only needs to execute the compiled graph/bytecode and doesn't have to worry about layout transforms.
Weights can be handled the same way, but in the model-inference case, where the weights are already constants, we usually simplify/fold the layout transform:
def @main(%data) {
%1 = layout_transform(%const[0], "target_layout"); // %const[0] is the weights
%2 = nn.conv2d(%data, %1);
...
}
becomes:
def @main(%data) {
%1 = nn.conv2d(%data, %const[0]); // %const[0] is the weights in target_layout.
...
}
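The folding above works because the transform of a constant can be evaluated once at compile time. A hedged NumPy sketch of the idea (the packed layout and the helper `fold_weight_layout` are assumptions for illustration, not TVM code):

```python
import numpy as np

def fold_weight_layout(weights_oihw, tile_c=4):
    """Pack OIHW weights into an assumed OIHW{tile_c}i tiled layout."""
    o, i, h, w = weights_oihw.shape
    assert i % tile_c == 0
    return weights_oihw.reshape(o, i // tile_c, tile_c, h, w).transpose(0, 1, 3, 4, 2)

w = np.random.rand(8, 16, 3, 3).astype("float32")
# Done once, ahead of time; the runtime graph then feeds the
# pre-packed constant straight into conv2d with no transform op.
packed = fold_weight_layout(w)
print(packed.shape)  # (8, 4, 3, 3, 4)
```

After this folding, the runtime graph contains only the `conv2d`, matching the simplified `@main` shown above.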