I wanted to know the reasoning/interpretation behind the NCHWc shape or ordering. I understand what NCHW means for a network processing 2D images, but why do we use NCHWc in TVM? And how is a tensor of shape NCHW converted to NCHWc?
(This came up while trying to implement dilation in x86’s conv2d.)
NCHWc is one example of using data layout to improve spatial locality for a specific hardware target. In this case it is convenient to have the channel as the innermost dimension, as it is often a power of two and larger than the vector width of x86 CPUs (both AVX-2 and AVX-512). Note that this is hardware-target dependent, so not every hardware target will use NCHWc, as not every hardware target will benefit from this layout.
Not quite - if you consider an example where C=512 and your CPU has 64-wide 8-bit vector instructions, you can reshape your data layout to perform 64-wide vector operations. As eqy explained, this results in [N][C][H][W] -> [N][C/c][H][W][c] where c=64 and C=512. For brevity we express it as NCHWc. This requires a data layout transform (the 4D array is re-shaped to 5D), and to process the data (e.g. matrix multiply) it requires loop splitting and reordering, which are facilitated by TVM’s schedule transformations.
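To make that concrete, here is a minimal sketch (not the official topi implementation; the shapes, names, and c=64 are just assumptions for illustration) of how the 4D -> 5D re-pack and the inner-channel vectorization can be expressed with TVM’s te API:

```python
import tvm
from tvm import te

# Hypothetical sizes for illustration: C = 512, inner channel block c = 64
N, C, H, W = 1, 512, 28, 28
c = 64

data = te.placeholder((N, C, H, W), name="data", dtype="int8")

# Data layout transform: [N][C][H][W] -> [N][C/c][H][W][c]
packed = te.compute(
    (N, C // c, H, W, c),
    lambda n, co, h, w, ci: data[n, co * c + ci, h, w],
    name="data_NCHWc",
)

s = te.create_schedule(packed.op)
# The innermost axis now has length c, so it maps cleanly onto a
# 64-wide vector instruction.
s[packed].vectorize(packed.op.axis[-1])
print(tvm.lower(s, [data, packed], simple_mode=True))
```

In the actual conv2d compute the same idea is applied with schedule primitives: the channel loop is split by the block size and reordered so the c axis ends up innermost and can be vectorized.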
For the CPU backend, I wonder what the benefit of using NCHWc over NHWC is. I think one reason is that, for Winograd convolution, NCHWc can help accelerate the input and output transformations. However, I wonder whether there are other cases where NCHWc can outperform NHWC? Thanks!
Hi everyone, I am also having a look at NCHWc. I noticed that the kernel for this configuration goes from 4D to 6D! Do you know why it is 6D instead of 5D? @thierry @eqy
I really appreciate any info you can provide on this issue.