Hey all, I'm trying to understand how TVM optimizes for data layout. Here's my current understanding:
- Certain hardware prefers certain layouts for certain operators (e.g., NCHWc for conv2d on x86).
- TVM can treat operators with "hard" layout preferences (like conv2d) as anchors and conform the surrounding operators to those layouts, so as to minimize the number of layout conversions. I've at least seen Relay passes for this (ConvertLayout, AlterOpLayout; minimal sketch after this list); I'm unsure whether anything similar also happens at lower levels of the stack.
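For concreteness, here's the kind of Relay pass I mean. This is a minimal ConvertLayout sketch based on my reading of the docs; the shapes, layouts, and the `desired_layouts` mapping are just placeholders I picked:

```python
import tvm
from tvm import relay

# A tiny NHWC conv2d module to convert.
data = relay.var("data", shape=(1, 56, 56, 64), dtype="float32")
weight = relay.var("weight", shape=(3, 3, 64, 64), dtype="float32")
conv = relay.nn.conv2d(data, weight, padding=(1, 1),
                       data_layout="NHWC", kernel_layout="HWIO")
mod = tvm.IRModule.from_expr(relay.Function([data, weight], conv))

# Ask ConvertLayout to rewrite conv2d into NCHW; it inserts
# layout_transform ops at the boundaries, and downstream passes can
# cancel adjacent transforms so that few conversions survive.
desired_layouts = {"nn.conv2d": ["NCHW", "default"]}
with tvm.transform.PassContext(opt_level=3):
    mod = relay.transform.ConvertLayout(desired_layouts)(mod)
print(mod)
```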
I have some questions:
- Is optimizing data layout a solved problem? If not, where are the current pain points?
- Is it more common to conform layouts to the expectations of existing kernels, because writing new kernels for new layouts is a pain? Or are people willing to explore new layouts when they look potentially beneficial?
I ask because I've recently been experimenting with a small language built to search over the space of layout transformations. It was originally designed for converting neural networks into high-level accelerator designs, but I'm wondering whether it might also be useful for exploring how to optimize data layouts within or between kernel invocations!
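To give a flavour of the transformations I have in mind, here's a toy numpy sketch of the NCHW → NCHWc packing mentioned above (the block size of 4 is arbitrary, and this is just an illustration, not my language's actual output):

```python
import numpy as np

# Toy NCHW -> NCHW4c packing (block size 4 chosen arbitrarily;
# real x86 conv2d schedules pick the block per target).
n, c, h, w, cb = 1, 16, 8, 8, 4
x = np.random.rand(n, c, h, w).astype("float32")

# Split the channel axis into (outer, block) and move the block innermost.
packed = x.reshape(n, c // cb, cb, h, w).transpose(0, 1, 3, 4, 2)
assert packed.shape == (1, 4, 8, 8, 4)  # i.e. NCHW4c

# The inverse transform recovers the original layout exactly.
unpacked = packed.transpose(0, 1, 4, 2, 3).reshape(n, c, h, w)
assert (unpacked == x).all()
```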