Does TVM have any built-in or automated support for tensor packing transformations?
I’m referring to optimizations similar to those described in the “Packed Convolution” chapter of the Dive into Deep Learning Compiler documentation, where the data layout is changed (e.g., from NCHW to a packed format like NCHW{x}c) to improve cache locality and SIMD utilization on CPUs.
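
For concreteness, here is a rough sketch of the packing I mean, written with `te.compute`. This is my own toy example rather than anything from the docs, and `pack_nchw`, the block size, and the shapes are purely illustrative:

```python
import tvm
from tvm import te

# Illustrative helper (my own naming, not a TVM API): repack an NCHW
# tensor into NCHW{x}c so the innermost axis holds a small block of
# channels that maps nicely onto SIMD lanes.
def pack_nchw(x, c_block):
    n, c, h, w = x.shape
    return te.compute(
        (n, c // c_block, h, w, c_block),
        lambda i, co, j, k, ci: x[i, co * c_block + ci, j, k],
        name="x_packed",
    )

data = te.placeholder((1, 64, 56, 56), name="data")
data_packed = pack_nchw(data, 16)  # NCHW -> NCHW16c
```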
I’d like to know:
- Can this kind of tensor packing transformation be applied automatically via MetaSchedule or other TVM auto-tuning/IR passes?
- Or is this kind of packing generally done manually through scheduling primitives and layout transforms? (See the sketch below for the manual approach I have in mind.)
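
For reference, this is roughly the manual route I mean for the second bullet: using the TensorIR `transform_layout` schedule primitive. I'm assuming a recent TVM build here, and the toy elementwise compute and block/buffer names are only stand-ins for a real conv2d workload:

```python
import tvm
from tvm import te, tir

# Toy compute standing in for a real conv2d, just to show the primitive.
A = te.placeholder((1, 64, 56, 56), name="A")
B = te.compute(A.shape, lambda n, c, h, w: A[n, c, h, w] * 2.0, name="B")

sch = tir.Schedule(te.create_prim_func([A, B]))
blk = sch.get_block("B")

# Manually rewrite B's output buffer layout from NCHW to NCHW16c.
sch.transform_layout(
    blk,
    buffer=("write", 0),
    index_map=lambda n, c, h, w: [n, c // 16, h, w, c % 16],
)
print(sch.mod.script())
```

What I'd like to understand is whether MetaSchedule (or another pass) can discover and apply this kind of rewrite on its own, or whether it always has to be spelled out like the above.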