Creating store_at in TVM

Halide provides the scheduling primitive store_at (as well as store_root) to control where a tensor's storage is allocated independently of where it is computed. This is very useful when we want to make use of the sliding window optimization and create rolling buffers, both of which can be critical in reducing memory usage on memory-constrained devices. For examples of how this is used in Halide, see the Halide tutorial on multi-stage pipelines.

Is this something we can emulate in TVM with the existing scheduling intrinsics? And if not, is this something the design of TE would permit? In the latter case, I'd be interested to know whether it would currently be worth implementing in TE given the change of approach in TensorIR, or whether it would be better to wait.
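
To make the request concrete, here is a rough plain-Python sketch of the behaviour in question. It is not Halide or TVM code: `producer`, `consumer_naive` and `consumer_rolling` are hypothetical names, and the 3-tap stencil is just an assumed example. The naive version materializes the whole intermediate, while the rolling version keeps storage outside the loop, computes inside it, and only produces values that are not already in the buffer.

```python
def producer(x):
    # Hypothetical first stage; any pure function of x would do.
    return x * x


def consumer_naive(n):
    # Storage and compute both hoisted out of the loop:
    # the full intermediate of n + 2 elements is materialized.
    buf = [producer(x) for x in range(n + 2)]
    return [buf[x] + buf[x + 1] + buf[x + 2] for x in range(n)]


def consumer_rolling(n):
    # Storage allocated once outside the loop, compute done inside it.
    # Only 3 elements are live at a time, so the buffer is circular.
    window = 3
    buf = [0] * window
    calls = 0  # number of producer evaluations, for comparison
    out = []
    for x in range(n):
        # Sliding window: the first iteration fills the whole footprint,
        # later iterations compute only the one new value they need.
        lo = 0 if x == 0 else x + 2
        for p in range(lo, x + 3):
            buf[p % window] = producer(p)
            calls += 1
        out.append(buf[x % window] + buf[(x + 1) % window] + buf[(x + 2) % window])
    return out, calls
```

With this shape the intermediate needs 3 elements of storage instead of n + 2, and the producer is evaluated n + 2 times rather than the 3n times it would take if both storage and compute were nested inside the loop.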


cc @junrushao1994 @tqchen @manupa-arm

Yes, definitely useful to have! It might save a lot of hacks/workarounds that would otherwise be needed to get the same functionality. Also cc: @spectrometerHBH @merrymercy