Reuse buffer in concatenation

I’m implementing a DenseNet and there’s an optimization I’d like to make; I’m curious whether it’s expressible in TVM. Naively implementing a DenseNet requires a large number of concatenations, which consume a large amount of memory and bandwidth. The output of each conv2d layer of a DenseNet is concatenated with its inputs and fed to the next layer.

If we look at a single pixel, the operations are as follows:

RGB . kernel1 -> M
RGBM . kernel2 -> N
RGBMN . kernel3 -> O
RGBMNO . kernel4 -> … and so on…
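To make the cost concrete, here is a minimal NumPy sketch of that per-pixel sequence (treating each layer as a hypothetical 1x1 convolution; names and shapes are illustrative only). Every step re-copies all previously computed channels into a fresh array:

```python
import numpy as np

growth = 4                       # channels produced by each layer
features = np.random.rand(3)     # the RGB values of one pixel

for i in range(3):               # produces M, N, O, ...
    kernel = np.random.rand(features.shape[0], growth)  # illustrative weights
    out = features @ kernel                             # this layer's output
    features = np.concatenate([features, out])          # copies everything again
```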

Ideally, I’d like to set up my convolution layers so that they write into slices of the target buffer, which I can then use as-is for the next convolution.
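Roughly, the layout I have in mind looks like this (a NumPy stand-in for the per-pixel view above; shapes are illustrative only):

```python
import numpy as np

# One buffer sized for all channels: each layer writes its output into the
# next slice and reads the growing prefix as its input, with no copies.
growth, num_layers = 4, 3
buf = np.empty(3 + growth * num_layers)
buf[:3] = np.random.rand(3)          # RGB occupies the first slice

offset = 3
for _ in range(num_layers):
    kernel = np.random.rand(offset, growth)              # illustrative weights
    buf[offset:offset + growth] = buf[:offset] @ kernel  # write into the slice
    offset += growth                 # no concatenation of the prefix
```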

Is there some way to implement this in TVM?

No, that’s not possible. But I think NNVM already does some kind of in-place op detection. You can read through nnvm/src/pass/plan_memory.cc and see if it suits your needs.

One way to solve the problem is to build the kernel as you normally would, and when you pass in the output array, pass in a slice of a big array.
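For example, here is a rough NumPy sketch of that suggestion, with `conv2d_1x1` standing in for a kernel built the usual way. TVM kernels take their output tensor as an explicit argument, so at call time you can hand them a view into a larger allocation; all names and shapes here are illustrative.

```python
import numpy as np

# Stand-in for a conv2d kernel built the usual way. The kernel itself doesn't
# need to know that `out` is a view into a larger allocation.
def conv2d_1x1(data, weight, out):
    # data: (1, C_in, H, W), weight: (C_out, C_in), out: (1, C_out, H, W)
    np.einsum("oc,nchw->nohw", weight, data, out=out)

growth, num_layers, H, W = 4, 3, 8, 8
big = np.empty((1, 3 + growth * num_layers, H, W), dtype="float32")
big[:, :3] = np.random.rand(1, 3, H, W)   # input occupies the first channels

offset = 3
for _ in range(num_layers):
    weight = np.random.rand(growth, offset).astype("float32")
    # the "concatenated" input is just the prefix of the big buffer, and the
    # output slice extends it in place -- no concat kernel, no extra copy
    conv2d_1x1(big[:, :offset], weight, out=big[:, offset:offset + growth])
    offset += growth
```

Note that with batch size 1 and NCHW layout, the channel slice `big[:, a:b]` is contiguous, so it can serve directly as a kernel's output (or input) buffer; for larger batches you would need a layout that keeps each layer's output contiguous.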