I’m implementing a DenseNet, and there’s an optimization I’d like to make; I’m curious whether it’s expressible in TVM. Naively implementing a DenseNet requires a large number of concatenations, which consume a great deal of memory and bandwidth: the output of each conv2d layer is concatenated with its inputs and fed to the next layer.
If we look at a single pixel, the operations are as follows:
```
RGB    . kernel1 -> M
RGBM   . kernel2 -> N
RGBMN  . kernel3 -> O
RGBMNO . kernel4 -> … and so on…
```
Ideally, I’d like to set up my convolution layers so that they write into slices of the target buffer, which I can then use as-is for the next convolution.
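To make the idea concrete, here is a minimal NumPy sketch of the buffer-slicing scheme (not TVM code; the 1×1 convolution, channel counts, and growth rate are all placeholder assumptions). One buffer is preallocated with the final channel count, and each layer reads the slice written so far and writes its output into the next slice, so no concatenation is ever performed:

```python
import numpy as np

# Hypothetical sizes: 3 input channels (RGB); each layer adds `growth` channels.
growth = 4
num_layers = 3
H = W = 8
total_ch = 3 + num_layers * growth

# Preallocate one buffer holding all feature maps; layers write into slices of it.
feat = np.zeros((total_ch, H, W), dtype=np.float32)
feat[:3] = np.random.rand(3, H, W).astype(np.float32)

def conv1x1(x, kernel):
    # Stand-in for conv2d: a 1x1 convolution, i.e. (C_out, C_in) x (C_in, H, W).
    return np.einsum('oc,chw->ohw', kernel, x)

ch = 3
for _ in range(num_layers):
    kernel = np.random.rand(growth, ch).astype(np.float32)
    # Read everything written so far; write the output into the next slice.
    feat[ch:ch + growth] = conv1x1(feat[:ch], kernel)
    ch += growth

# No np.concatenate needed: each layer's input is simply feat[:ch], as-is.
```

In TVM terms, the question is whether each conv2d's output tensor can be made an alias of (a slice of) one shared buffer rather than a freshly allocated tensor.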
Is there some way to express this correctly in TVM?