Been playing around with the CMSIS-NN implementation, and I noticed that it was taking up more flash than TFLM. I found that the cause is unused constants left over from the quantization of conv2d, namely the input scale and the weight/filter scale. While they are originally arguments to the conv2d call, they are not used by the generated code and are “replaced” by a shift and a multiplier, respectively, when the function call is created (see https://github.com/apache/tvm/pull/11431 @ashutosh-arm), but their data buffers still remain in the binary.
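For context on why the float scales become redundant: the usual approach (as in TFLM/CMSIS-NN) is to fold the effective scale into a Q31 fixed-point multiplier plus a power-of-two shift at compile time, so the runtime kernel only needs integers. A minimal sketch of that decomposition (my own illustration, not the exact TVM code):

```python
import math

def quantize_scale(real_scale: float):
    """Decompose real_scale into a Q31 multiplier and a shift such that
    real_scale ~= multiplier * 2**(shift - 31)."""
    if real_scale == 0.0:
        return 0, 0
    significand, exponent = math.frexp(real_scale)  # significand in [0.5, 1)
    multiplier = round(significand * (1 << 31))     # Q31 fixed point
    if multiplier == (1 << 31):                     # rounding pushed us to 1.0
        multiplier //= 2
        exponent += 1
    return multiplier, exponent

# Effective per-tensor conv2d scale: input_scale * filter_scale / output_scale
input_scale, filter_scale, output_scale = 0.5, 0.25, 0.125
mult, shift = quantize_scale(input_scale * filter_scale / output_scale)
```

Once `mult` and `shift` are baked into the call, the original float scale tensors carry no information the kernel needs, which is why keeping their buffers around is pure flash overhead.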
Here is an image showing some of the unused buffers I am seeing (marked in red):
How would one go about removing these?