My TF model is 120MB. The TVM params for it are around 320MB, and when I run it on device it uses 969MB of heap.
Is there any way (other than disabling Winograd) to reduce the memory footprint?
I have a custom inference engine that runs exactly the same model (using NNPACK for convolutions) and needs only around 320MB (but is 2x slower).