Model size comparison before/after tuning/compiling

After researching the processes that occur during tuning and compiling a model with TVM, my intuition says that an optimized model should be smaller than an un-optimized one. However, this doesn't appear to be the case: after tuning and compiling a ResNet-50 ONNX model for an LLVM target, the resulting .tar package is slightly larger (103 MB) than the original .onnx model (102 MB). I expected the tuned model to be smaller because of high-level graph optimizations like operator fusion, dead code elimination, constant propagation, etc. that occur during compilation. Am I assuming this incorrectly, or does this behavior simply vary from model to model?
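For reference, here is roughly what I'm doing (a minimal sketch; the input name `"data"`, the input shape, and the file paths are placeholders for my actual setup):

```python
import os
import onnx
import tvm
from tvm import relay

onnx_path = "resnet50.onnx"  # placeholder path
onnx_model = onnx.load(onnx_path)

# Shape dict keyed by the model's input name; "data" is an assumption here.
mod, params = relay.frontend.from_onnx(onnx_model, shape={"data": (1, 3, 224, 224)})

# Compile for a plain CPU (LLVM) target.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Export the compiled module as a .tar and compare on-disk sizes.
lib.export_library("resnet50_tvm.tar")
print("onnx :", os.path.getsize(onnx_path) / 1e6, "MB")
print("tvm  :", os.path.getsize("resnet50_tvm.tar") / 1e6, "MB")
```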

That expectation doesn't quite hold. Graph-level optimization does reduce the graph size, but the majority of the tarball size comes from the weights (parameters), not the model graph. Besides, graph optimization and tuning aim at better performance rather than a smaller tarball.
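You can check this yourself by summing up the raw byte size of the parameters. A quick sketch, assuming `params` is the dict returned by `relay.frontend.from_onnx` in your snippet above:

```python
import numpy as np

# Each value in `params` is a tvm.nd.NDArray; total up elements * bytes-per-element.
param_bytes = sum(
    np.prod(p.shape) * np.dtype(p.dtype).itemsize for p in params.values()
)
print("total parameter size:", param_bytes / 1e6, "MB")
```

For ResNet-50 in float32 this should come out close to 100 MB on its own, so the compiled graph and kernel code add only a few MB on top.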

Okay, makes sense. So TVM does not perform optimizations like pruning and quantization?

TVM doesn't do model pruning or compression, since those usually require re-training to preserve accuracy. For quantization, you can check out the following material:

https://tvm.apache.org/docs/tutorials/frontend/deploy_prequantized.html#sphx-glr-tutorials-frontend-deploy-prequantized-py
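If you would rather quantize inside TVM itself instead of importing a pre-quantized model, Relay also has a post-training quantization pass. A rough sketch (the calibration mode and global scale here are illustrative defaults, not tuned values; `mod` and `params` come from the earlier snippet):

```python
import tvm
from tvm import relay

# Post-training quantization with a fixed global scale; a calibration
# dataset would normally give better accuracy than "global_scale".
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    quantized_mod = relay.quantize.quantize(mod, params)

# Params are folded into the module during quantization.
with tvm.transform.PassContext(opt_level=3):
    qlib = relay.build(quantized_mod, target="llvm")

qlib.export_library("resnet50_int8.tar")
```

Since the int8 weights take a quarter of the space of float32, this is the route that would actually shrink the exported tarball.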
