Dear community, lately I've been playing around with quantization-aware training (QAT) at the PyTorch level. My model is a custom CNN/MLP model for image classification, containing only the following layer types (a minimal sketch of such a model follows the list):
- Conv2D
- MaxPool2D
- Linear
- Dropout (active during training only, obviously)
- QuantStub/DeQuantStub
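For illustration, here is a minimal sketch of a model built from only those layer types; the exact architecture, channel counts, and input resolution below are placeholders, not my real model:

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class SmallCnnMlp(nn.Module):
    # Placeholder architecture using only the layer types listed above,
    # with made-up channel counts and an assumed 3x32x32 input.
    def __init__(self, num_classes=10):
        super().__init__()
        self.quant = QuantStub()        # float -> quantized boundary
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.dropout = nn.Dropout(0.5)  # only active in train() mode
        self.fc = nn.Linear(16 * 16 * 16, num_classes)
        self.dequant = DeQuantStub()    # quantized -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.pool(self.conv(x))
        x = self.dropout(torch.flatten(x, 1))
        x = self.fc(x)
        return self.dequant(x)
```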
Without quantization the accuracy was around 92%. Using quantization-aware training (following PyTorch's guide: https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html#quantization-aware-training), the accuracy stayed at a high level (around 91.6%) in PyTorch. However, after following TVM's guide for deploying a prequantized model (https://tvm.apache.org/docs/tutorials/frontend/deploy_prequantized.html), the measured accuracy at the TVM level (over the whole test set, for the same QAT model) dropped to 60%. I trained on PyTorch 1.5.1, deployed on x86, and the model is quantized through QNN at the TVM level (which I could verify in the Relay IR).
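Roughly, my flow looks like the following sketch (simplified; the model class, input shape, and target string are placeholders, and the training loop and accuracy measurement are omitted):

```python
import torch
import tvm
from tvm import relay

# --- PyTorch side: eager-mode QAT, following the linked tutorial ---
model = SmallCnnMlp()                                                   # placeholder model
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")    # x86 backend
torch.quantization.prepare_qat(model, inplace=True)
# ... usual training loop with fake quantization enabled ...
model.eval()
quantized_model = torch.quantization.convert(model)

# --- TVM side: import the quantized TorchScript model via the PyTorch frontend ---
dummy_input = torch.rand(1, 3, 32, 32)                                  # assumed input shape
script_module = torch.jit.trace(quantized_model, dummy_input).eval()
input_name = "input"
mod, params = relay.frontend.from_pytorch(script_module, [(input_name, (1, 3, 32, 32))])
# mod now contains qnn.* ops in the Relay IR

target = "llvm -mcpu=core-avx2"                                         # x86 deployment target
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
# accuracy is then measured by running the built module over the test set
```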
Does anyone have a suggestion as to how this could happen?
Trying to dig further into the accuracy drop-off, I observed some strange behaviour in TVM's accuracy:
In the first column you can see the number of epochs I trained the model for. The next two columns show the measured accuracy of the QAT-trained model at the PyTorch and TVM level respectively. The last column shows the delta (in percentage points) between the two accuracy values. Apparently the accuracy at the TVM level decreases the longer I train my model (while the accuracy at the PyTorch level obviously increases). Another interesting thing I observed is the difference in accuracy when freezing the quantization parameters earlier in the training process (see the sketch below): the earlier I freeze the quantization parameters, the higher the accuracy in TVM (freezing after 1 epoch: 77% accuracy, freezing after 8 of 20 epochs: 60%).
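By "freezing" I mean disabling the fake-quant observers after a given epoch, as in the PyTorch tutorial. Roughly (the training/evaluation helpers and epoch counts here are placeholders):

```python
import torch

FREEZE_AFTER = 1  # freeze quantization parameters after this many epochs (1 vs. 8 above)

for epoch in range(20):
    train_one_epoch(model, criterion, optimizer, train_loader)   # placeholder training helper
    if epoch + 1 >= FREEZE_AFTER:
        # Stop updating the observers, i.e. freeze the scale/zero-point
        # of all fake-quant modules for the rest of training.
        model.apply(torch.quantization.disable_observer)
    evaluate(model, test_loader)                                  # placeholder evaluation helper
```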
I'm sorry if I missed any necessary information; it's my first post on this forum! I will happily add any requested information or even scripts to reproduce the issue.
Do you have any idea how…
- The drop-off from around 90% (in PyTorch) to 60% (in TVM) can be explained?
- The decrease in TVM accuracy when the quantization parameters are frozen later in training can be explained?
Best regards, Knight3