According to this tutorial, if we aim to convert models to 8 bit, we can convert a framework-prequantized model (carrying our own quantization parameters) to TVM. However, frameworks like PyTorch do not support quantization bit-widths lower than 8.
Another way might be to convert the float model to TVM first and then use quantize_relay_module to quantize the float Relay model. However, this approach uses TVM's own calibration algorithms (e.g. KL divergence) to compute the quantization scales and zero points, which may lead to a larger accuracy drop.
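For context, the scales and zero points in question are the parameters of the usual affine quantization scheme. Below is a minimal plain-Python sketch (not a TVM API; the helper names are made up for illustration) of exactly what we would like to supply ourselves at, say, 4 bits, instead of having the compiler re-derive them via KL calibration:

```python
def quantize(x, scale, zero_point, bits=4):
    """Affine-quantize a float value to a signed `bits`-wide integer code."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # saturate to the representable range

def dequantize(q, scale, zero_point):
    """Map an integer code back to its approximate float value."""
    return (q - zero_point) * scale

# With scale=0.5 and zero_point=0, a 4-bit code covers [-4.0, 3.5];
# values outside that range saturate.
print(quantize(1.3, 0.5, 0))    # -> 3
print(dequantize(3, 0.5, 0))    # -> 1.5
print(quantize(10.0, 0.5, 0))   # -> 7 (saturated to qmax)
```

If these precomputed `scale` and `zero_point` values could be attached per tensor during Relay quantization, the calibration-induced accuracy drop described above could be avoided.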
So is there any way to pass our own quantization scales and zero points when quantizing models to bit-widths lower than 8?