[QUANTIZATION][PYTORCH] Suitable pytorch api setting for relay quantization

I’m going to use the PyTorch frontend to parse a PyTorch model and quantize it. However, it’s not clear to me how I should set up the PyTorch quantization API so that I get the same arithmetic results as Relay.

For example, if I set up a QConfig as below and convert the model to a quantized one, will Relay produce the same results as the quantized PyTorch model?

import torch
from torch.quantization import (FakeQuantize, MovingAverageMinMaxObserver,
                                MovingAveragePerChannelMinMaxObserver, QConfig)

my_qconfig = QConfig(
    activation=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver,
                                      quant_min=0,
                                      quant_max=255,
                                      reduce_range=True),
    weight=FakeQuantize.with_args(observer=MovingAveragePerChannelMinMaxObserver,
                                  quant_min=0,
                                  quant_max=255,
                                  dtype=torch.quint8,
                                  qscheme=torch.per_channel_symmetric,
                                  reduce_range=False,
                                  ch_axis=0))
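For context, here is a minimal sketch of the end-to-end flow I have in mind, assuming eager-mode quantization with torchvision’s quantizable ResNet-18. The input name "input", the input shape, and the random calibration data are placeholders, and the exact input list format for relay.frontend.from_pytorch may differ between TVM versions:

import torch
import torchvision
from tvm import relay

model = torchvision.models.quantization.resnet18(pretrained=True).eval()
model.fuse_model()                      # fuse conv+bn+relu so they quantize as one unit
model.qconfig = my_qconfig              # the QConfig defined above
torch.quantization.prepare(model, inplace=True)

# Calibrate with a few batches; random data here is just a placeholder.
with torch.no_grad():
    for _ in range(8):
        model(torch.randn(1, 3, 224, 224))

torch.quantization.convert(model, inplace=True)

# Trace the quantized model and import it into Relay.
inp = torch.randn(1, 3, 224, 224)
script_module = torch.jit.trace(model, inp).eval()
mod, params = relay.frontend.from_pytorch(script_module, [("input", (1, 3, 224, 224))])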

I’m not sure what you are asking. Whatever qconfig you quantize your Torch model with, the converted Relay model is meant to be equivalent to the quantized Torch model.

But due to differences in numerics, the raw floating-point outputs of a quantized Torch model and the converted Relay model can be slightly different. That’s why there are differences in accuracy, as shown in https://github.com/apache/incubator-tvm/pull/4977.
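To make that concrete, here is a rough sketch of how one could measure the gap, continuing from the sketch in the question (so model, inp, mod, and params come from there). The llvm target is an assumption, older TVM versions expose graph_runtime instead of graph_executor, and older NDArrays use .asnumpy() instead of .numpy():

import numpy as np
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor  # `graph_runtime` on older TVM

# Build the Relay module for CPU.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

runtime = graph_executor.GraphModule(lib["default"](tvm.cpu()))
runtime.set_input("input", inp.numpy())
runtime.run()
tvm_out = runtime.get_output(0).numpy()

# Run the quantized Torch model on the same input.
with torch.no_grad():
    torch_out = model(inp).numpy()

# Expect close but not bit-exact agreement: rounding and requantization
# details differ between the Torch kernels and Relay's QNN lowering.
print("max abs diff:", np.abs(torch_out - tvm_out).max())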

FYI this is the qconfig I’m using.


Thanks a lot for your reply!