I’m going to use the PyTorch frontend to parse a PyTorch model and quantize it. However, it’s not clear to me how I should configure the PyTorch quantization API to get the same arithmetic results as Relay.
For example, if I set the QConfig as below and convert the model to a quantized one, will Relay produce the same results as the quantized PyTorch model?
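import torch
from torch.quantization import (QConfig, FakeQuantize,
                                MovingAverageMinMaxObserver,
                                MovingAveragePerChannelMinMaxObserver)

# Activations: per-tensor fake quantization with a moving-average min/max observer.
# Weights: per-channel symmetric fake quantization along the output-channel axis.
my_qconfig = QConfig(
    activation=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver,
                                      quant_min=0,
                                      quant_max=255,
                                      reduce_range=True),
    weight=FakeQuantize.with_args(observer=MovingAveragePerChannelMinMaxObserver,
                                  quant_min=0,
                                  quant_max=255,
                                  dtype=torch.quint8,
                                  qscheme=torch.per_channel_symmetric,
                                  reduce_range=False,
                                  ch_axis=0))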
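
To make the question concrete, here is a minimal sketch of the usual affine quantization arithmetic in plain Python. It is not PyTorch or TVM code, and the function names are made up for illustration. It assumes both frameworks follow the standard form q = clamp(round(x / scale) + zero_point, quant_min, quant_max). It only illustrates why the integer range matters: as far as I understand, `reduce_range=True` is meant to narrow quint8 to 0..127, and how that interacts with an explicit `quant_max=255` may depend on the PyTorch version.

```python
def affine_qparams(x_min, x_max, quant_min, quant_max):
    """Scale and zero point for an asymmetric mapping of [x_min, x_max]."""
    # Widen the range to include zero so that 0.0 is exactly representable.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (quant_max - quant_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range; any positive scale works
    zero_point = quant_min - round(x_min / scale)
    # Keep the zero point inside the integer range.
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, quant_min, quant_max):
    """Map a float to an integer and clamp it to [quant_min, quant_max]."""
    q = round(x / scale) + zero_point
    return max(quant_min, min(quant_max, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a float."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    # Same float value, two different integer ranges (full vs. reduced).
    for x_max, quant_max in ((191.0, 255), (190.0, 127)):
        scale, zp = affine_qparams(-64.0, x_max, 0, quant_max)
        q = quantize(100.0, scale, zp, 0, quant_max)
        print(quant_max, scale, zp, q, dequantize(q, scale, zp))
```

The same input lands on different integers under the two ranges, so for bit-exact agreement the scale, zero point, and integer range used on the Relay side would have to match the ones the converted PyTorch model ends up with.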