I’m working on int8 calibration and have some observations on the choice of bias precision.

I implemented a per-layer activation scale (similar to this PR: https://github.com/dmlc/tvm/pull/2753) and used a simple calibration method: take the power-of-2 scale of the maximum absolute value of each layer’s output over a small calibration dataset. This approach works well (ImageNet top-1 accuracy drop of ~1%) on some models, e.g. ResNet-50 and VGG-16.
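The calibration rule above can be sketched as follows. This is a minimal illustration of the idea, not the PR’s actual code; `power2_scale` is a hypothetical helper:

```python
import numpy as np

def power2_scale(outputs):
    """Pick the smallest power-of-2 scale covering the max |activation|
    observed over the calibration outputs (illustrative helper)."""
    max_abs = max(float(np.abs(o).max()) for o in outputs)
    return 2.0 ** np.ceil(np.log2(max_abs))

# A layer whose calibration outputs peak at |5.3| gets scale 8.0.
outputs = [np.random.uniform(-5.0, 5.0, size=(1, 16)) for _ in range(4)]
outputs[0].flat[0] = 5.3
print(power2_scale(outputs))  # 8.0
```

Restricting scales to powers of two keeps requantization as pure bit shifts at realize time.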

However, on ResNet-101 the accuracy loss is larger. I tried quantizing the bias term (the rhs constant of add) with 16 bits (32 bits would make the lhs overflow after the left shift) and obtained good accuracy. Specifically, in add_rewrite I marked the rhs constant as QAnnotateKind.BIAS and modified these lines https://github.com/dmlc/tvm/blob/31ba01399e6f9f0b4146afee685f16c5ddc68f91/python/tvm/relay/quantize/quantize.py#L243 as:

```python
if kind == QAnnotateKind.BIAS:
    const_params[ndom_scale] = _make_const(scale / (2**15))
    const_params[nclip_min] = _make_const(-((2**31) - 1))
    const_params[nclip_max] = _make_const((2**31) - 1)
```
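With this change, a float bias `b` under a conv2d output scale `scale` is quantized against `scale / 2**15`, i.e. stored as `round(b * 2**15 / scale)` and clipped to the int32 range. A quick numeric check (all values illustrative, not from a real model):

```python
import numpy as np

scale = 0.25        # illustrative dom_scale of the conv2d output
bias = 0.1          # illustrative float bias value
ndom_scale = scale / (2 ** 15)

# Quantize with the 16-bit-style scale, clipped to the int32 range.
q = int(np.clip(round(bias / ndom_scale), -(2**31 - 1), 2**31 - 1))
print(q)               # 13107
print(q * ndom_scale)  # ~0.09999, dequantized bias is close to 0.1
```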

My patch changes which scale is selected in AddRealize. Previously the lhs scale was selected in almost all cases, because the lhs scale comes from conv2d and is the product of the input and weight scales (smaller than the rhs scale), so the bias was shifted left.

After my patch, the bias scale is selected and the conv2d result is shifted left instead.

A left shift in either case doesn’t overflow and should not cause precision loss by itself. But it is possible that, because the bias is shared per channel, the extra bits for it help.
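To illustrate why the shift itself is lossless, here is a toy sketch (not the actual AddRealize code) of aligning two power-of-2-scaled operands to the smaller scale before adding:

```python
import math

def align_and_add(qa, scale_a, qb, scale_b):
    """Requantize both operands to the smaller power-of-2 scale and add.
    The left shift is exact; the only hazard is overflow, which is why
    the bias clip range must leave headroom. Toy code, not TVM's."""
    if scale_a <= scale_b:
        shift = int(round(math.log2(scale_b / scale_a)))
        return qa + (qb << shift), scale_a
    shift = int(round(math.log2(scale_a / scale_b)))
    return (qa << shift) + qb, scale_b

# Post-patch case: bias at scale 2^-15, conv2d output at 2^-10;
# the smaller bias scale wins and the conv2d result is shifted left by 5.
q, s = align_and_add(1000, 2**-15, 200, 2**-10)
print(q, s)  # 7400 at scale 2^-15, exactly 1000*2^-15 + 200*2^-10
```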

I would like to discuss these observations here and see how we can improve the quantization accuracy.

Some of my experiment numbers:

ResNet-101 top-1/top-5 accuracy on ImageNet (first 3k images):

- power-of-2 scale weight & activation + 8-bit bias (current TVM): 0.6876/0.8887
- power-of-2 scale weight & activation + 9-bit bias: 0.7633/0.9276
- power-of-2 scale weight & activation + 16-bit bias: 0.7697/0.9353
- float32: 0.7777/0.9387