Hi all,

This post summarizes a series of problems we’ve experienced while aligning TVM’s results with Torch’s.

Note:

- The fbgemm backend of Torch is effectively an “int7” quantization
- Thus, the qnnpack backend is chosen – i.e. we try to align the results of TVM on `arm_cpu` with Torch’s qnnpack results

# Problem 1: Problems introduced by the “two rounding” behavior on `arm_cpu`

The op in question is `q_multiply_shift`, used in requantization.

The DEFAULT path:

The NEON path:

The NEON path may produce slightly different values than the *DEFAULT path* (due to the two roundings) [1], even within a single layer.
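To illustrate the difference, here is a minimal Python sketch (the function names and exact arithmetic are illustrative assumptions, not TVM’s actual code): both paths approximate `round(x * m / 2**(31 + right_shift))` for a Q31 fixed-point multiplier `m`, but the NEON-like path rounds twice – once in the `sqrdmulh`-style high multiply and once in the rounding right shift.

```python
def one_rounding(x, m, right_shift):
    # DEFAULT-like path: full multiply, then a single rounding right shift
    total_shift = 31 + right_shift
    round_const = 1 << (total_shift - 1)
    return (x * m + round_const) >> total_shift

def two_roundings(x, m, right_shift):
    # NEON-like path: sqrdmulh-style rounding high multiply,
    # followed by a second rounding right shift (right_shift >= 1)
    high = (2 * x * m + (1 << 31)) >> 32                      # first rounding
    return (high + (1 << (right_shift - 1))) >> right_shift   # second rounding

# x * m / 2**32 is just below 0.5 here, so a single rounding gives 0,
# but the first rounding pushes it to exactly 0.5 and the second rounds up:
x, m, right_shift = 1, 2**31 - 1, 1
print(one_rounding(x, m, right_shift))   # 0
print(two_roundings(x, m, right_shift))  # 1
```

The double rounding only changes results for inputs that the first rounding pushes onto (or across) a tie, which is why the two paths agree on most values but not all.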

The problem is that on `arm_cpu`, TVM will sometimes take the DEFAULT path and sometimes the NEON path:

- If the innermost axis is a multiple of four and vectorization is applied, the NEON path is used
- Otherwise, the DEFAULT path is used

BTW, it looks like the “two rounding” behavior is important for producing a bit-exact result with TFLite qnnpack, see

# Problem 2: Problems introduced by different rounding algorithms

Case 1: AdaptiveAvgPool2d, where the result is truncated due to integer division, see Pytorch: The inference results of tvm and pytorch are inconsistent

It seems this can be fixed by adding a “0.5” to the integer division (i.e. rounding instead of truncating). What makes things complicated is that Torch applies banker’s rounding, that is, Round to Nearest Even (RNE):

- RNE(1.5) → 2
- RNE(2.5) → 2
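A quick Python comparison of the two rounding modes (the helper names here are hypothetical; Python 3’s built-in `round()` happens to implement RNE, and the “+0.5” fix is equivalent to adding half the divisor before a truncating division):

```python
def round_half_up_div(s, count):
    # the "add 0.5" fix: add half the divisor, then truncating division
    return (s + count // 2) // count

def rne_div(s, count):
    # banker's rounding: Python 3's round() breaks ties toward even
    return round(s / count)

print(round_half_up_div(3, 2), rne_div(3, 2))  # 2 2  (1.5 rounds up either way)
print(round_half_up_div(5, 2), rne_div(5, 2))  # 3 2  (2.5: half-up vs RNE)
```

The two modes agree on every non-tie value and on ties whose lower neighbor is odd, so mismatches only show up on specific inputs – which makes them easy to miss in testing.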

Thus, a bigger problem is that Torch uses RNE as its standard rounding algorithm, e.g.

- Quantization Aware Training: The fake quantize op
- Quantization (inference only): the qnnpack backend (tested by @wyc0926)

Now, even the DEFAULT path of `q_multiply_shift` cannot produce the same result as Torch’s.
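A small sketch of why (again illustrative, not TVM’s actual implementation): the DEFAULT path rounds ties upward, while a Torch-style RNE requantize breaks ties toward the even neighbor, so the two disagree exactly on the .5 ties of the fixed-point product.

```python
def requant_half_up(x, m, shift):
    # DEFAULT-like single rounding: ties round upward
    total = 31 + shift
    return (x * m + (1 << (total - 1))) >> total

def requant_rne(x, m, shift):
    # same Q31 fixed-point product, but ties broken toward even (Torch-style)
    total = 31 + shift
    q, r = divmod(x * m, 1 << total)
    half = 1 << (total - 1)
    if r > half or (r == half and q % 2 == 1):
        q += 1
    return q

# m = 2**30 is 0.5 in Q31, so x = 5 gives the tie 2.5:
print(requant_half_up(5, 2**30, 0))  # 3
print(requant_rne(5, 2**30, 0))      # 2
```

So matching Torch bit-exactly would require the requantization itself to use RNE, not just avoiding the NEON double rounding.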