Hi all,
This post summarizes a series of problems we’ve experienced while trying to align TVM’s results with Torch’s.
Note:
- The fbgemm backend of Torch is effectively an “int7” quantization
- Thus, the qnnpack backend is chosen, i.e. we try to align the result of TVM on arm_cpu with Torch qnnpack’s
Problem 1: Problems introduced by the “two rounding” behavior on arm_cpu
op: q_multiply_shift, used in requantization.
The DEFAULT path:
The NEON path:
The NEON path may produce some different values (due to two rounding)[1], compared with the DEFAULT path, within a single layer.
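To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not TVM's actual code: the helper names are hypothetical, it assumes a Q31 multiplier with a right shift of at least 1, assumes both roundings are round-half-up, and ignores saturation.

```python
# Hypothetical helpers for illustration only, not TVM's q_multiply_shift.
# x: int32 accumulator, m: Q31 fixed-point multiplier, s: right shift (s >= 1).
# Assumption: both roundings are round-half-up; saturation is ignored.


def requant_single_rounding(x: int, m: int, s: int) -> int:
    """Round once, after the combined shift of 31 + s bits."""
    total_shift = 31 + s
    return (x * m + (1 << (total_shift - 1))) >> total_shift


def requant_two_rounding(x: int, m: int, s: int) -> int:
    """Round after the high multiply, then round again after the shift."""
    high = (x * m + (1 << 30)) >> 31      # first rounding
    return (high + (1 << (s - 1))) >> s   # second rounding


if __name__ == "__main__":
    x, m, s = 1, 1 << 30, 1  # real value: 1 * 0.5 / 2 = 0.25
    print(requant_single_rounding(x, m, s))  # 0
    print(requant_two_rounding(x, m, s))     # 1
```

In this example the first rounding lifts 0.5 to 1, and the second rounding then lifts 1/2 to 1, while the single rounding sees 0.25 and returns 0.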
The problem is that arm_cpu sometimes uses the DEFAULT path and sometimes the NEON path:
- If the innermost axis is a multiple of four and vectorization is applied, the NEON path is enabled
- Otherwise, the DEFAULT path is used
BTW, it looks like “two rounding” is important for getting a “bit exact result” with TFLite qnnpack, see
Problem 2: Problems introduced by different rounding algorithms
Case 1: AdaptiveAvgPool2d, where the result is truncated due to integer division, see “Pytorch: The inference results of tvm and pytorch are inconsistent”
It seems this can be fixed by adding a “0.5” following the integer division. What makes things complicated is that Torch applies banker’s rounding, that is, Round to Nearest Even (RNE):
- RNE(1.5) → 2
- RNE(2.5) → 2
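The three behaviors (truncation, adding 0.5, and RNE) can be compared with a small integer-only Python sketch. The function names are hypothetical and the code only illustrates the arithmetic; it is not how TVM or Torch implement pooling.

```python
# Hypothetical helpers illustrating three ways to divide a pooled sum
# by the window size. Assumes count > 0.


def div_truncate(total: int, count: int) -> int:
    """Plain integer (floor) division: the fractional part is dropped."""
    return total // count


def div_round_half_up(total: int, count: int) -> int:
    """Equivalent to floor(total / count + 0.5), done in integers."""
    return (2 * total + count) // (2 * count)


def div_round_half_even(total: int, count: int) -> int:
    """Round to nearest, ties go to the even quotient (RNE)."""
    q, r = divmod(total, count)
    if 2 * r > count or (2 * r == count and q % 2 == 1):
        q += 1
    return q


if __name__ == "__main__":
    for total, count in [(6, 4), (10, 4)]:  # 1.5 and 2.5
        print(total / count,
              div_truncate(total, count),
              div_round_half_up(total, count),
              div_round_half_even(total, count))
```

For 1.5 both rounding variants agree on 2, but for 2.5 adding 0.5 yields 3 while RNE yields 2, so the “0.5” fix alone does not match Torch on ties.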
Thus, a bigger problem is that Torch uses RNE as its standard rounding algorithm, e.g.
- Quantization Aware Training: The fake quantize op
- Quantization (Inference only): qnnpack backend (tested by @wyc0926 )
Now, even the DEFAULT path of q_multiply_shift cannot produce the same result as Torch’s.