[Pytorch] The inference results of tvm and pytorch are inconsistent


I created a quantized PyTorch model, compiled it with TVM, and ran inference. The TVM result was inconsistent with PyTorch. The strange thing is that the mismatch only occurs on some runs.

my code:

import torch
from torch import nn
from torch.quantization import QuantStub, DeQuantStub, get_default_qat_qconfig, convert, prepare_qat
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

class AdaptiveAvgPool2d(nn.Module):
    def __init__(self):
        super().__init__()  # must run before submodules are assigned
        self.quant = QuantStub()
        self.dequant = DeQuantStub()
        self.pool = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, x):
        x = self.quant(x)
        y = self.pool(x)
        y = self.dequant(y)
        return y

    def fuse_model(self):
        pass  # body was empty in the original post; nothing to fuse here

fp32_input = torch.randn(1, 3, 128, 128)
model = AdaptiveAvgPool2d()

BACKEND = "qnnpack"
model.qconfig = get_default_qat_qconfig(BACKEND)

prepare_qat(model, inplace=True)

y = model(fp32_input)
model_int8 = convert(model, inplace=True)

script_module = torch.jit.trace(model, fp32_input).eval()

input_name = "input"
input_infos = [(input_name, ((1, 3, 128, 128), 'float32'))]
img_input = np.random.rand(1, 3, 128, 128).astype(np.float32)

pt_input = torch.from_numpy(img_input)

torch.backends.quantized.engine = 'qnnpack'

with torch.no_grad():
    pt_result = script_module(pt_input)

mod, params = relay.frontend.from_pytorch(script_module, input_infos)

target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
module = graph_executor.GraphModule(lib["default"](tvm.cpu(0)))
module.set_input(input_name, img_input)
module.run()  # execute the graph before reading the output
tvm_result = module.get_output(0).asnumpy()
print("compare result: ", np.allclose(pt_result[0].numpy().flatten(), tvm_result.flatten(), atol=1e-05))

If you run the above code repeatedly, you will find that the comparison result is sometimes True and sometimes False. Why is this?

compare result: True

[0.48794654 0.48794654 0.48794654]
[0.48794654 0.48794654 0.48794654]
compare result:  True

compare result: False

[0.5066974 0.5066974 0.5066974]
[0.47291756 0.47291756 0.47291756]
compare result:  False

I analyzed the outputs. Whenever the results are inconsistent, the TVM result differs from the PyTorch result by one quantization scale step. So I suspect that PyTorch's adaptive_avg_pool2d rounds the averaged value, while TVM simply discards the fractional part. Whenever rounding would carry up to the next integer, the TVM result ends up one scale step smaller than PyTorch's.
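
The failing run above is consistent with this: 0.5066974 - 0.47291756 = 0.03377984, which would be a single scale step. Below is a minimal NumPy sketch of the effect, not the actual PyTorch or TVM kernels. The function name, the scale, the zero point, and the sample values are all made up for illustration.

```python
import numpy as np

def pooled_outputs(q_values, scale, zero_point):
    """Average quantized values two ways, then dequantize both results."""
    total = int(np.sum(q_values.astype(np.int64)))
    count = q_values.size
    q_trunc = total // count               # drops the fractional part (inputs are non-negative)
    q_round = int(np.rint(total / count))  # rounds to nearest, ties to even

    def dequant(q):
        return (q - zero_point) * scale

    return dequant(q_trunc), dequant(q_round)

# Hypothetical quantization parameters and pooling window (mean is 14.75).
scale, zero_point = 0.05, 0
q = np.array([14, 15, 15, 15], dtype=np.uint8)

trunc_out, round_out = pooled_outputs(q, scale, zero_point)
print(trunc_out, round_out, round_out - trunc_out)
```

When the mean has no fractional part, or a fractional part below one half, both paths agree, which would explain why the mismatch only shows up for some random inputs.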

@masahi What’s your opinion on this issue?

I think in order to ensure the accuracy of the model, rounding is necessary.

diff --git a/include/tvm/topi/nn/pooling.h b/include/tvm/topi/nn/pooling.h
index c81c7cda7..467d2f5d8 100644
--- a/include/tvm/topi/nn/pooling.h
+++ b/include/tvm/topi/nn/pooling.h
@@ -386,7 +386,7 @@ inline Tensor adaptive_pool_impl(const Tensor& x, const Array<PrimExpr>& output_
             divide_factor *= tvm::cast(x->dtype, reduce_axes[i]->dom->extent);
-          return div(pool_sum(indices), divide_factor);
+          return div(pool_sum(indices) + div(divide_factor, 2), divide_factor);
         "tensor", kElementWise);
   } else {

Interesting, does PyTorch do something like that? It’s not obvious to me if we can do this without concern. Would this change make the output of every adaptive avg pool different? What about normal avg pooling?

I realize that this modification is wrong: no rounding is needed for floating-point input. Rounding is only necessary when the input is a quantized tensor, where it is what preserves model accuracy.

Also, PyTorch uses round-half-to-even, not ordinary round-half-up.
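
For readers unfamiliar with the distinction, here is a small NumPy illustration of the two tie-breaking rules. It only demonstrates the rounding modes themselves; it does not call into PyTorch or TVM, and the variable names are arbitrary.

```python
import numpy as np

# Values that sit exactly halfway between two integers.
ties = np.array([0.5, 1.5, 2.5, 3.5])

half_up = np.floor(ties + 0.5)  # ordinary rounding: ties always go up
half_even = np.rint(ties)       # ties go to the nearest even integer

print(half_up)    # [1. 2. 3. 4.]
print(half_even)  # [0. 2. 2. 4.]
```

The two rules only disagree on exact ties, so a fix based on adding half the divisor before dividing would still differ from PyTorch in those cases.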

Does it actually hurt model accuracy on some dataset?

Yes. For confidentiality reasons I cannot publish the model, but its output needs to be consistent with PyTorch at the pixel level.

I split the complete quantized network into parts. The backbone is a CNN, and its outputs are consistent with PyTorch. Once the adaptive avg pool operator is reached, the outputs become very different from PyTorch's.

Is there something we can do from our frontend? Rather than changing topi code?

I’m surprised to hear that it makes the output differ from PyTorch so much… it looks like a minor difference.

I don’t think this problem can be solved in the frontend. The following is my fix. At present, the inference results can be kept consistent with pytorch at the pixel level.

diff --git a/include/tvm/topi/nn/pooling.h b/include/tvm/topi/nn/pooling.h
index c81c7cda7..8e8db0e2a 100644
--- a/include/tvm/topi/nn/pooling.h
+++ b/include/tvm/topi/nn/pooling.h
@@ -374,7 +374,35 @@ inline Tensor adaptive_pool_impl(const Tensor& x, const Array<PrimExpr>& output_
         "tensor", "adaptive_pool_sum");
-    return tvm::te::compute(
+    if (x->dtype.code() == DataType::kInt || x->dtype.code() == DataType::kUInt) {
+      return tvm::te::compute(
+        out_shape,
+        [&](const Array<Var>& output) {
+          Array<PrimExpr> indices;
+          Array<tir::IterVar> reduce_axes;
+          std::tie(indices, reduce_axes) = get_iter_vars(output, false);
+          PrimExpr divide_factor = tvm::cast(x->dtype, 1);
+          for (size_t i = 0; i < n_dim; ++i) {
+            divide_factor *= tvm::cast(x->dtype, reduce_axes[i]->dom->extent);
+          }
+          PrimExpr _pool_sum = pool_sum(indices);
+          PrimExpr remainder = floormod(_pool_sum, divide_factor);
+          PrimExpr up_rounder = floordiv(divide_factor, 2);
+          PrimExpr parity = floormod(floordiv(_pool_sum, divide_factor), 2);
+          _pool_sum = tvm::if_then_else(tir::EQ(remainder, up_rounder),
+                            tvm::if_then_else(
+                              tir::EQ(parity, make_const(DataType::Int(32), 0)),
+                              _pool_sum,
+                              _pool_sum + up_rounder
+                            ),
+                            _pool_sum + up_rounder);
+          return floordiv(_pool_sum, divide_factor);
+        },
+        "tensor", kElementWise);
+    } else {
+      return tvm::te::compute(
         [&](const Array<Var>& output) {
           Array<PrimExpr> indices;
@@ -389,6 +417,7 @@ inline Tensor adaptive_pool_impl(const Tensor& x, const Array<PrimExpr>& output_
           return div(pool_sum(indices), divide_factor);
         "tensor", kElementWise);
+    }
   } else {
     LOG(ERROR) << "Unrecognized pool_type: " << pool_type;
     return x;
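
To make the integer logic of the patch easier to check, here is a plain-Python model of the same arithmetic. It is only a sketch: the function name rne_div is made up, and Python's `%` and `//` are assumed to stand in for floormod and floordiv, which holds for the floor semantics they share.

```python
from fractions import Fraction

def rne_div(total, divisor):
    """Integer division rounded to nearest, ties to even (mirrors the patch)."""
    remainder = total % divisor
    up_rounder = divisor // 2
    parity = (total // divisor) % 2
    if remainder == up_rounder:
        # Possible tie: only bump the sum when the quotient is odd.
        adjusted = total if parity == 0 else total + up_rounder
    else:
        adjusted = total + up_rounder
    return adjusted // divisor

# Compare against exact rational rounding, which is also ties-to-even.
mismatches = [
    (t, d)
    for d in range(1, 20)
    for t in range(-200, 200)
    if rne_div(t, d) != round(Fraction(t, d))
]
print("mismatches:", len(mismatches))
```

The comparison covers both even and odd divisors. For an odd divisor the remainder can equal up_rounder without being a true tie, and the branch still yields the correctly rounded quotient in that case.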

Hi masahi,

@DzAvril has submitted a patch that tries to solve the problem in the frontend ([Pytorch] Results of tvm and pytorch are inconsistent by DzAvril · Pull Request #11176 · apache/tvm · GitHub). It adds a pair of dequantize and quantize ops around _op.nn.adaptive_avg_pool<N>d.

In effect, it treats _op.nn.adaptive_avg_pool<N>d as a float-only op.

Is that really the preferred way?
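
Numerically, the dequantize, pool, quantize sequence looks roughly like the NumPy sketch below. This is not the code from the pull request. The helper names, the uint8 range, the quantization parameters, and the assumption that input and output share the same scale and zero point are all illustrative.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Map quantized integers back to float32."""
    return (q.astype(np.float32) - zero_point) * scale

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map floats to uint8, rounding to nearest (ties to even)."""
    q = np.rint(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def float_path_global_avg_pool(q_nchw, scale, zero_point):
    """Global average pool computed in float between a dequantize/quantize pair."""
    x = dequantize(q_nchw, scale, zero_point)
    pooled = x.mean(axis=(2, 3), keepdims=True)
    return quantize(pooled, scale, zero_point)

# Hypothetical 1x1x2x2 quantized input whose mean is 14.75 in integer units.
q_in = np.array([[[[14, 15], [15, 15]]]], dtype=np.uint8)
q_out = float_path_global_avg_pool(q_in, scale=0.05, zero_point=0)
print(q_out.shape, int(q_out[0, 0, 0, 0]))
```

The rounding happens in the final quantize step, so the truncation in the integer pooling path is never exercised. The cost is an extra round trip through float32.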

IMHO, the quantized version or path of _op.nn.adaptive_avg_pool<N>d should take care of rounding itself, instead of simply truncating.


To match PyTorch's results, how about adding an extra “rounding mode” option to _op.nn.adaptive_avg_pool<N>d?

  • Implement the various rounding modes in adaptive_pool_impl()
  • Specify the “RNE” (round to nearest, ties to even) rounding mode in the PyTorch frontend
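
As a rough picture of what such an option could look like, here is a plain-Python sketch of a division helper that takes a rounding mode. Nothing here is an existing TVM API: the function name, the parameter name, and the mode strings "trunc", "half_up", and "rne" are hypothetical.

```python
import math
from fractions import Fraction

def divide_with_mode(total, count, rounding_mode="trunc"):
    """Divide an integer pooling sum by the window size under a chosen rounding mode."""
    exact = Fraction(total, count)
    if rounding_mode == "trunc":
        return math.trunc(exact)                   # discard the fractional part
    if rounding_mode == "half_up":
        return math.floor(exact + Fraction(1, 2))  # ties go up
    if rounding_mode == "rne":
        return round(exact)                        # ties go to the even integer
    raise ValueError(f"unknown rounding mode: {rounding_mode}")

for mode in ("trunc", "half_up", "rne"):
    print(mode, divide_with_mode(5, 2, mode), divide_with_mode(7, 2, mode))
```

Keeping "trunc" as the default in this sketch would leave existing callers unchanged, while the PyTorch frontend could opt in to "rne" for quantized inputs.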