[ONNX] Relay type checker error when calling relay.frontend.from_onnx on a quantized model

Hello,

Trying to put an ONNX model through relay.frontend.from_onnx to get an IRModule results in the following error:

The Relay type checker is unable to show the following types match.
In particular dimension 0 conflicts: 1 does not match 96.
The Relay type checker is unable to show the following types match.
In particular `Tensor[(96), float32]` does not match `Tensor[(1), float32]`

This error was encountered with the quantized ONNX CaffeNet-int8 model that can be found here: https://github.com/onnx/models/tree/master/vision/classification/caffenet

import onnx
from tvm import relay

# quantized CaffeNet (caffenet-12-int8) from the ONNX model zoo
model_path = "/path/to/caffenet-12-int8.onnx"
onnx_model = onnx.load(model_path)

# the model's input is named 'data_0' and takes NCHW images
shape_dict = {'data_0': (1, 3, 224, 224)}

mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

[screenshot: Relay IR around the failing call, showing the QLinearConv output feeding the pooling operation]

The error seems to be emitted when infer_shape is called from here:

The input to the pooling operation is the output of the QLinearConv, as can be seen in the screenshot above, and the same error can be reproduced if infer_shape is called with the output of _qnn.op.requantize as its argument here:

The reference to Tensor[(96), float32] most likely points to w_scale and the subsequently generated requantize_scale, since their type is TensorType([96], float32).
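For illustration, here is a minimal standalone sketch (the shapes and values are my own, chosen to mirror the CaffeNet case, and are not taken from the frontend code) that triggers the same kind of type-checker failure with qnn.requantize alone:

import numpy as np
import tvm
from tvm import relay

# NCHW tensor with 96 channels on axis 1, like the first conv output
data = relay.var("data", shape=(1, 96, 55, 55), dtype="int32")

# per-channel input scale: one float32 per channel -> Tensor[(96), float32]
in_scale = relay.const(np.ones(96, dtype="float32"))
in_zp = relay.const(0, "int32")
out_scale = relay.const(0.5, "float32")
out_zp = relay.const(0, "int32")

# axis=0 points at the batch axis (extent 1), so the type checker tries to
# unify the 96-element scale vector with dimension 0: "1 does not match 96"
bad = relay.qnn.op.requantize(
    data, in_scale, in_zp, out_scale, out_zp, axis=0, out_dtype="int8"
)

# axis=1 matches the channel axis of the NCHW tensor and type-checks fine
good = relay.qnn.op.requantize(
    data, in_scale, in_zp, out_scale, out_zp, axis=1, out_dtype="int8"
)

mod = tvm.IRModule.from_expr(relay.Function([data], bad))
relay.transform.InferType()(mod)  # raises; building from `good` passes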

cc @AndrewZhaoLuo

You are also welcome to open a GitHub issue.


I’ve opened a GitHub issue: https://github.com/apache/tvm/issues/10046

Yeah it’s probably an issue with handling per-channel quantization correctly. I’ll have time later in the week or early next week to take a look at the problem.

There is also an existing issue with per-channel quantization in another ONNX op: https://github.com/apache/tvm/issues/9908

It’s probably better to test on end-to-end quantized models with per-channel quantization; I’ve never seen such tests for quantized ONNX models (only the unit tests from ONNX, which don’t have good coverage).
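As a rough sketch of what such an end-to-end check could look like (the paths and the non-CaffeNet input name below are placeholders; each zoo model defines its own input name):

import onnx
from tvm import relay

# local paths to int8 models from the ONNX model zoo, with their input shapes
MODELS = {
    "/path/to/caffenet-12-int8.onnx": {"data_0": (1, 3, 224, 224)},
    "/path/to/resnet50-v1-12-int8.onnx": {"data": (1, 3, 224, 224)},
}

for path, shape_dict in MODELS.items():
    onnx_model = onnx.load(path)
    try:
        # from_onnx runs the Relay type checker, so a scale/axis
        # mismatch surfaces here as an exception
        relay.frontend.from_onnx(onnx_model, shape_dict)
        print(path, "PASSED")
    except Exception as err:
        print(path, "FAILED:", err)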

Thank you for taking a look at this.

What should have caught my eye initially was the axis=0 parameter to the requantize operator outlined in the first post, which effectively assumes the first axis to be the channel axis, since axis is "The channel axis for quantization." as per the docstring.

The following may be of some help:

Since ONNX assumes a channel-first (NCHW) layout by default, I’ve tried setting axis=1 (though the channel axis should really be queried somehow) for the requantize mentioned above, as well as for all of the quantize/dequantize/requantize calls in QuantizeLinear, DequantizeLinear and QLinearConv, and tested with that.
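To make that concrete, the change amounts to something like the following in the QLinearConv converter (a sketch; apart from requantize_scale, the variable names around the call are my assumptions, and this is not the actual diff):

out = _qnn.op.requantize(
    out,
    requantize_scale,
    _op.const(0, dtype="int32"),
    y_scale,
    y_zero_point,
    out_dtype=out_dtype,
    axis=1,  # was axis=0; ONNX defaults to NCHW, so channels live on axis 1
)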

With the above change, trying to parse the models listed below, all obtained from the same ONNX model zoo that the aforementioned CaffeNet model was taken from, results in the following:

  • AlexNet - FAILED
    The Relay type checker is unable to show the following types match.
    In particular dimension 0 conflicts: 12288 does not match 256.
    The Relay type checker is unable to show the following types match.
    In particular `Tensor[(256), float32]` does not match `Tensor[(12288), float32]`
    
  • ResNet - PASSED
  • GoogleNet - PASSED
  • SqueezeNet - PASSED
  • ZFNet-512 - PASSED
  • ShuffleNet - PASSED
  • CaffeNet - FAILED
    The Relay type checker is unable to show the following types match.
    In particular dimension 0 conflicts: 12288 does not match 256.
    The Relay type checker is unable to show the following types match.
    In particular `Tensor[(256), float32]` does not match `Tensor[(12288), float32]`
    
  • VGG - PASSED

Without the change, all of the models listed produce an error similar to the one in the first post.

EDIT: Adding direct references to the quantize/dequantize/requantize calls changed in code:


The second failure appears to be something related to grouped convolutions in qnn; I will take a looky loo.
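For what it’s worth, the numbers are suggestive: 12288 = 256 × 48, i.e. out_channels × in_channels-per-group for the groups=2 conv2 layer in AlexNet/CaffeNet (weights of shape (256, 48, 5, 5)), which points at the per-channel scale handling for grouped convolutions in qnn. A minimal sketch of such a layer (the shapes are inferred from the error, and it may not reproduce the exact message on every TVM version):

import numpy as np
import tvm
from tvm import relay

# grouped conv mirroring AlexNet/CaffeNet conv2: 96 -> 256 channels, groups=2
data = relay.var("data", shape=(1, 96, 27, 27), dtype="int8")
weight = relay.const(np.zeros((256, 48, 5, 5), dtype="int8"))

in_scale = relay.const(1.0, "float32")
w_scale = relay.const(np.ones(256, dtype="float32"))  # one scale per out channel
zp = relay.const(0, "int32")

out = relay.qnn.op.conv2d(
    data, weight, zp, zp, in_scale, w_scale,
    kernel_size=(5, 5), channels=256, groups=2, padding=(2, 2),
)
mod = tvm.IRModule.from_expr(relay.Function([data], out))
mod = relay.transform.InferType()(mod)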

Attempted fix here: https://github.com/apache/tvm/pull/10162

Thanks @AndrewZhaoLuo for the quick fix. Would love to see this PR get merged ASAP.
