ONNX frontend: another Linux vs. Windows error

Hi All,

Previously, we had an error when importing an ONNX file on Windows, and that bug was fixed (see here: Onnx frontend giving an error when working a simple model with TVM). Unfortunately, I am now hitting another issue: my code works on Linux but gives the following error on Windows. I am not able to share the ONNX model or the code. The failure occurs at the line below; Windows complains about incompatible types, while Linux works perfectly. Any suggestions @rkimball @mbrookhart?

mod, params = relay.frontend.from_onnx(onnxmodel, shape_dict)

Here is the error message.

Incompatible broadcast type TensorType([1, 192, 160, 96], float32) and TensorType([1, ?, (161 - ?), (97 - ?)], float32)
The type inference pass was unable to infer a type for this expression.
This usually occurs when an operator call is under constrained in some way, check other reported errors for hints of what may of happened.
Traceback (most recent call last):
  File "C:\repos\tvm23\tvm\python\tvm\relay\frontend\onnx.py", line 3083, in from_onnx
    mod, params = g.from_onnx(graph, opset, freeze_params)
  File "C:\repos\tvm23\tvm\python\tvm\relay\frontend\onnx.py", line 2899, in from_onnx
    op = self._convert_operator(op_name, inputs, attr, opset)
  File "C:\repos\tvm23\tvm\python\tvm\relay\frontend\onnx.py", line 2998, in _convert_operator
    sym = convert_map[op_name](inputs, attrs, self._params)
  File "C:\repos\tvm23\tvm\python\tvm\relay\frontend\onnx.py", line 452, in _impl_v1
    input_shape = infer_shape(data)
  File "C:\repos\tvm23\tvm\python\tvm\relay\frontend\common.py", line 506, in infer_shape
    out_type = infer_type(inputs, mod=mod)
  File "C:\repos\tvm23\tvm\python\tvm\relay\frontend\common.py", line 487, in infer_type
    new_mod = _transform.InferType()(new_mod)
  File "C:\repos\tvm23\tvm\python\tvm\ir\transform.py", line 127, in __call__
    return _ffi_transform_api.RunPass(self, mod)
  File "C:\repos\tvm23\tvm\python\tvm\_ffi\_ctypes\packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm.error.DiagnosticError: Traceback (most recent call last):
  File "C:\repos\tvm23\tvm\src\ir\diagnostic.cc", line 105
DiagnosticError: one or more error diagnostics were emitted, please check diagnostic render for output.

Process finished with exit code 1

I’m not sure we can debug this without a unit test. TensorType([1, ?, (161 - ?), (97 - ?)]) makes me think some shape is getting set with relay.Any() somewhere, and then we’re trying to do shape inference, and some computation combines the integer input of one shape with Any. From the stack trace you gave, it seems you’re hitting this on the input of a ConvTranspose op: https://github.com/apache/tvm/blob/e8ab6079920dcac57d7b89582ec6a609d6363dd9/python/tvm/relay/frontend/onnx.py#L453
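As a minimal sketch of that failure mode (made-up shapes mimicking your error message, not your actual model), broadcasting a static shape against one built from tvm.tir.Any() trips the same class of error:

import tvm
from tvm import relay

# Made-up shapes taken from the error message, not the real model.
x = relay.var("x", shape=(1, 192, 160, 96), dtype="float32")
y = relay.var(
    "y",
    shape=(1, tvm.tir.Any(), 161 - tvm.tir.Any(), 97 - tvm.tir.Any()),
    dtype="float32",
)

mod = tvm.IRModule.from_expr(relay.Function([x, y], relay.add(x, y)))
# InferType cannot prove 160 == (161 - ?), so this raises a
# DiagnosticError with an "Incompatible broadcast type" message.
mod = relay.transform.InferType()(mod)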

Based on the shape, where the channel is missing and H and W show up as differences, I’m half tempted to say the input is coming from a Convolution with a static data shape and a dynamic weight.
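If that guess is right, a toy program along these lines (hypothetical shapes, not from your model) shows how those expressions appear: a conv2d whose weight has Any kernel dims gets an output type with H/W written as arithmetic over “?”:

import tvm
from tvm import relay

# Hypothetical: static data, weight with unknown kernel height/width.
data = relay.var("data", shape=(1, 3, 161, 97), dtype="float32")
weight = relay.var("weight", shape=(192, 3, relay.Any(), relay.Any()), dtype="float32")
out = relay.nn.conv2d(data, weight)  # kernel_size left to be inferred from the weight type

mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
mod = relay.transform.InferType()(mod)
print(mod)  # output H/W print as expressions over "?", much like (161 - ?)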

I don’t know why we’d get an Any on Windows but not Linux.

TVM’s CI doesn’t run tests on Windows; it just does a build. Bob has been cleaning up the tests to work on Windows; I don’t know if he’s seen something like this before. @rkimball, have any of your test runs seen unexpected dynamic shapes?

I have been having some success getting pytest running against TVM. I will concentrate on dynamic shapes and the ONNX tests in the hope that I can find something, but we will likely need more information to debug this.
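Roughly along the lines of (paths relative to the TVM source tree):

pytest tests/python/frontend/onnx/test_forward.py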

@jmatai1 What version of onnx and onnxruntime are you using? The TVM CI is set up with onnx==1.6.0 and onnxruntime==1.0.0.
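That is, pinned with pip:

pip install onnx==1.6.0 onnxruntime==1.0.0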

Ironically, this installs easily on Windows, but I am having real problems getting onnx 1.6.0 on Ubuntu. I want to compare some failing tests between Windows and Linux.

Thanks @rkimball, and thanks @mbrookhart. I understand it is difficult to debug without all the inputs and outputs. I have done some debugging myself, but I have not gotten far yet.

Hi @rkimball. My ONNX version is 1.8.0.
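(Checked with the standard version attribute:)

import onnx
print(onnx.__version__)  # 1.8.0 here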

I am using onnx only for importing, so I do not have onnxruntime installed.