Failures using many of ONNX Model Zoo models

I’ve tried out TVM with several ONNX Model Zoo models, but surprisingly many don’t work:

ok	- https://github.com/onnx/models/raw/master/vision/classification/resnet/model/resnet50-v2-7.tar.gz
ok	- https://github.com/onnx/models/raw/master/vision/classification/mnist/model/mnist-8.tar.gz
ok	- https://github.com/onnx/models/raw/master/vision/classification/rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.tar.gz
ok	- https://github.com/onnx/models/raw/master/vision/classification/efficientnet-lite4/model/efficientnet-lite4-11.tar.gz
not ok	- https://github.com/onnx/models/raw/master/vision/object_detection_segmentation/mask-rcnn/model/MaskRCNN-10.tar.gz
not ok	- https://github.com/onnx/models/raw/master/vision/object_detection_segmentation/yolov4/model/yolov4.tar.gz
not ok	- https://github.com/onnx/models/raw/master/text/machine_comprehension/bert-squad/model/bertsquad-10.tar.gz
not ok	- https://github.com/onnx/models/raw/master/text/machine_comprehension/roberta/model/roberta-base-11.tar.gz
not ok	- https://github.com/onnx/models/raw/master/text/machine_comprehension/gpt-2/model/gpt2-10.tar.gz

I’ve filed a GitHub issue so that I could attach files:

I’d appreciate it if folks could confirm whether I’m doing something wrong here, or whether these results faithfully reflect the current state of ONNX support in TVM.

Thanks.

I think models other than MaskRCNN should work. Two pieces of advice from me:

I tried the two changes above in your script, but it still doesn’t work. cc @mbrookhart @jwfromm

Explicit use of DynamicToStatic should only really be needed if we’re autotuning, and then only in some cases. You should probably freeze the parameters of the ONNX model: the TF and PyTorch exporters end up storing shape information as weights in the ONNX model, and freezing the parameters tends to make for a more robust import.

I’ll poke around your script, give me a bit.

This change to the core compilation/execution step of your script:

    try:
        print(f'Importing graph from ONNX to TVM Relay IR ...')
        mod, params = relay.frontend.from_onnx(onnx_model, shape_dict, freeze_params=True)
        print(f'Compiling graph from Relay IR to {target} ...')
        ex = relay.create_executor("vm", mod=mod, device=ctx, target=target)
        print(f"Running inference...")
        output = ex.evaluate()(*input_values, **params)
    except Exception as e:
        print(f'Failed: {e}')

Gets YoloV4, Bertsquad, and GPT-2 running.

We don’t do very well with this MaskRCNN model; it has dynamically shaped convolutions in it that TVM doesn’t handle well yet.

Roberta looks like an input datatype problem; I’ll see if I can fix it quickly.

I can confirm that freeze_params=True did the trick and got yolov4 and bert-squad working. GPT-2 seems to work with or without freeze_params.

My script with the modification: onnx_zoo_test.py · GitHub

Sorry for the delay; my afternoon was packed with meetings. The issue with Roberta is that the int64 input tensor is somehow getting loaded as float64.

If I make this hacky change to the way you’re importing tensors, it works:

            tensor = onnx.TensorProto()
            with open(input_data, 'rb') as f:
                tensor.ParseFromString(f.read())
            x = numpy_helper.to_array(tensor)
            # Work around the dtype mix-up: force Roberta's inputs back to int64.
            if "roberta" in url:
                x = x.astype("int64")
            input_values.append(x)
            shape_dict[input.name] = x.shape

I’m not sure why ONNX’s numpy_helper isn’t getting that datatype right; the values are definitely integers, just cast to float64.
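The workaround generalizes beyond Roberta. A small numpy sketch (the array below is an illustrative stand-in, not data from the original script): if a loaded tensor comes back as float64 but every value is integral, casting back to int64 is lossless, and a round-trip check guards against silently truncating genuinely fractional values.

```python
import numpy as np

# Illustrative stand-in for a tensor that numpy_helper.to_array
# returned as float64 even though the values are integer token ids.
x = np.array([0.0, 2.0, 133.0, 1.0], dtype="float64")

# Sanity check: only cast when no information would be lost.
assert np.all(x == np.round(x)), "tensor has non-integral values"
x_fixed = x.astype("int64")
```

A stricter variant would also verify the values fit in int64 before casting, but for token-id inputs that is never a concern in practice.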

Anyway, of this list, that just leaves MaskRCNN, which we know is a limitation. I think @ziheng has been working on better dynamic kernel generation for TVM, but I don’t know the current status.

I also have problems running SSD-MobileNetV1.

It looks like conv2d cannot handle dynamic shapes:

  File "/home/chlu/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/chlu/tvm/python/tvm/relay/op/strategy/generic.py", line 240, in _compute_conv2d
    return [topi_compute(*args)]
  File "/home/chlu/tvm/python/tvm/topi/x86/conv2d.py", line 129, in conv2d_nchw
    packed_out = conv2d_NCHWc(data, kernel, strides, padding, dilation, layout, layout, out_dtype)
  File "/home/chlu/tvm/python/tvm/autotvm/task/topi_integration.py", line 165, in wrapper
    node = topi_compute(cfg, *args)
  File "/home/chlu/tvm/python/tvm/topi/x86/conv2d.py", line 191, in conv2d_NCHWc
    oh = (ih - kernel_height + pt + pb) // sh + 1
TypeError: unsupported operand type(s) for -: 'Any' and 'int'
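The mechanism behind that traceback: TVM represents an unknown dimension with a symbolic `Any` node, which supports no integer arithmetic, so the static output-shape computation in conv2d_NCHWc blows up. A minimal pure-Python sketch (the `Any` class here is a hypothetical stand-in for `tvm.relay.Any`, not TVM's actual class):

```python
class Any:
    """Hypothetical stand-in for tvm.relay.Any: an unknown dimension
    that defines no arithmetic operators."""

def conv_output_height(ih, kh, pt, pb, sh):
    # Same formula as the line from topi/x86/conv2d.py in the traceback.
    return (ih - kh + pt + pb) // sh + 1

oh = conv_output_height(224, 3, 1, 1, 1)   # static dims work fine
try:
    conv_output_height(Any(), 3, 1, 1, 1)  # dynamic dim raises TypeError
except TypeError as err:
    msg = str(err)
```

This is why freezing parameters (or an explicit DynamicToStatic pass) helps: when the shapes are actually static, the `Any` nodes disappear before this arithmetic runs.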

@masahi @mbrookhart, thank you for your prompt replies. Indeed, passing freeze_params=True allowed me to get all models except Mask-RCNN working.

Roberta worked fine here without further changes. I’m using onnx==1.8.1. Maybe the onnx version you’re using has a broken numpy_helper.to_array implementation.

Regarding the different ways of executing graphs, I was following apps/benchmark/gpu_imagenet_bench.py as a blueprint, since it gave the best performance when I last tried. Where can I learn more about this VM compiler?

AFAICT, tests/python/frontend/onnx/test_forward.py contains unit tests, and tests/python/contrib/test_bnns/test_onnx_topologies.py only covers a sliver of the ONNX models and doesn’t test the expected outputs (and I couldn’t get it to work without errors). Are the ONNX Model Zoo models part of TVM’s regression tests in some other way?

SSD-MobileNetV1 worked fine with my script after modifying it as:

--- tvm_onnx_model_zoo.py.orig  2021-06-18 15:11:25.140135692 +0000
+++ tvm_onnx_model_zoo.py       2021-06-18 15:12:00.945439030 +0000
@@ -46,6 +46,7 @@
     'https://github.com/onnx/models/raw/master/vision/classification/efficientnet-lite4/model/efficientnet-lite4-11.tar.gz',
     'https://github.com/onnx/models/raw/master/vision/object_detection_segmentation/mask-rcnn/model/MaskRCNN-10.tar.gz',
     'https://github.com/onnx/models/raw/master/vision/object_detection_segmentation/yolov4/model/yolov4.tar.gz',
+    'https://github.com/onnx/models/raw/master/vision/object_detection_segmentation/ssd-mobilenetv1/model/ssd_mobilenet_v1_10.tar.gz',
     'https://github.com/onnx/models/raw/master/text/machine_comprehension/bert-squad/model/bertsquad-10.tar.gz',
     'https://github.com/onnx/models/raw/master/text/machine_comprehension/roberta/model/roberta-base-11.tar.gz',
     # XXX: Often segfaults
@@ -109,7 +110,7 @@
 
     try:
         print(f'Importing graph from ONNX to TVM Relay IR ...')
-        mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+        mod, params = relay.frontend.from_onnx(onnx_model, shape_dict, freeze_params=True)
 
         print(f'Compiling graph from Relay IR to {target} ...')
         with tvm.transform.PassContext(opt_level=1):

@jfonseca Thank you for your test and reply, but I forgot to say that I use the VM executor and that the target is llvm.

The script is like the one @masahi used.