[frontend][pytorch] TVM compatibility with Torch 1.12.0

Hello,

Currently TVM uses Torch 1.11.0 (judging from the ubuntu_install_onnx.sh script). Are there any known blockers to using Torch 1.12.0?

I was trying to run some examples with Torch 1.12.0. When I ran gallery/how_to/deploy_models/deploy_object_detection_pytorch.py, I got the error message below. The same example runs fine with Torch 1.11.0. I would appreciate your help, thanks!

 Traceback (most recent call last): 
   File "deploy_object_detection_pytorch.py", line 122, in <module> 
     mod, params = relay.frontend.from_pytorch(script_module, shape_list) 
   File "/home/ubuntu/tvm/python/tvm/relay/frontend/pytorch.py", line 4542, in from_pytorch 
     outputs = converter.convert_operators(_get_operator_nodes(graph.nodes()), outputs, ret_name) 
   File "/home/ubuntu/tvm/python/tvm/relay/frontend/pytorch.py", line 3916, in convert_operators 
     relay_out = relay_op( 
   File "/home/ubuntu/tvm/python/tvm/relay/frontend/pytorch.py", line 812, in fill_ 
     return self.full_impl(self.infer_shape(data), fill_value, input_types[0]) 
   File "/home/ubuntu/tvm/python/tvm/relay/frontend/pytorch.py", line 679, in full_impl 
     out = _op.full(_expr.const(fill_value, dtype=dtype), size, dtype=dtype) 
   File "/home/ubuntu/tvm/python/tvm/relay/expr.py", line 517, in const 
     raise ValueError("value has to be scalar or NDArray") 
 ValueError: value has to be scalar or NDArray 
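
For readers following the traceback: the last frame shows that the fill value passed to `const` was neither a scalar nor an NDArray. The sketch below is a simplified stand-in for that kind of check, not TVM's actual implementation; `const_like` and `FakeExpr` are hypothetical names. One possible cause (an assumption, not confirmed in this thread) is that the fill value arrives as a symbolic graph value rather than a Python number.

```python
import numbers


class FakeExpr:
    """Hypothetical stand-in for a symbolic (non-constant) graph value."""


def const_like(value):
    """Simplified scalar-or-list check; NOT the real tvm.relay.const."""
    if isinstance(value, (numbers.Number, list)):
        return ("const", value)
    raise ValueError("value has to be scalar or NDArray")


# A plain Python scalar is accepted.
print(const_like(0.0))

# A symbolic value is rejected with the same message as in the traceback.
try:
    const_like(FakeExpr())
except ValueError as err:
    print("rejected:", err)
```

If this is indeed the cause, the converter would need to handle a non-constant fill value before it reaches `const`.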

There is no blocker; someone just has to do the upgrade work. And indeed, the MaskRCNN model used in deploy_object_detection_pytorch.py always brings challenges when we upgrade.

Hi @masahi, thanks for the answer and for the help with the PR. Could you please briefly describe how much work is needed to upgrade TVM to Torch 1.12.0 and what exactly it involves? I ran the PyTorch frontend unit tests (./tests/scripts/task_python_frontend.sh) locally on my machine, and they passed with Torch 1.12.0. Are there any other checks I can do? Maybe I can test it in your CI? Thanks in advance!

All tests under tests/python/frontend/pytorch need to pass, on both CPU and GPU, with the new Torch version.

As you already found by running deploy_object_detection_pytorch.py, MaskRCNN import is probably broken with the new version, so we need to fix that. test_object_detection.py uses the same MaskRCNN model.

I also expect some tests in test_forward.py to be broken.

Hi @masahi. Could you give me a few pointers on how to run CI with a PyTorch 1.12.0 environment? I would like to check whether there are any issues with it. I ran the tests locally and did not see any big issues (but I only ran them on CPU). The Mask R-CNN test passes after the fix, and my last PR should fix the issues with the FX quantization tests.

Is it enough to change the Torch version in the docker/install/ubuntu_install_onnx.sh file? Or does one need to follow the instructions you wrote here:

Thank you in advance!

The CI update process is a bit complicated. Changing docker/install/ubuntu_install_onnx.sh is not enough, but it is the first step. Can you send such a PR?
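
As a rough illustration of that first step, the sketch below bumps a pinned version inside the text of an install script. It assumes the script pins packages in the `name==version` form used by pip; the sample line is made up for illustration and is not the actual content of ubuntu_install_onnx.sh, and `bump_pin` is a hypothetical helper.

```python
import re


def bump_pin(script_text, package, new_version):
    """Replace `package==<old>` with `package==<new_version>`.

    Returns (new_text, number_of_replacements).
    """
    # The lookbehind keeps e.g. "pytorch==" from matching package "torch";
    # "torchvision==" is not matched either, because "==" must follow "torch".
    pattern = re.compile(r"(?<![\w-])" + re.escape(package) + r"==[0-9][\w.+]*")
    return pattern.subn(lambda m: f"{package}=={new_version}", script_text)


# Made-up sample line, assumed to resemble a pip pin in the install script.
sample = "pip3 install torch==1.11.0 torchvision==0.12.0\n"
new_text, count = bump_pin(sample, "torch", "1.12.0")
print(count)
print(new_text, end="")
```

Companion packages pinned next to Torch (torchvision in the sample) may need a matching bump in the same PR.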

After that, I can test PT 1.12 on our CI in the ci-docker-staging branch. If all tests pass, we can make the actual image update in Jenkinsfile.

OK, only one test is failing, and it comes from the ONNX tests. See https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/288/pipeline/463 (search for “FAILED”, in capitals).

tests/python/frontend/onnx/test_forward.py::test_aten failed, because it uses PyTorch to generate the model, and PT 1.12 has some issues with this test. Can you take a look?
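
For anyone scanning a saved copy of such a CI log, a minimal sketch that pulls out the failing test IDs could look like the following. It assumes pytest-style summary lines of the form `FAILED path::test_name`; the sample log text (including the `test_reshape` entry and the error text) is made up for illustration, and `failed_tests` is a hypothetical helper.

```python
import re

# Matches "FAILED <path>::<test>" anywhere on a line, so log prefixes such as
# timestamps do not matter. The match is case-sensitive on purpose.
FAILED_LINE = re.compile(r"\bFAILED\s+(\S+::\S+)")


def failed_tests(log_text):
    """Return the test IDs that follow a capitalized FAILED marker."""
    return FAILED_LINE.findall(log_text)


# Made-up log excerpt; only the test_aten ID comes from this thread.
sample_log = """\
PASSED tests/python/frontend/onnx/test_forward.py::test_reshape
FAILED tests/python/frontend/onnx/test_forward.py::test_aten - RuntimeError
1 failed, 1 passed
"""
print(failed_tests(sample_log))
```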

Yes, I will have a look. Thanks!