Hi, I’m new to TVM and I recently ran into a weird problem.
I built TVM at exactly the same version, with almost the same compilation options, on macOS and Ubuntu 20.04. Both builds compile the PyTorch model correctly. However, the precision of the model output differs.
To put it more simply, the TVM on my laptop (macOS) passes the tests under pytorch/test_forward.py, but the TVM on my server (Ubuntu) does not.
Only after I loosen the tolerance in tvm.testing.assert_allclose to about 1e-2 does the TVM on Ubuntu pass tests like test_forward.test_mnasnet0_5().
Does anyone have any idea what could cause this? I would be very grateful for any help.
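For context on what loosening the tolerance means: as far as I understand, tvm.testing.assert_allclose forwards to numpy.testing.assert_allclose, which accepts an element when |actual - desired| <= atol + rtol * |desired|. The sketch below reproduces that criterion with plain NumPy. The helper name closeness_report and the sample arrays are made up for illustration; they are not part of TVM or of the failing tests.

```python
import numpy as np


def closeness_report(actual, desired, rtol, atol):
    """Summarize an allclose-style comparison (hypothetical helper, not a TVM API)."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    abs_diff = np.abs(actual - desired)
    # Same per-element bound that numpy.testing.assert_allclose applies.
    allowed = atol + rtol * np.abs(desired)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mismatched": int((abs_diff > allowed).sum()),
        "total": int(abs_diff.size),
    }


# Made-up values for illustration only.
desired = np.array([1.0, 2.0, 3.0])
actual = np.array([1.0, 2.0, 3.02])

strict = closeness_report(actual, desired, rtol=1e-5, atol=1e-5)
loose = closeness_report(actual, desired, rtol=1e-2, atol=1e-2)
print(strict)
print(loose)
```

With the strict setting the last element is reported as a mismatch, while the 1e-2 setting accepts it. That is the kind of gap I see between the two machines.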
System info:
- macOS version: 10.15.7; Python: 3.8.5; LLVM: 10.0.0; PyTorch: 1.8.1
- Ubuntu version: 20.04; Python: 3.8.6; LLVM: 10.0.0; PyTorch: 1.8.1+cu111
- TVM version: 0.8.dev0, at commit fbdffeb546b350eedb470f07b7915341610e3367
Here is some additional information:
I found this problem when I wanted to move the model compilation from my laptop to my server. assert_allclose passed on my laptop but failed on my server.
I tried cross compilation, that is, compiling for the Linux runtime on my laptop and testing the compiled model on my server, and vice versa. But the precision problem still exists: the results only match to about 1e-2 on Ubuntu, versus 1e-4 on macOS.
I suspect this is a runtime problem, maybe even unrelated to TVM. And I am starting to think that 1e-2 is actually acceptable. But the discussion in "Aspirations for conversion and unit tests in the PyTorch frontend" seems to support a tighter precision requirement.
Also, the tests on Jenkins all pass. So I have no idea what is going on …
Update: This problem is not restricted to PyTorch models. I also tried test_forward_inception_v3 under the TensorFlow tests. It failed on Ubuntu and passed on macOS. The failure looks something like:
AssertionError: Not equal to tolerance rtol=1e-05, atol=1e-05
Mismatched elements: 14 / 1001 (1.4%)
Max absolute difference: 0.00014865
Max relative difference: 0.00256442
x: array([[5.194177e-05, 2.972199e-05, 3.040651e-04, …, 2.509769e-05, 1.136126e-04, 2.880761e-04]], dtype=float32)
y: array([[5.191198e-05, 2.969970e-05, 3.036689e-04, …, 2.506699e-05, 1.134691e-04, 2.877102e-04]], dtype=float32)
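To get a feel for these numbers, here is a small sketch that applies the same rtol=1e-05, atol=1e-05 criterion to only the six values printed above. The elided middle of each array is not reconstructed, so this says nothing about the 14 elements that actually failed. It assumes the check is |x - y| <= atol + rtol * |y|, as in numpy.testing.assert_allclose.

```python
import numpy as np

# The six values visible in the error message; the elided part is left out.
x = np.array([5.194177e-05, 2.972199e-05, 3.040651e-04,
              2.509769e-05, 1.136126e-04, 2.880761e-04], dtype=np.float32)
y = np.array([5.191198e-05, 2.969970e-05, 3.036689e-04,
              2.506699e-05, 1.134691e-04, 2.877102e-04], dtype=np.float32)

rtol, atol = 1e-5, 1e-5
abs_diff = np.abs(x - y)
rel_diff = abs_diff / np.abs(y)

# Per-element bound used by an allclose-style check.
within = abs_diff <= atol + rtol * np.abs(y)

print("max abs diff:", abs_diff.max())
print("max rel diff:", rel_diff.max())
print("all visible elements within tolerance:", bool(within.all()))
```

The visible elements are tiny probabilities, so they stay within the bound only because of the atol term. Their relative differences are on the order of 1e-3, in line with the reported max relative difference, so larger outputs with a similar relative error would exceed atol=1e-05.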