Aspirations for conversion and unit tests in the PyTorch frontend

Hi,

I wanted to raise something I bumped into recently:

The unit tests for the PyTorch frontend verify the converted model against PyTorch, but only to a tolerance of 1e-3 (both atol and rtol).

I would argue that a faithful transposition of the PyTorch model warrants more than that; a more reasonable default would be 1e-5 or so for single precision. Ideally we would also work with double precision and tighten the tolerance to 1e-10 or so, but the PyTorch frontend currently cannot support this (as far as I can tell).
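To illustrate why the default matters, here is a minimal sketch using the standard library's `math.isclose` (TVM's tests use numpy-style atol/rtol, whose combination semantics differ slightly, but the point is the same; the numbers here are hypothetical):

```python
import math

# Hypothetical outputs: the converted model deviates from the
# PyTorch reference by 5e-4 -- a real conversion bug could hide here.
reference = 1.0000
converted = 1.0005

# The current 1e-3 tolerance accepts the deviation...
assert math.isclose(converted, reference, rel_tol=1e-3, abs_tol=1e-3)

# ...while a tighter 1e-5 tolerance would flag it.
assert not math.isclose(converted, reference, rel_tol=1e-5, abs_tol=1e-5)
```

In other words, a 1e-3 threshold silently admits per-element errors large enough to hide a genuinely wrong conversion.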

The reason I bumped into this is that the PyTorch frontend currently replaces the “true” gelu implementation of PyTorch with a rather crude approximation (sometimes termed fast gelu) that appears to have an error on the order of 1e-3 (and just barely passes the unit test verification). While it may be reasonable to replace gelu with fast gelu on platforms where tanh is substantially cheaper than erf (I have doubts that it matters much on typical GPUs), I don’t think the PyTorch frontend is necessarily the ideal place for that substitution, if only because it confuses simple-minded people like me when they try to convert their model through TVM.
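For concreteness, the two variants can be compared directly with the standard library. This sketch assumes the common tanh-based fast-gelu formula; the error of the approximation indeed sits well above a 1e-5 tolerance:

```python
import math

def gelu(x):
    # "True" gelu: x * Phi(x), with Phi the standard normal CDF via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def fast_gelu(x):
    # Tanh-based approximation, often termed "fast gelu".
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# Scan a range of inputs; the worst-case gap is far larger than 1e-5.
xs = [i / 100.0 for i in range(-500, 501)]
max_err = max(abs(gelu(x) - fast_gelu(x)) for x in xs)
assert 1e-5 < max_err < 5e-3  # big enough to fail any 1e-5 check
```

So a model built out of fast gelu can only ever match the PyTorch original to roughly the current test tolerance, never tighter.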

What do you think?

  • Should the tolerance for unit tests be tightened?
  • Is it desirable to eventually get doubles? (Even if this means passing in a default dtype until PyTorch type inference gives even better annotations.)
  • Should we expect the PyTorch frontend to be a 1-1 conversion or do we expect further optimizations? (If the former, I have a PR ready to submit to convert gelu instead of fastgelu)

Best regards

Thomas

@masahi @siju-samuel as people that might be particularly attached to fastgelu. :slight_smile:

Yes, that would be desirable. The 1e-3 tolerance has been around since the first PR that added the PyTorch frontend; I don’t know why this threshold was chosen or how badly tighter tolerances would break our tests.

I have no experience with TVM + double precision, so I have no idea how well/slowly it would work. That said, I don’t have any reason to oppose adding doubles, so if you could make it work, that would be welcome.

As you mentioned, our frontend assumes fp32 by default everywhere and is only tested on fp32 and uint8 models. So getting doubles to work might be tricky.

Yes, our frontend is supposed to be as faithful as possible to the input torch models. I didn’t know about the gelu issue; it is good to know. If users want a faster but less accurate conversion, they can override the gelu conversion with custom_convert_map.

Thank you for weighing in here. When implementing a lot of things, some details inevitably slip through. :slight_smile:

So currently, the type handling in the PyTorch frontend is worth improving (I have a branch that I’ll split into a few follow-up PRs if/when PR5756 is merged), but the gist of it is that you can, in principle, have doubles.

The significance of that is that numerical verification of “does this do the same thing” is much better with doubles. To put this in terms of the process that led to this thread:

  • When using float, the smallest component had ~1e-6 deviation between PyTorch and the converted TVM model, the larger submodel using it (along with other things) had ~1e-5, and the entire model (well, not quite, but close) had ~1e-3.
  • The gap between 1e-3 and 1e-5 isn’t all that large. Moving to double, I got ~1e-15, ~1e-14, and ~1e-3 for the same three components. Now I could see clearly that something was up with the entire model that wasn’t in the submodel, and I found the gelu approximation shortly after.

For things involving larger reductions or operands, the numerical error can be even larger (e.g. batch norm is notorious for this).
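The reduction effect is easy to demonstrate without any framework at all. This sketch emulates float32 accumulation via the standard library's `struct` round-trip (an illustration of precision drift in naive reductions, not TVM code):

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python double to the nearest IEEE single (float32)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Naive reduction: accumulate 0.1 a hundred thousand times.
n = 100_000
acc32 = 0.0
for _ in range(n):
    acc32 = to_f32(acc32 + to_f32(0.1))  # every step rounds to float32

acc64 = 0.0
for _ in range(n):
    acc64 = acc64 + 0.1                  # double precision throughout

# float32 drifts visibly from the exact 10000.0; float64 stays tight.
assert abs(acc32 - 10000.0) > 1e-2
assert abs(acc64 - 10000.0) < 1e-4
```

This is why a single-precision comparison of a whole model cannot cleanly separate "expected rounding" from "the converter changed the math", while a double-precision run can.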

Similarly, the experience with PyTorch autograd verification is that the checks invariably work much better with doubles.

So this is where moving to doubles seems a clear win for checking that everything is OK.

I’ll look at sending a PR (update: #5763) with gelu and some tolerance tightening, and then we can revisit this when I get a chance to send some typing changes.

Best regards

Thomas