Hi,
I wanted to raise something I bumped into recently:
The unit tests for the PyTorch frontend verify the converted model against PyTorch, but only to a tolerance of 1e-3 (both atol and rtol).
I would argue that a faithful transposition of the PyTorch model warrants more than that; a more reasonable default would be 1e-5 or so for single precision. Ideally we would also work with double precision and tighten the tolerance to 1e-10 or so, but the PyTorch frontend currently cannot support this (as far as I can tell).
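For context on what those tolerances mean: NumPy-style closeness checks (as I understand `np.isclose`) pass when |actual - desired| <= atol + rtol * |desired|, so a 1e-3 default lets through errors far larger than fp32 round-off. The values below are illustrative, not taken from any actual test:

```python
import numpy as np

# np.isclose-style check: |actual - desired| <= atol + rtol * |desired|
desired = np.float32(1.0)
actual = desired + np.float32(5e-4)  # an error of ~5e-4, well above fp32 round-off

loose = np.allclose(actual, desired, rtol=1e-3, atol=1e-3)  # passes at 1e-3
tight = np.allclose(actual, desired, rtol=1e-5, atol=1e-5)  # fails at 1e-5
print(loose, tight)
```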
The reason I bumped into this is that the PyTorch frontend currently replaces the "true" gelu implementation of PyTorch with a rather crude approximation (sometimes termed fast gelu) that appears to have an error on the order of 1e-3 (and just barely passes the unit-test verification). While it may be reasonable to replace gelu with fast gelu on platforms where tanh is substantially cheaper than erf (I have doubts that it matters much on typical GPUs), I don't think the PyTorch frontend is necessarily the ideal place for it, if only because it confuses simple-minded people like me when they try to convert their model through TVM.
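For reference, here is a quick stdlib-only sketch of the error magnitude, assuming the fast gelu in question is the usual tanh approximation with the 0.044715 cubic term:

```python
import math

def gelu(x):
    # exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def fast_gelu(x):
    # tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# scan [-5, 5] for the worst-case absolute deviation
xs = [i / 1000.0 for i in range(-5000, 5001)]
max_err = max(abs(gelu(x) - fast_gelu(x)) for x in xs)
print(max_err)
```

The maximum absolute deviation over that range sits well above a 1e-5 tolerance, which is why the approximation only squeaks past the current 1e-3 check.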
What do you think?
- Should the tolerance for the unit tests be tightened?
- Is it desirable to eventually get doubles? (Even if this means passing in a default dtype until PyTorch type inference gives even better annotations.)
- Should we expect the PyTorch frontend to be a 1:1 conversion, or do we expect further optimizations? (If the former, I have a PR ready to submit that converts gelu instead of fastgelu.)
Best regards
Thomas
@masahi @siju-samuel as people that might be particularly attached to fastgelu.