[PyTorch] PyTorch frontend failing to convert nn.Transformer

Hi, I’m attempting to convert and run a simple transformer model in TVM using the PyTorch front-end, but I’m running into an issue within the from_pytorch converter. I’ve tested other models, such as BERT from the pytorch-transformers package, and they work well. Here is a minimal reproducer:

import torch
from tvm import relay

model = torch.nn.Transformer(nhead=8, num_encoder_layers=6, num_decoder_layers=6)
model = model.eval()
src = torch.rand((10, 32, 512))
tgt = torch.rand((20, 32, 512))
input = [src, tgt]

trace = torch.jit.trace(model, input)
input_names = ["input{}".format(idx) for idx, inp in enumerate(input)]
input_shapes = list(zip(input_names, [inp.shape for inp in input]))

mod, params = relay.frontend.from_pytorch(trace, input_shapes)

And the exact issue I’m having:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luke/incubator-tvm/python/tvm/relay/frontend/pytorch.py", line 3365, in from_pytorch
    default_dtype=default_dtype,
  File "/home/luke/incubator-tvm/python/tvm/relay/frontend/pytorch.py", line 3266, in convert_operators
    inputs, _get_input_types(op_node, outputs, default_dtype=default_dtype)
  File "/home/luke/incubator-tvm/python/tvm/relay/frontend/pytorch.py", line 313, in _impl
    begin[dim] = int(inputs[2])
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Constant'

I’ve attempted to look further into the issue with little success. Any help is appreciated, I’m not very familiar with the PyTorch frontend. Apologies if there is something obvious I’m missing.

cc @masahi, @siju-samuel

@lhutton1 You hit a bug dating back to the very first commit to the pytorch frontend (whose test coverage is not great).

You can simply modify

begin[dim] = int(inputs[2])

to

begin[dim], _ = try_infer_value(inputs[2], lambda ret: np.asscalar(ret.astype(np.int)))

This fixes the conversion problem, but I found there is something wrong with the accuracy compared to the PyTorch result. I haven’t looked into the details yet.
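For reference, here is a minimal sketch of what that change does. The old code called int() directly on the slice index, which fails when the index arrives as a Relay Constant node rather than a plain Python number. try_infer_value instead tries to constant-fold the expression and hands the resulting NumPy value to a callback. This is only an illustration; it assumes try_infer_value is importable from tvm.relay.frontend.common in your TVM version (the callback here uses int() rather than the deprecated np.asscalar):

import numpy as np
from tvm import relay
from tvm.relay.frontend.common import try_infer_value  # location assumed

# A slice index as the frontend may see it: a Relay constant, not a raw int.
idx = relay.const(3, dtype="int64")

# On success the callback receives the folded NumPy value; the second return
# value reports whether constant folding succeeded.
begin, inferred = try_infer_value(idx, lambda ret: int(ret))
print(begin, inferred)  # expected: 3 True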

Thanks @masahi, that seems to have worked. Personally I’m only interested in benchmarking the model, so I’m not too concerned about accuracy - as long as the implementation is mostly correct. This obviously isn’t correct though, so if I have time I’ll go back and take another look :slight_smile:
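In case it helps anyone else doing the same thing, here is a rough sketch of how the converted module can be built and timed. It assumes a recent TVM where the graph executor lives at tvm.contrib.graph_executor and an LLVM CPU target; adjust the target for your device. The input names "input0"/"input1" come from the reproducer above:

import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Build the converted Relay module for CPU; swap the target for your device.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(target, 0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input0", src.numpy())
module.set_input("input1", tgt.numpy())

# Time the end-to-end forward pass.
print(module.benchmark(dev, repeat=3))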