I first converted a DistilBERT model fine-tuned for question answering (from transformers) into a JIT-compiled version. Running inference with the JIT-compiled model (.pt format) without TVM worked fine.
Now, to see the speed gain with TVM, I tried:
    import tvm
    from tvm import relay
    import numpy as np
    import torch
    import torchvision
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        '/Users/arjun/datasets/distilbert-base-cased-distilled-squad')
    model = torch.jit.load('/Users/arjun/datasets/distilbert-base-cased-distilled-squad.pt')
    model.eval()

    text = ("The Apache Software Foundation is an American nonprofit corporation to support Apache software projects, including the Apache HTTP Server. "
            "The ASF was formed from the Apache Group and incorporated on March 25, 1999. "
            "The Apache Software Foundation is a decentralized open source community of developers.")
    question = "When was ASF group formed ?"

    encoding = tokenizer.encode_plus(question, text, return_tensors="pt",
                                     truncation='only_second', padding='max_length')
    print(encoding)

    input_ids = encoding["input_ids"]
    attention_mask = encoding["attention_mask"]
    shape_list = [("input_ids", input_ids.shape), ("attention_mask", attention_mask.shape)]

    mod, params = relay.frontend.from_pytorch(model, [input_ids, attention_mask])
    print(mod, params)
I am getting this error:

    ANTLR runtime and generated code versions disagree: 4.8!=4.7.2
    Traceback (most recent call last):
      File "/Users/arjun/tvm/tvm_test.py", line 36, in <module>
        mod, params = relay.frontend.from_pytorch(model, [input_ids, attention_mask])
      File "/Users/arjun/environments/venv/lib/python3.7/site-packages/tvm-0.7.dev1-py3.7-macosx-10.9-x86_64.egg/tvm/relay/frontend/pytorch.py", line 2641, in from_pytorch
        _report_missing_conversion(op_names, convert_map)
      File "/Users/arjun/environments/venv/lib/python3.7/site-packages/tvm-0.7.dev1-py3.7-macosx-10.9-x86_64.egg/tvm/relay/frontend/pytorch.py", line 2127, in _report_missing_conversion
        raise NotImplementedError(msg)
    NotImplementedError: The following operators are not implemented: ['aten::masked_fill_']
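As far as I understand it, `aten::masked_fill_` simply overwrites every position where a boolean mask is true with a scalar value. I assume the model uses it to push the attention scores of padded tokens to -inf before the softmax, but I have not verified that against the traced graph. If so, the operator looks expressible as a plain "where" selection. The following is only a NumPy sketch of that semantics under those assumptions, not TVM or PyTorch code. The helper name `masked_fill` and the toy arrays are made up for illustration.

```python
import numpy as np


def masked_fill(x, mask, value):
    """Return a copy of x with `value` written wherever `mask` is True.

    Hypothetical helper: mimics the semantics of masked_fill as a
    where-select, without modifying x in place.
    """
    mask = np.asarray(mask, dtype=bool)
    return np.where(mask, np.asarray(value, dtype=x.dtype), x)


# Toy attention scores: 1 query row, 4 key positions.
scores = np.array([[0.5, 1.0, 2.0, 3.0]], dtype=np.float32)

# Same convention as the tokenizer output: 1 = real token, 0 = padding.
attention_mask = np.array([[1, 1, 0, 0]])
pad = attention_mask == 0

# Padded positions are replaced by -inf, the rest is left untouched.
filled = masked_fill(scores, pad, -np.inf)

# Softmax over the last axis; padded positions end up with weight 0.
shifted = filled - filled.max(axis=-1, keepdims=True)
weights = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

print(filled)
print(weights)
```

If that reading is right, the missing piece would be a converter that maps the operator onto an existing select-style op, rather than anything specific to pruned or unpruned weights.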
So is it that the operations inside the model are not yet implemented on the TVM side for optimization? Does that mean that, as of now, only pruned models from transformers can be used? In that case, will a fine-tuned question answering model work? Even though the weights are pruned, the language model still learned by predicting the masked words, right? So if that works, there should be a way to work with normal models like,
Correct me if I am wrong. Any help would be appreciated.