Hi,
I’m trying to get some speedup on BERT CPU inference with TVM. I used this notebook as a guide to write the following code:
import sys
import os

import torch
import tvm
import tvm.relay
from transformers import BertModel


def prep_model(model, dummy_inputs):
    # Put the model into inference mode and freeze its parameters.
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)

    # Trace the model so TVM's PyTorch frontend can consume it.
    traced_model = torch.jit.trace(model, dummy_inputs)
    traced_model.eval()
    for p in traced_model.parameters():
        p.requires_grad_(False)

    # (input name, shape) pairs for the traced graph's inputs, skipping the
    # first graph input, which is the module itself.
    shape_list = [(i.debugName().split('.')[0], i.type().sizes())
                  for i in list(traced_model.graph.inputs())[1:]]

    mod_bert, params_bert = tvm.relay.frontend.pytorch.from_pytorch(
        traced_model, shape_list, default_dtype="float32")
    return mod_bert, params_bert


# The model is loaded earlier in my script; "bert-base-uncased" is just an example.
bert_model = BertModel.from_pretrained("bert-base-uncased")

# this works
bert_model(bert_model.dummy_inputs['input_ids'])

# this is where it gets stuck
prep_model(bert_model, bert_model.dummy_inputs['input_ids'])
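In case it is relevant, the follow-up step I was planning once from_pytorch returns (I have not reached it yet) is the usual compile-and-run flow from the notebook. This is only a sketch based on the current docs; the exact names (graph_executor vs. the older graph_runtime, lib["default"], .numpy() vs. .asnumpy()) may differ for my source build:

import tvm
from tvm.contrib import graph_executor  # called graph_runtime in older TVM releases

# Once prep_model actually returns, compile the Relay module for CPU...
mod_bert, params_bert = prep_model(bert_model, bert_model.dummy_inputs['input_ids'])

target = "llvm"  # generic CPU target, no -mcpu tuning yet
with tvm.transform.PassContext(opt_level=3):
    lib = tvm.relay.build(mod_bert, target=target, params=params_bert)

# ...and run it once on the dummy inputs.
dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))
# The input name comes from shape_list; for the dummy inputs it should be "input_ids".
module.set_input("input_ids", tvm.nd.array(bert_model.dummy_inputs["input_ids"].numpy()))
module.run()
tvm_out = module.get_output(0).numpy()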
Running from_pytorch prints the usual WARNING:root:Untyped Tensor found, assume it is float32 multiple times, but then it stops at some point and just hangs.
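One idea I had for narrowing it down is to dump the Python stacks once the call looks stuck, e.g. with the standard faulthandler module, though I am not sure that is the right approach here (it will not show what is happening inside TVM's C++ code):

import faulthandler
import signal
import sys

# Dump every thread's Python stack to stderr when the process receives SIGUSR1,
# so I can `kill -USR1 <pid>` from another shell while from_pytorch is hanging.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# Or dump the stacks automatically if the conversion takes longer than 10 minutes.
faulthandler.dump_traceback_later(600, exit=False)

prep_model(bert_model, bert_model.dummy_inputs['input_ids'])
faulthandler.cancel_dump_traceback_later()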
I am running:
- Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-1054-aws x86_64) on an EC2 instance
- Python 3.6.9 in a virtual environment
- TVM built from source
Has anyone encountered such a problem? How can I debug this?
Thanks so much in advance!