[Question] How to run a text generation model like GPT-2 with TVM

It works for me using this script:

from tvm import relay

import torch
from transformers import GPT2LMHeadModel

# Load GPT-2 with torchscript=True so the model returns plain tuples and can be traced.
token_predictor = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True).eval()

# Trace with a dummy sequence of 5 random token ids.
random_tokens = torch.randint(10000, (5,))
traced_token_predictor = torch.jit.trace(token_predictor, random_tokens)

# The input name is arbitrary; the shape must match the traced input.
inputs = [("dummy_input_name", (5,))]
mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
print(mod)
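For completeness, here is a minimal sketch (not part of the original script) of compiling the imported module and running one forward pass on CPU; the llvm target and the reuse of "dummy_input_name" are assumptions you may need to adjust:

import tvm
from tvm.contrib import graph_executor

# Build the Relay module for a local CPU target.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Run one forward pass with the same dummy tokens used for tracing.
dev = tvm.device(target, 0)
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("dummy_input_name", tvm.nd.array(random_tokens.numpy()))
runtime.run()
logits = runtime.get_output(0).numpy()  # per-position next-token logits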

I updated TVM to the latest version and it works. Thanks a lot for your kind help!

Because GPT-2's input length grows at each generation step, with the code above and the static shapes in the current TVM main branch I can only run inference at a fixed sequence length. How can I solve this problem? I know Relax may solve it.

It is also possible to import the model with dynamic shape in Relay. But the performance would be extremely poor.

It is also possible to import the model with dynamic shape in Relay.

Is there some demo code for this? I'd like to give it a try.

Another question: when will the next version with Relax be released?

This is an example for ONNX; the PyTorch frontend doesn't support dynamic input shapes, but it wouldn't be difficult to add such a feature.
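For reference, a minimal sketch of what a dynamic-shape import through the ONNX frontend could look like; the file name gpt2.onnx and the input name input_ids are assumptions that depend on how the model was exported:

import onnx
from tvm import relay

onnx_model = onnx.load("gpt2.onnx")

# relay.Any() marks the sequence dimension as dynamic (unknown until runtime).
shape_dict = {"input_ids": (1, relay.Any())}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict, dtype="int64")
print(mod)

Kernels operating on dynamic shapes fall back to generic implementations, which is one reason the performance mentioned above tends to be poor.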

Thanks for your kind help. I will give it a try. Another question: when will the next version with Relax be released?

Check out Introducing Web-LLM: Running large language model on web

Amazing! I will try it in the browser.

Hi @zhaoyang-star, how do we generate text after getting mod and params? Please help with TVM.
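As a rough illustration (not from this thread), one option is a greedy decoding loop around the compiled module. The sketch below assumes the fixed-length import above, so it feeds a sliding window of the last 5 token ids; runtime refers to the graph_executor module from the compile-and-run sketch earlier, and the prompt must tokenize to at least 5 ids:

import numpy as np
import tvm
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer.encode("Hello, my name is")  # must yield at least 5 token ids

for _ in range(20):
    # Keep the input shape fixed at (5,) to match the traced graph.
    window = np.array(ids[-5:], dtype="int64")
    runtime.set_input("dummy_input_name", tvm.nd.array(window))
    runtime.run()
    logits = runtime.get_output(0).numpy()
    # Greedy pick from the last position's logits (handles (5, V) or (1, 5, V)).
    next_id = int(logits.reshape(-1, logits.shape[-1])[-1].argmax())
    ids.append(next_id)

print(tokenizer.decode(ids))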

When will TVM support dynamic shapes, and dynamic-shape tuning on GPU? @tqchen

The latest TVM Unity already brings first-class dynamic shape support; check out https://github.com/mlc-ai/mlc-llm/ as an example.


I saw the CLI demo:

On Windows and Linux, the chatbot application runs on GPU via the Vulkan platform. For Windows and Linux users, please install the latest Vulkan driver. For NVIDIA GPU users, please make sure to install Vulkan driver, as the CUDA driver may not be good.

Does this mean that the CLI demo currently only supports the Vulkan driver, and that NVIDIA GPUs without Vulkan support cannot be used?

Also, does the latest TVM Unity branch already support universal GPU dynamic shapes and tuning?

And thank you for your response, @tqchen.

Most GPUs should come with Vulkan support, but of course the same flow works with CUDA, so you can use that. Unity does come with first-class dynamic shape support. The tuning still needs some work, but as of now we can easily adapt autotuned schedules to optimize for LLMs.

You are more than welcome to try it out. If you are looking for an ML compilation flow for LLMs, this is likely what you want.


When I run this script, I get the same error, "ValueError: value has to be scalar or NDArray", but I don't know why or how to fix it:

Traceback (most recent call last):
  File "run.py", line 12, in <module>
    mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 4554, in from_pytorch
    outputs = converter.convert_operators(_get_operator_nodes(graph.nodes()), outputs, ret_name)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 3928, in convert_operators
    relay_out = relay_op(
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 769, in full
    return self.full_impl(data, fill_value, dtype)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 675, in full_impl
    out = _op.full(_expr.const(fill_value, dtype=dtype), size, dtype=dtype)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/expr.py", line 517, in const
    raise ValueError("value has to be scalar or NDArray")
ValueError: value has to be scalar or NDArray

I installed TVM (0.14.dev0) on WSL2. I replaced parts of the above code with my own script in order to generate text, but it throws an error:

Traceback (most recent call last):
  File "text_generation.py", line 48, in <module>
    mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 5021, in from_pytorch
    outputs = converter.convert_operators(operator_nodes, outputs, ret_name)
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 4274, in convert_operators
    relay_out = relay_op(
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 2013, in matmul
    batch_shape[i] = max(batch_shape[i], j)
  File "/home/username/tvm/python/tvm/tir/expr.py", line 186, in __bool__
    return self.__nonzero__()
  File "/home/username/tvm/python/tvm/tir/expr.py", line 180, in __nonzero__
    raise ValueError(
ValueError: Cannot use and / or / not operator to Expr, hint: use tvm.tir.all / tvm.tir.any instead

How can I solve this problem? I searched for the error but did not find any useful information. Could you give me some suggestions?

Hi @ShaobinChen-AH, were you able to resolve this?
I got the same error when I was trying to compile GPT-2 with TVM.

It doesn't seem to be an easy problem. What is your environment, and which OS does your installed TVM run on: macOS, Linux, or …?

Hi @ShaobinChen-AH, I am using Linux 18.0, and below are my version details:
tvm version: 0.14.dev0
pytorch version: 2.1.0+cu118
transformers version: 4.34.1
Could you please guide me if you were able to solve this error?

Can you please give some hints or an explanation of how to run an LLM model like GPT-Neo on TVM, after converting the model from PyTorch to ONNX?