[Question] How does TVM run a text generation model like GPT-2?

An error occurred after changing to mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64").

The value type is tvm.relay.expr.Call, while a scalar or NDArray is expected.

Traceback (most recent call last):
  File "test1.py", line 58, in <module>
    mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs)
  File "/WORK/Dev/tvm/python/tvm/relay/frontend/pytorch.py", line 4173, in from_pytorch
    outputs = converter.convert_operators(_get_operator_nodes(graph.nodes()), outputs, ret_name)
  File "/WORK/Dev/tvm/python/tvm/relay/frontend/pytorch.py", line 3547, in convert_operators
    relay_out = relay_op(
  File "/WORK/Dev/tvm/python/tvm/relay/frontend/pytorch.py", line 750, in full
    return self.full_impl(data, fill_value, dtype)
  File "/WORK/Dev/tvm/python/tvm/relay/frontend/pytorch.py", line 671, in full_impl
    out = _op.full(_expr.const(fill_value, dtype=dtype), size, dtype=dtype)
  File "/WORK/Dev/tvm/python/tvm/relay/expr.py", line 517, in const
    raise ValueError("value has to be scalar or NDArray")
ValueError: value has to be scalar or NDArray

It works for me using this script:

from tvm import relay

import torch
from transformers import GPT2LMHeadModel

# Load GPT-2 in TorchScript mode and trace it with a dummy token sequence.
token_predictor = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True).eval()

random_tokens = torch.randint(10000, (5,))
traced_token_predictor = torch.jit.trace(token_predictor, random_tokens)

# Token ids are integers, so import with an int64 default dtype.
inputs = [("dummy_input_name", (5,))]
mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
print(mod)
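
For completeness, here is a minimal sketch of how the resulting mod and params can be compiled and executed with the graph executor. The llvm target and the greedy next-token step are illustrative assumptions, not part of the original script:

import numpy as np
import tvm
from tvm.contrib import graph_executor

# Build the Relay module for a CPU target (swap in "cuda" or "vulkan" as needed).
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(target, 0)
runtime = graph_executor.GraphModule(lib["default"](dev))

# Feed a length-5 token sequence, matching the traced input shape.
tokens = np.random.randint(0, 10000, size=(5,)).astype("int64")
runtime.set_input("dummy_input_name", tokens)
runtime.run()

# The first output holds the logits; greedily pick the most likely next token.
logits = runtime.get_output(0).numpy()
next_token = int(logits[-1].argmax())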

I updated TVM to the latest version and it works. Thanks a lot for your kind help. 🙂

Because GPT-2 requires the input size to grow at each generation step, with the code above and the static shapes of current TVM (main branch) I can only run inference on a fixed sequence length. How can I solve this problem? I know Relax may solve it.
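
One possible workaround under static shapes, sketched below on top of the graph-executor runtime from the earlier snippet, is sliding-window decoding: always feed the last SEQ_LEN tokens so the compiled input shape never changes. The pad_id and input name are assumptions, and left-padding without an attention mask is only approximate:

import numpy as np

SEQ_LEN = 5  # the fixed length the model was traced and compiled with

def generate(runtime, prompt_tokens, n_new_tokens, pad_id=0):
    # Greedy decoding that keeps only the most recent SEQ_LEN tokens,
    # trading context beyond the window for a fixed input shape.
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        window = tokens[-SEQ_LEN:]
        window = [pad_id] * (SEQ_LEN - len(window)) + window  # left-pad short prompts
        runtime.set_input("dummy_input_name", np.asarray(window, dtype="int64"))
        runtime.run()
        logits = runtime.get_output(0).numpy()
        tokens.append(int(logits[-1].argmax()))
    return tokens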

It is also possible to import the model with dynamic shapes in Relay, but the performance would be extremely poor.

It is also possible to import the model with dynamic shapes in Relay.

Is there some demo code for this? I want to have a try.

Another question: when will the next version with Relax be released?

This is an example for ONNX; the PyTorch frontend doesn't support dynamic input shapes yet, but it wouldn't be difficult to add such a feature.
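
For reference, here is a rough sketch of that ONNX route; the input/axis names and the freeze_params flag are illustrative assumptions. A dynamic axis in the exported graph becomes a dynamic (Any) dimension in the imported Relay module:

import onnx
import torch
from transformers import GPT2LMHeadModel
from tvm import relay

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
dummy = torch.randint(10000, (1, 5))

# Mark the sequence dimension of input_ids as dynamic in the exported graph.
torch.onnx.export(
    model,
    dummy,
    "gpt2.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {1: "seq_len"}},
)

onnx_model = onnx.load("gpt2.onnx")
mod, params = relay.frontend.from_onnx(onnx_model, freeze_params=True)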

Thanks for your kind help. I will give it a try. Another question: when will the next version with Relax be released?

Check out Introducing Web-LLM: Running large language model on web.

Amazing! I will try it on the browser.

Hi @zhaoyang-star, how do we generate the sentence after getting mod and params? Please help with TVM.

When will TVM support dynamic shapes and dynamic shape tuning on GPU? @tqchen

The latest TVM Unity already brings first-class dynamic shape support. Check out https://github.com/mlc-ai/mlc-llm/ as an example.


I saw the CLI demo:

On Windows and Linux, the chatbot application runs on GPU via the Vulkan platform. For Windows and Linux users, please install the latest Vulkan driver. For NVIDIA GPU users, please make sure to install Vulkan driver, as the CUDA driver may not be good.

Does this mean that the CLI demo currently only supports the Vulkan driver, and NVIDIA GPUs that do not support Vulkan cannot be used?

And does the latest TVM Unity branch already support universal GPU dynamic shapes and tuning?

And thank you for your response. @tqchen

Most GPUs should come with Vulkan support, but of course the same flow works for CUDA, so you can use that. Unity does come with first-class dynamic shape support. The tuning still needs some work, but as of now we can easily adapt autotuned schedules to optimize for LLMs.

You are more than welcome to try it out. If you are looking for an ML compilation flow for LLMs, this is likely what you are looking for.


When I run this script, I hit the same error, "ValueError: value has to be scalar or NDArray", but I don't know why or how to fix it.

Traceback (most recent call last):
  File "run.py", line 12, in <module>
    mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 4554, in from_pytorch
    outputs = converter.convert_operators(_get_operator_nodes(graph.nodes()), outputs, ret_name)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 3928, in convert_operators
    relay_out = relay_op(
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 769, in full
    return self.full_impl(data, fill_value, dtype)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 675, in full_impl
    out = _op.full(_expr.const(fill_value, dtype=dtype), size, dtype=dtype)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/expr.py", line 517, in const
    raise ValueError("value has to be scalar or NDArray")
ValueError: value has to be scalar or NDArray

I installed TVM (0.14.dev0) on WSL2. I used this script to replace parts of the above code in order to generate text, but it throws an error:

Traceback (most recent call last):
  File "text_generation.py", line 48, in <module>
    mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 5021, in from_pytorch
    outputs = converter.convert_operators(operator_nodes, outputs, ret_name)
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 4274, in convert_operators
    relay_out = relay_op(
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 2013, in matmul
    batch_shape[i] = max(batch_shape[i], j)
  File "/home/username/tvm/python/tvm/tir/expr.py", line 186, in __bool__
    return self.__nonzero__()
  File "/home/username/tvm/python/tvm/tir/expr.py", line 180, in __nonzero__
    raise ValueError(
ValueError: Cannot use and / or / not operator to Expr, hint: use tvm.tir.all / tvm.tir.any instead

How can I solve this problem? I searched for the error but did not find useful information. Could you give me some suggestions?

Hi @ShaobinChen-AH, were you able to resolve this?
I got the same error when I was trying to compile GPT-2 with TVM.

It seems this is not an easy problem. What is your environment, and what OS does your installed TVM run on: macOS, Linux, or something else?

Hi @ShaobinChen-AH, I am using Linux 18.0, and below are my version details:
tvm version: 0.14.dev0
pytorch version: 2.1.0+cu118
transformers version: 4.34.1
Could you please guide me if you were able to solve this error?