[Question] How does TVM run text generation models like GPT-2?

The latest TVM Unity already brings first-class dynamic shape support. Check out https://github.com/mlc-ai/mlc-llm/ as an example.

I saw this in the cli-demo instructions:

On Windows and Linux, the chatbot application runs on GPU via the Vulkan platform. For Windows and Linux users, please install the latest Vulkan driver. For NVIDIA GPU users, please make sure to install Vulkan driver, as the CUDA driver may not be good.

Does this mean that the CLI demo currently supports only the Vulkan driver, and that NVIDIA GPUs without Vulkan support cannot be used?

Also, does the latest TVM Unity branch already support dynamic shapes and tuning across GPUs in general?

Thank you for your response, @tqchen.

Most GPUs should come with Vulkan support, but of course the same flow works for CUDA, so you can use that as well. Unity does come with first-class dynamic shape support. The tuning still needs some work, but as of now we can easily adapt autotuned schedules to optimize for LLMs.

You are more than welcome to try it out. If you are looking for an ML compilation flow for LLMs, this is likely what you are looking for.

When I run this script, I get the same error: "ValueError: value has to be scalar or NDArray". I don't know why it happens or how to fix it.

Traceback (most recent call last):
  File "run.py", line 12, in <module>
    mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 4554, in from_pytorch
    outputs = converter.convert_operators(_get_operator_nodes(graph.nodes()), outputs, ret_name)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 3928, in convert_operators
    relay_out = relay_op(
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 769, in full
    return self.full_impl(data, fill_value, dtype)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/frontend/pytorch.py", line 675, in full_impl
    out = _op.full(_expr.const(fill_value, dtype=dtype), size, dtype=dtype)
  File "/home/zqh/tvm-v0.10.0/python/tvm/relay/expr.py", line 517, in const
    raise ValueError("value has to be scalar or NDArray")
ValueError: value has to be scalar or NDArray

I installed TVM (0.14.dev0) on WSL2. I used this script to replace parts of the code above in order to generate text, but it throws an error:

Traceback (most recent call last):
  File "text_generation.py", line 48, in <module>
    mod, params = relay.frontend.from_pytorch(traced_token_predictor, inputs, default_dtype="int64")
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 5021, in from_pytorch
    outputs = converter.convert_operators(operator_nodes, outputs, ret_name)
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 4274, in convert_operators
    relay_out = relay_op(
  File "/home/username/tvm/python/tvm/relay/frontend/pytorch.py", line 2013, in matmul
    batch_shape[i] = max(batch_shape[i], j)
  File "/home/username/tvm/python/tvm/tir/expr.py", line 186, in __bool__
    return self.__nonzero__()
  File "/home/username/tvm/python/tvm/tir/expr.py", line 180, in __nonzero__
    raise ValueError(
ValueError: Cannot use and / or / not operator to Expr, hint: use tvm.tir.all / tvm.tir.any instead

How can I solve this problem? I searched for the error message but did not find any useful information. Could you give me some suggestions?
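For anyone reading the traceback above: the failing frame is `max(batch_shape[i], j)`, and the error is raised from the expression's boolean conversion in `tir/expr.py`. That suggests at least one of the two values was a symbolic expression rather than a plain integer when `max()` compared them. The mechanism can be reproduced without TVM. The sketch below is only an illustration: `SymbolicDim` and `try_max` are hypothetical names made up for this example and are not TVM classes or functions.

```python
class SymbolicDim:
    """Toy stand-in for a symbolic shape expression (hypothetical; not a TVM class)."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, other):
        # A comparison yields another symbolic expression, not a Python bool.
        return SymbolicDim(f"({self.name} > {_name(other)})")

    def __lt__(self, other):
        return SymbolicDim(f"({self.name} < {_name(other)})")

    def __bool__(self):
        # The truth value is unknown until runtime, so refuse to guess.
        raise ValueError("cannot convert a symbolic expression to bool")


def _name(value):
    """Render either a SymbolicDim or a plain value as text."""
    return value.name if isinstance(value, SymbolicDim) else str(value)


def try_max(a, b):
    """Return max(a, b), or the error text if the comparison is symbolic."""
    try:
        return max(a, b)
    except ValueError as err:
        return f"ValueError: {err}"


print(try_max(1, 12))                 # both concrete: max() works
print(try_max(SymbolicDim("n"), 12))  # one symbolic: max() needs a bool and fails
```

The built-in `max()` has to turn the result of each comparison into `True` or `False`, which a symbolic expression cannot provide. Whether the symbolic dimension in the GPT-2 case comes from the traced model or from the input shapes passed to `from_pytorch` is not something the traceback alone shows.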

Hi @ShaobinChen-AH, were you able to resolve this?
I got the same error when I was trying to compile GPT-2 with TVM.

It does not seem to be an easy problem. What is your environment, and which OS does your TVM installation run on: macOS, Linux, or something else?

Hi @ShaobinChen-AH, I am using Linux 18.0, and my version details are below:
TVM version: 0.14.dev0
PyTorch version: 2.1.0+cu118
Transformers version: 4.34.1
Could you please guide me if you were able to solve this error?
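Version details like the ones above can be collected in one step when reporting an issue. This is a minimal sketch using only the Python standard library. The distribution names in `packages` are assumptions (a TVM built from source may not be registered under either `tvm` or `apache-tvm`, in which case checking `tvm.__version__` by hand is the fallback), and `collect_versions` is a hypothetical helper name.

```python
import platform
from importlib import metadata


def collect_versions(packages=("tvm", "apache-tvm", "torch", "transformers")):
    """Map each distribution name to its installed version, or 'not installed'."""
    report = {"os": platform.platform(), "python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not registered in this environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a post gives others the OS, Python, and package versions in a consistent format.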