When running llama2 models on an AMD iGPU via Vulkan, I hit the following error:
"mlc-llm\cpp\serve\threaded_engine.cc", line 270: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 217.600 MB, which is less than the sum of model weight size (3615.133 MB) and temporary buffer size (639.024 MB).
After debugging the issue, I found the cause: tvm/runtime/vulkan doesn't report the correct total GPU memory for integrated GPUs (iGPUs). The current code at vulkan_device.cc:317 only counts DEVICE_LOCAL memory heaps as compute memory, but on an iGPU most of the usable GPU memory is not flagged DEVICE_LOCAL, since the GPU shares system RAM. Therefore, iGPUs and dGPUs should be handled differently when calculating compute_memory_size.