When running llama2 models on an AMD iGPU via Vulkan, I hit the following error:
"mlc-llm\cpp\serve\threaded_engine.cc", line 270: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 217.600 MB, which is less than the sum of model weight size (3615.133 MB) and temporary buffer size (639.024 MB).
After debugging the issue, I found the cause: tvm/runtime/vulkan doesn't report the correct total GPU memory for integrated GPUs (iGPUs). The current code at vulkan_device.cc:317 only counts DEVICE_LOCAL memory heaps as compute memory, but on an iGPU most of the usable GPU memory is not flagged DEVICE_LOCAL, since the GPU shares system RAM. Therefore, iGPUs and dGPUs should be handled differently when calculating compute_memory_size.