[Discussion] Support for AMD Matrix Core

Hi everyone, I’d like to initiate a discussion about AMD Matrix Core. As we know, AMD Matrix Core is mixed-precision matrix-compute hardware, similar to NVIDIA Tensor Core, and both bring significant speedups to machine learning tasks. Since TVM already works with various datatypes and platforms, I think it would be very interesting to explore a Matrix Core integration, especially because AMD vendor libraries such as rocBLAS and Composable Kernel still show very poor performance for Matrix Core computation. That is what I want to work on now.

Based on my observations of the TVM ROCm backend and some ROCm programming experience, I think there are two possible routes to choose from:

  1. Continue with the current LLVM IR codegen: LLVM IR now supports Matrix Core via the amdgcn mfma intrinsics (see the first sketch after this list). However, I’m curious why TVM ROCm code generation uses LLVM IR as its backend.

  2. Reuse CUDA source code generation for HIP: In my experience, HIP and CUDA code overlap to a high degree; I can simply change cuda_runtime.h to hip_runtime.h, compile with hipcc, and the source then works correctly. For Matrix Core, I would mainly need to replace nvcuda::wmma with rocwmma (see the second sketch after this list).
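
For route 1, here is a minimal sketch of what the underlying operation looks like when invoked directly from HIP through the clang builtin that lowers to the llvm.amdgcn.mfma intrinsic. This is illustrative only: the helper name mfma_16x16x16_tile and the vector typedefs are my own, and the per-lane mapping of fragments to matrix elements (which follows the CDNA ISA) is omitted.

```cpp
#include <hip/hip_runtime.h>

// Per-lane fragment types: each of the 64 lanes in a wavefront holds
// 4 fp16 elements of A, 4 fp16 elements of B and 4 fp32 elements of C.
typedef _Float16 half4_t __attribute__((ext_vector_type(4)));
typedef float float4_t __attribute__((ext_vector_type(4)));

// One instruction accumulates a full 16x16x16 fp16 x fp16 -> fp32 tile
// across the wavefront (requires a CDNA GPU, e.g. gfx908/gfx90a).
__device__ float4_t mfma_16x16x16_tile(half4_t a_frag, half4_t b_frag,
                                       float4_t c_frag) {
    // cbsz/abid/blgp modifiers left at their default of 0.
    return __builtin_amdgcn_mfma_f32_16x16x16f16(a_frag, b_frag, c_frag, 0, 0, 0);
}
```

The LLVM IR route would presumably emit the corresponding llvm.amdgcn.mfma.* intrinsics from TIR directly rather than going through this builtin.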
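
For route 2, here is a minimal sketch of the nvcuda::wmma-to-rocwmma mapping, assuming the rocWMMA fragment/load/mma/store API (which mirrors nvcuda::wmma) and a single 16x16x16 fp16 tile; the kernel name and parameters are illustrative only.

```cpp
#include <hip/hip_runtime.h>
#include <rocwmma/rocwmma.hpp>

// One wavefront computes a single 16x16 output tile, in the same style as an
// nvcuda::wmma kernel, with nvcuda::wmma::* replaced by rocwmma::*.
constexpr int M = 16, N = 16, K = 16;

__global__ void wmma_tile_gemm(const rocwmma::float16_t* A,
                               const rocwmma::float16_t* B,
                               float* C, int lda, int ldb, int ldc) {
    rocwmma::fragment<rocwmma::matrix_a, M, N, K, rocwmma::float16_t, rocwmma::row_major> fragA;
    rocwmma::fragment<rocwmma::matrix_b, M, N, K, rocwmma::float16_t, rocwmma::row_major> fragB;
    rocwmma::fragment<rocwmma::accumulator, M, N, K, float> fragC;

    rocwmma::fill_fragment(fragC, 0.0f);            // C = 0
    rocwmma::load_matrix_sync(fragA, A, lda);       // load A tile
    rocwmma::load_matrix_sync(fragB, B, ldb);       // load B tile
    rocwmma::mma_sync(fragC, fragA, fragB, fragC);  // C += A * B on Matrix Core
    rocwmma::store_matrix_sync(C, fragC, ldc, rocwmma::mem_row_major);
}
```

Launched with one 64-thread block (one wavefront) per output tile, this is essentially the same shape of code the existing CUDA source codegen emits for Tensor Core, which is why reusing it looks attractive to me.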

Please let me know which route you prefer. Thanks.

Please feel free to share your comments and suggestions. :smiling_face_with_three_hearts:

I think it is a great idea. The LLVM backend should work. If the lift for HIP is low, having HIP as an alternative is not a bad idea, since that provides some ability to reference existing hacks. We also have the recent Vulkan support, which is another route to cover environments that are not covered by ROCm.

LLVM is a good way to go. Since AMD's stack is open source, we can dig into it very deeply. HIP is also a good alternative; however, from my view its goal is to be as compatible with CUDA as possible, so if we want to explore extreme performance, I think LLVM may be the better way :slight_smile:

@vinx13 and I have some experience working on the ROCm backend for AMD GPUs. Just like Tensor Core, Matrix Core can be expressed with affine expressions, which are well supported by TensorIR.

Agreed, it looks like HIP code doesn't achieve good performance on Matrix Core.

Feel free to review the changes and provide any feedback or suggestions. Unlike Tensor Core, there are only a few mfma intrinsic candidates for a given computation…
