Hi everyone, I’d like to start a discussion about AMD Matrix Core. As we know, AMD Matrix Core is a mixed-precision matrix-compute hardware accelerator, similar to NVIDIA Tensor Core, and both bring significant speedups to machine learning workloads. Since TVM already supports a wide range of datatypes and platforms, I think it would be very interesting to explore Matrix Core integration, especially because AMD vendor libraries such as rocBLAS and Composable Kernel still have poor performance on Matrix Core computation. That is what I want to work on now.
Based on my observations of the TVM ROCm backend and some ROCm programming experience, I think there are two possible routes to choose from:
- Continue with the current LLVM IR codegen: LLVM IR now supports Matrix Core. However, I’m curious why TVM’s ROCm code generation uses LLVM IR as its backend.
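For context on the first route: LLVM’s AMDGPU backend exposes Matrix Core through the `llvm.amdgcn.mfma.*` intrinsics, so this option means having TVM’s LLVM codegen emit calls like the one below. This is a minimal hand-written sketch, not TVM output, and the exact intrinsic name and tile shape depend on the target gfx architecture (e.g. gfx908/gfx90a):

```llvm
; One MFMA step of a 16x16x16 fp16 multiply into an fp32 accumulator on
; CDNA GPUs. Each lane of the wavefront holds a <4 x half> slice of A and B
; and a <4 x float> slice of the accumulator.
declare <4 x float> @llvm.amdgcn.mfma.f32.16x16x16f16(
    <4 x half>, <4 x half>, <4 x float>, i32 immarg, i32 immarg, i32 immarg)

define <4 x float> @tile_mma(<4 x half> %a, <4 x half> %b, <4 x float> %c) {
  ; the trailing immediates are the cbsz/abid/blgp modifiers, all zero here
  %d = call <4 x float> @llvm.amdgcn.mfma.f32.16x16x16f16(
          <4 x half> %a, <4 x half> %b, <4 x float> %c, i32 0, i32 0, i32 0)
  ret <4 x float> %d
}
```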
- Reuse CUDA source code generation for HIP: In my experience, HIP and CUDA code overlap to a high degree: I can often just change `cuda_runtime.h` to `hip_runtime.h` and compile the code with hipcc, and it works correctly. For Matrix Core, I would mainly need to replace `nvcuda::wmma` with `rocwmma`.
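To illustrate the second route: rocWMMA deliberately mirrors the `nvcuda::wmma` fragment API, so a ported kernel is nearly a textual substitution. A minimal sketch of one wavefront computing a 16x16 output tile (shapes and the `float16_t` type follow rocWMMA’s documented API; launch code and error handling omitted):

```cpp
#include <hip/hip_runtime.h>    // was: #include <cuda_runtime.h>
#include <rocwmma/rocwmma.hpp>  // was: #include <mma.h>

using rocwmma::float16_t;       // was: __half

constexpr int M = 16, N = 16, K = 16;

// The only substantive change from the CUDA version is the namespace:
// nvcuda::wmma -> rocwmma.
__global__ void wmma_gemm_tile(const float16_t* a, const float16_t* b, float* c) {
    rocwmma::fragment<rocwmma::matrix_a, M, N, K, float16_t, rocwmma::row_major> fa;
    rocwmma::fragment<rocwmma::matrix_b, M, N, K, float16_t, rocwmma::col_major> fb;
    rocwmma::fragment<rocwmma::accumulator, M, N, K, float> fc;

    rocwmma::fill_fragment(fc, 0.0f);
    rocwmma::load_matrix_sync(fa, a, K);   // lda = K
    rocwmma::load_matrix_sync(fb, b, K);   // ldb = K
    rocwmma::mma_sync(fc, fa, fb, fc);     // c_tile += a_tile * b_tile
    rocwmma::store_matrix_sync(c, fc, N, rocwmma::mem_row_major);
}
```

The close API mapping is what makes a source-to-source route attractive: the existing CUDA tensor-core codegen in TVM could plausibly be parameterized over the namespace and header names rather than rewritten.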
Please let me know which route you prefer. Thanks!