Inference results are incorrect with ROCm 5.0.2

Hi everyone. I am using TVM 0.9 on an AMD platform (Ryzen 5950X). I encountered an output mismatch issue when using TVM with the ROCm backend, version 5.0.2. I ran inference on ResNet-50 and VGG-16 models obtained from both PyTorch and Darknet, and all the results differ from the other backends I tested, such as llvm, nvptx, and cuda. I am using the same build of LLVM (version 14) everywhere. The inference results produced by ROCm are not completely wrong, but they differ significantly from the other backends' results: validating the ResNet-50 PyTorch model on 10,000 samples, the top-1 score drops to 50% and the top-5 score drops to 75%. Does anyone know of any compatibility issues with ROCm 5.0.2?

By the way, I also hit a compilation issue with the current TVM ROCm backend: some of the ROCm bitcode libraries listed in TVM no longer exist in ROCm 5.0.2. Thanks.
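For reference, this is roughly how I compared the backends (TVM 0.9 APIs; the input name `data` and the model import step are placeholders and depend on your frontend):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

def run(mod, params, target, dev, x):
    # Build the same Relay module for the given target and run one input.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    m = graph_executor.GraphModule(lib["default"](dev))
    m.set_input("data", x)  # input name depends on the frontend
    m.run()
    return m.get_output(0).numpy()

# mod, params = relay.frontend.from_pytorch(...)  # model import elided
x = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
cpu_out = run(mod, params, "llvm", tvm.cpu(0), x)
gpu_out = run(mod, params, "rocm", tvm.rocm(0), x)
print(np.abs(cpu_out - gpu_out).max())  # far larger than float tolerance
```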

@comaniac, @tqchen, does TVM work well with ROCm 5.0.2?

I’ve figured out the reason. RDNA changed the warp (wavefront) size from 64 to 32, but TVM uses a warp size of 64 by default for the rocm target. This mismatch causes the incorrect inference results.
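If anyone hits the same problem, here is a minimal sketch of the workaround: query the device's actual wavefront size and override TVM's default when creating the target (as far as I can tell, `thread_warp_size` is the attribute registered on the `rocm` target kind):

```python
import tvm

dev = tvm.rocm(0)
print("device warp size:", dev.warp_size)  # 32 on RDNA GPUs, 64 on GCN/CDNA

# Override TVM's default of 64 when building for an RDNA GPU:
target = tvm.target.Target("rocm -thread_warp_size=32")
# lib = relay.build(mod, target=target, params=params)
```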