Inference results are incorrect with ROCm 5.0.2

Hi everyone. I am using TVM 0.9 on an AMD platform (Ryzen 5950X). I encountered an output mismatch issue when using TVM with the ROCm backend, version 5.0.2. I ran inference on ResNet-50 and VGG-16 models obtained from both PyTorch and Darknet, and all the results differ from the other backends I tested, such as llvm, nvptx, and cuda. I am using the same build of LLVM (version 14) everywhere. The inference results produced by ROCm are not completely wrong, but they differ significantly from the other backends' results: validating the ResNet-50 PyTorch model on 10,000 samples, the top-1 score drops to 50% and the top-5 score drops to 75%. Does anyone know of any compatibility issues with ROCm 5.0.2?

By the way, I also hit a compilation issue with the current TVM ROCm backend: some of the ROCm bitcode libraries listed in TVM no longer exist in ROCm 5.0.2. Thanks.
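For reference, this is roughly how I compared the backends (TVM 0.9 APIs; the input name `data` and the model import step are placeholders and depend on your frontend):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

def run(mod, params, target, dev, x):
    # Build the same Relay module for the given target and run one input.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    m = graph_executor.GraphModule(lib["default"](dev))
    m.set_input("data", x)  # input name depends on the frontend
    m.run()
    return m.get_output(0).numpy()

# mod, params = relay.frontend.from_pytorch(...)  # model import elided
x = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
cpu_out = run(mod, params, "llvm", tvm.cpu(0), x)
gpu_out = run(mod, params, "rocm", tvm.rocm(0), x)
print(np.abs(cpu_out - gpu_out).max())  # far larger than float tolerance
```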

@comaniac, @tqchen, does TVM work well with ROCm 5.0.2?

I’ve figured out the reason. RDNA changed the warp (wavefront) size from 64 to 32, but TVM uses a warp size of 64 by default for the rocm target. This mismatch causes the incorrect inference results.
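If anyone hits the same problem, here is a minimal sketch of the workaround: query the device's actual wavefront size and override TVM's default when creating the target (as far as I can tell, `thread_warp_size` is the attribute registered on the `rocm` target kind):

```python
import tvm

dev = tvm.rocm(0)
print("device warp size:", dev.warp_size)  # 32 on RDNA GPUs, 64 on GCN/CDNA

# Override TVM's default of 64 when building for an RDNA GPU:
target = tvm.target.Target("rocm -thread_warp_size=32")
# lib = relay.build(mod, target=target, params=params)
```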