[Bug] [OpenCL] module.benchmark stuck on some platforms

Dear All,

Thanks again for the great work on the project.

I have been using TVM for a while, and recently encountered strange behavior with the latest version on a new HW/SW environment. All of my machines run Ubuntu 20.04 with the 5.13 kernel and Python 3.8.10.

I have a laptop with an Ivy Bridge CPU and a Kepler GPU (GT 650M), with the r418 driver. I also have a desktop with a Skylake CPU and an RTX 2080, with the r510 driver.

On the older platform, everything tunes and works fine: CPU, GPU with CUDA, and GPU with OpenCL. On the newer one, CPU and CUDA work fine, but as soon as I use OpenCL, either from my own code (which uses module.module.time_evaluator) or from the tune_relay_mobile_gpu.py sample (with “-device=mali” removed from the target), tuning completes fine but execution gets stuck on module.benchmark. I have encountered the same problem on another recent laptop with an Intel Xe GPU (and no Nvidia GPU or software stack installed). I have tried both with a clean virtualenv (hence the latest xgboost, tensorflow, numpy, etc.) and with the same one as on my old laptop.
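For reference, here is a minimal sketch of the kind of setup that hangs for me. The network below is only a placeholder (the actual model does not seem to matter), and the input names and shapes are purely illustrative:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Tiny placeholder network; the hang does not depend on the model.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

target = tvm.target.Target("opencl")  # note: no "-device=mali"
dev = tvm.device("opencl", 0)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)

module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.uniform(size=(1, 3, 224, 224)).astype("float32"))
module.set_input("weight", np.random.uniform(size=(16, 3, 3, 3)).astype("float32"))

# Both of these get stuck on the affected platforms:
timer = module.module.time_evaluator("run", dev, number=10)
print(timer())
print(module.benchmark(dev))
```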

Have you encountered this behavior?

masahi said he encountered no problem running models on Tiger Lake, but I do not know which configuration he had (Python version, LLVM version, model used, etc.).

Thank you in advance.

Dear all,

I think I have found a solution to this problem. It seems to be linked to the way TVM currently sets TVM_OPENCL_WARP_SIZE: it defaults to 1, which leads to undefined behavior on some platforms. Setting it to either 16 or 32 solved the issue.
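Concretely, the workaround is just overriding that value before running the script, assuming the variable is read from the environment at runtime (which is how it behaved for me). Whether 16 or 32 is the right value presumably depends on the GPU:

```python
import os

# Workaround: override the default OpenCL warp size (1) before importing tvm,
# so the runtime sees the new value when it queries device attributes.
os.environ["TVM_OPENCL_WARP_SIZE"] = "16"  # 32 also worked in my tests

import tvm
```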

Does the problem happen on the Skylake CPU with the RTX 2080 and the r510 driver?

Which GPU is enabled? Some Skylake parts have an integrated GPU and some do not. If you have an integrated GPU, you can set up the environment to use it, and then OpenCL runs on the Intel graphics. On the other hand, that is not very likely, since you have not mentioned Intel drivers.

The warp size affects some schedules and therefore performance, but it should not affect stability. I am aware of bugs in different OpenCL implementations for different hardware. It is also well known that Nvidia does not support OpenCL well; it is better to use CUDA for such GPUs.

In other words, if you have Nvidia graphics, you should use CUDA. If you use Intel graphics, use target="opencl -device=intel_graphics" and tune with AutoTVM or the auto-scheduler, for example as sketched below.
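A minimal sketch of that flow with the auto-scheduler, assuming mod and params come from one of the Relay frontends (or the placeholder network from the first sketch above); the trial count and log-file name are just placeholders:

```python
import tvm
from tvm import relay, auto_scheduler

# mod, params: obtained from a Relay frontend, e.g. relay.frontend.from_onnx(...)
target = tvm.target.Target("opencl -device=intel_graphics")

# Extract tuning tasks from the model and tune them with the auto-scheduler.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,  # placeholder; increase for real tuning
    measure_callbacks=[auto_scheduler.RecordToFile("intel_graphics_tuning.json")],
)
tuner.tune(tune_option)

# Compile with the best schedules found during tuning.
with auto_scheduler.ApplyHistoryBest("intel_graphics_tuning.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
```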