TVM unroll factor differs from the code that actually runs on the GPU

lv2020 · August 18, 2024, 9:20am

I ran the command from "Inquiry About Obtaining PTX Assembly Code from TVM - #2 by LeiWang1999" to generate PTX code for a C2D operator.

When I use Nsight Compute to inspect the source that actually runs on the GPU, I find that it unrolls more loops than I specified with meta_schedule.unroll_explicit. If that is the case, the unroll factor seems to have no effect, since the CUDA compiler will unroll further on its own anyway. Is this correct?
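For reference, here is a minimal CUDA sketch of the behavior I am asking about. It assumes (I have not confirmed this) that the extra unrolling comes from nvcc/ptxas rather than from TVM itself. The CUDA C++ Programming Guide says the compiler unrolls small loops with a known trip count by default, and that `#pragma unroll 1` prevents unrolling. The kernel names, `N`, and `THREADS` are made up for illustration and are not taken from the TVM-generated code.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 16;        // loop trip count, known at compile time (hypothetical)
constexpr int THREADS = 32;  // one block of 32 threads (hypothetical)

// No pragma: the compiler is free to unroll this loop on its own.
__global__ void sum_default(const float* in, float* out) {
    float acc = 0.0f;
    for (int i = 0; i < N; ++i) acc += in[threadIdx.x * N + i];
    out[threadIdx.x] = acc;
}

// Requests an unroll factor of 4 for this loop.
__global__ void sum_unroll4(const float* in, float* out) {
    float acc = 0.0f;
#pragma unroll 4
    for (int i = 0; i < N; ++i) acc += in[threadIdx.x * N + i];
    out[threadIdx.x] = acc;
}

// Asks the compiler not to unroll this loop at all.
__global__ void sum_no_unroll(const float* in, float* out) {
    float acc = 0.0f;
#pragma unroll 1
    for (int i = 0; i < N; ++i) acc += in[threadIdx.x * N + i];
    out[threadIdx.x] = acc;
}

// Launches one kernel variant and compares each thread's sum with the host reference.
template <typename Kernel>
bool check(Kernel kernel, const float* d_in, float* d_out, const float* h_ref) {
    float h_out[THREADS];
    kernel<<<1, THREADS>>>(d_in, d_out);
    if (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) != cudaSuccess)
        return false;
    for (int t = 0; t < THREADS; ++t)
        if (std::fabs(h_out[t] - h_ref[t]) > 1e-3f) return false;
    return true;
}

// Builds the input, runs all three variants, and reports whether they all agree.
bool run_all() {
    float h_in[THREADS * N];
    float h_ref[THREADS];
    for (int t = 0; t < THREADS; ++t) {
        h_ref[t] = 0.0f;
        for (int i = 0; i < N; ++i) {
            h_in[t * N + i] = static_cast<float>((t * N + i) % 7);
            h_ref[t] += h_in[t * N + i];
        }
    }
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_ref)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    bool ok = check(sum_default, d_in, d_out, h_ref) &&
              check(sum_unroll4, d_in, d_out, h_ref) &&
              check(sum_no_unroll, d_in, d_out, h_ref);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main() {
    bool ok = run_all();
    std::printf("%s\n", ok ? "all variants agree" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

All three kernels compute the same result, so any difference shows up only in the generated instructions. Compiling with `nvcc -o unroll_demo unroll_demo.cu` and comparing the three kernels in `cuobjdump -sass unroll_demo` (or in the Nsight Compute Source page) should show whether the loop without a pragma is unrolled further than the one with an explicit factor.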