I ran the command from "Inquiry About Obtaining PTX Assembly Code from TVM - #2 by LeiWang1999" to generate PTX code for the C2D (2-D convolution) operator.
When I use Nsight Compute to inspect the code that actually runs on the GPU, I find that more loops are unrolled than I specified with meta_schedule.unroll_explicit. If so, the unroll factor seems useless, because the CUDA compiler will always try to unroll more loops anyway. Is this correct?