Why do you cast `threadIdx.x` to `int`?

Hi,

I have looked at a few TVM-generated CUDA kernels, and I noticed that you cast every access to threadIdx.x to int, for example in this line taken from a matrix multiplication kernel:

matmul[((((((((((int)blockIdx.x) >> 4) * 32768) + ((((int)threadIdx.x) >> 5) * 8192)) + ...] = ...

I am trying to better understand this code. Is there any particular reason for this cast?

Many thanks in advance!

There is no fundamental reason to do so (besides making the type really clear). The cast doesn't generate any actual code in the PTX, so it should be fine.


Thanks for your reply. Two follow-up questions please:

  1. I just checked the type of threadIdx.x – surprisingly, it is uint (and not int). If the goal of your explicit cast is “making the type really clear”, do you think a cast to uint would make more sense than a cast to int?
  2. I initially thought your explicit cast might be due to some strange behavior in the CUDA compiler, as outlined in this thread on NVIDIA’s developer forum: Bizarre type promotion Pretending int is a uint - CUDA Programming and Performance - NVIDIA Developer Forums. However, if I understood you correctly, your explicit cast is not(!) due to the reasons described in that thread, right?

Many thanks in advance!


I just checked the type of threadIdx.x – surprisingly, it is uint (and not int). If the goal of your explicit cast is “making the type really clear”, do you think a cast to uint would make more sense than a cast to int?

Yes. After the cast, the dtype is clearly int rather than uint.

However, if I understood you correctly, your explicit cast is not(!) due to the reasons described in that thread, right?

I don’t think the two reasons you presented are mutually exclusive. By making it explicit that threadIdx.x is cast to int, the nvcc compiler warning goes away and comparisons with negative numbers behave correctly, right?