Hi,
I have looked at a few TVM-generated CUDA kernels, and I noticed that you cast every access to threadIdx.x to int, for example in this line taken from a matrix multiplication kernel:
matmul[((((((((((int)blockIdx.x) >> 4) * 32768) + ((((int)threadIdx.x) >> 5) * 8192)) + ...] = ...
I am trying to better understand this code. Is there any particular reason for this cast?
Many thanks in advance!