This question has been asked a few times, but the answers are not up to date with the most recent version of TVM. I’m trying to understand how TVM decides kernel launch configurations, i.e. the grid and thread block dimensions a kernel is launched with. Eventually, I would like to modify the existing approach and do something different. My progress so far:
- CUDA kernels are generated in `src/target/source/codegen_cuda.cc`.
- Kernels are launched in `src/runtime/cuda/cuda_module.cc`, but by this point the grid and thread block sizes have already been decided (see the sketch after this list).
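To convince myself of that, I dumped the generated source for a toy kernel of my own (this assumes a CUDA-enabled TVM build; `add_one` is just an illustration, not anything from the TVM codebase):

```python
import tvm
from tvm.script import tir as T

# Toy kernel with the thread bindings already in place. The grid/block
# sizes (4 blocks x 256 threads) are simply the extents of the bound loops.
@T.prim_func
def add_one(A: T.Buffer((1024,), "float32"), B: T.Buffer((1024,), "float32")):
    T.func_attr({"global_symbol": "add_one"})
    for bx in T.thread_binding(4, thread="blockIdx.x"):
        for tx in T.thread_binding(256, thread="threadIdx.x"):
            with T.block("B"):
                vi = T.axis.spatial(1024, bx * 256 + tx)
                B[vi] = A[vi] + 1.0

lib = tvm.build(add_one, target="cuda")
# Prints the CUDA C emitted by codegen_cuda.cc; the 4x256 shape travels
# along as launch parameters that cuda_module.cc reads at call time.
print(lib.imported_modules[0].get_source())
```

So by the time `cuda_module.cc` is involved, the launch shape is just data attached to the kernel.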
Since I use `DefaultGPUSchedule` in my application, I went through `src/tir/transforms/default_gpu_schedule.cc` and tried to understand what the `ThreadBind` method does, and I did see `blockIdx` and `threadIdx` being mentioned there.
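As far as I can tell, `ThreadBind` is what the `DefaultGPUSchedule` pass applies to each unscheduled PrimFunc. This is the minimal experiment I ran to watch it work (toy kernel again; I’m assuming the pass picks the target up from the enclosing `Target` context, which is how the Relax pipeline appears to apply it):

```python
import tvm
from tvm.script import tir as T

# An unscheduled toy kernel standing in for my real workload.
@T.prim_func
def add_one(A: T.Buffer((1024,), "float32"), B: T.Buffer((1024,), "float32")):
    for i in range(1024):
        with T.block("B"):
            vi = T.axis.spatial(1024, i)
            B[vi] = A[vi] + 1.0

mod = tvm.IRModule({"add_one": add_one})
with tvm.target.Target("cuda"):  # assumption: the pass reads the current target
    scheduled = tvm.tir.transform.DefaultGPUSchedule()(mod)
# The loop is now split and bound; the T.thread_binding extents in the
# printed TIR are exactly the launch configuration ThreadBind chose.
print(scheduled)
```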
But this is where my progress stopped. I’d really appreciate any pointers to where the launch configuration is actually decided. In case modifying this pass isn’t the only way to do what I want, I would also be open to a completely different approach (the sketch at the very end shows roughly the kind of control I’m after). Thanks in advance!
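For concreteness, the kind of control I’m ultimately after looks roughly like this manual schedule written against the public `tir.Schedule` API (same toy kernel; the block size of 128 is my own arbitrary choice):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def add_one(A: T.Buffer((1024,), "float32"), B: T.Buffer((1024,), "float32")):
    for i in range(1024):
        with T.block("B"):
            vi = T.axis.spatial(1024, i)
            B[vi] = A[vi] + 1.0

sch = tvm.tir.Schedule(add_one)
(i,) = sch.get_loops(sch.get_block("B"))
# Split the loop myself instead of letting ThreadBind pick the factors.
bx, tx = sch.split(i, factors=[None, 128])
sch.bind(bx, "blockIdx.x")
sch.bind(tx, "threadIdx.x")
print(sch.mod)  # the bound loop extents are now my chosen launch dims
```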