How does TVM decide CUDA kernel launch configurations?

This question has been asked a few times, but the answers are not up to date with the most recent version of TVM. I’m trying to understand how TVM decides kernel launch configurations, and eventually I would like to modify the existing approach. My progress so far:

  1. CUDA kernel code is generated in src/target/source/codegen_cuda.cc
  2. Kernels are launched in src/runtime/cuda/cuda_module.cc, but by that point the grid and thread block sizes have already been decided
  3. I use DefaultGPUSchedule in my application, so I went through src/tir/transforms/default_gpu_schedule.cc and tried to understand what the ThreadBind method does; I did see blockIdx and threadIdx being mentioned (see the sketch after this list)
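
For reference, this is roughly how I drive that pass, reduced to a toy example. The `vector_add` PrimFunc is just a placeholder for my real workload, and I'm assuming the Python entry points `tvm.tir.transform.BindTarget` and `tvm.tir.transform.DefaultGPUSchedule` are the right way to invoke it (please correct me if not):

```python
import tvm
from tvm.script import tir as T

# Toy kernel standing in for my real workload.
@T.prim_func
def vector_add(A: T.Buffer((4096,), "float32"),
               B: T.Buffer((4096,), "float32"),
               C: T.Buffer((4096,), "float32")):
    for i in range(4096):
        with T.block("C"):
            vi = T.axis.spatial(4096, i)
            C[vi] = A[vi] + B[vi]

mod = tvm.IRModule({"vector_add": vector_add})

# Attach a CUDA target so ThreadBind can query limits like
# max_threads_per_block, then let DefaultGPUSchedule pick the binding.
mod = tvm.tir.transform.BindTarget(tvm.target.Target("cuda"))(mod)
mod = tvm.tir.transform.DefaultGPUSchedule()(mod)

# The printed TIR now has T.thread_binding loops, but I don't see where
# their extents turn into the actual kernel launch configuration.
print(mod)
```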

But yes, this is where my progress stopped. I’d really appreciate any pointers to where I could find what I’m looking for. In case this isn’t the only way to do what I want, I would be open to a completely different approach. Thanks in advance!

Kernel launch configurations are decided at schedule time (manual scheduling or auto-tuning with MetaSchedule) via thread binding: once a loop is bound to blockIdx.* / threadIdx.*, its extent becomes the corresponding grid/block dimension.
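
A minimal sketch of manual binding with the TensorIR schedule API (the `vector_add` kernel and the split factor of 128 are just for illustration):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def vector_add(A: T.Buffer((4096,), "float32"),
               B: T.Buffer((4096,), "float32"),
               C: T.Buffer((4096,), "float32")):
    for i in range(4096):
        with T.block("C"):
            vi = T.axis.spatial(4096, i)
            C[vi] = A[vi] + B[vi]

sch = tvm.tir.Schedule(vector_add)
(i,) = sch.get_loops(sch.get_block("C"))

# Split the loop and bind the pieces to GPU thread axes.
# The outer extent (4096 / 128 = 32) becomes gridDim.x and the inner
# extent (128) becomes blockDim.x when the kernel is launched.
bx, tx = sch.split(i, factors=[None, 128])
sch.bind(bx, "blockIdx.x")
sch.bind(tx, "threadIdx.x")

lib = tvm.build(sch.mod, target="cuda")
```

The extents of the bound loops are recorded with the compiled kernel as its launch parameters, and that is what src/runtime/cuda/cuda_module.cc reads back at launch, which is why the sizes look "already decided" by the time you get there. DefaultGPUSchedule is essentially a fallback pass that performs this kind of binding automatically (via ThreadBind) for PrimFuncs that were never scheduled, so if you want a different policy, modifying ThreadBind there, or doing your own thread binding in a schedule or custom pass, is the place to do it.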