Hi there, I am trying to launch a TVM-compiled CUDA kernel with different sets of launch parameters (e.g. grid/block size), but doing so directly produces incorrect results. Is it possible to support this kind of “elastic kernel” implementation? Thanks
Is it possible to extend a compiled CUDA kernel to support launching with dynamic block/grid sizes?
@junrushao We had a discussion earlier about binding symbolic-extent loops to physical threads, and there should be no technical blockers.
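For context, the usual reason a fixed-config kernel breaks under different launch parameters is that the generated code bakes in assumptions like "grid exactly covers the data". A minimal hand-written sketch of the launch-config-independent style (the classic grid-stride loop, shown here as a generic CUDA pattern, not TVM's actual codegen output; `add_one` and its shapes are hypothetical):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread processes elements i, i + stride, i + 2*stride, ...
// so correctness does not depend on gridDim/blockDim covering n exactly.
// The same compiled kernel can therefore be launched with any grid/block size.
__global__ void add_one(const float* in, float* out, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        out[i] = in[i] + 1.0f;
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    // Two different launch configurations; both produce the same result.
    add_one<<<256, 128>>>(in, out, n);
    cudaDeviceSynchronize();
    add_one<<<32, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[42] = %f\n", out[42]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Supporting this in TVM would presumably mean generating loops of this shape (with symbolic extents bound to the thread/block indices) rather than assuming a fixed thread count at compile time.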
I’ll try creating a PR to fix it.