Is it possible to extend a compiled CUDA kernel to support launching with dynamic block/grid sizes?

Hi there, I am trying to launch a TVM-compiled CUDA kernel with different sets of launch params (e.g. grid/block size). However, doing this directly produces incorrect results. Is it possible to support this kind of “elastic kernel” implementation? Thanks
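
For context, a minimal sketch (classic TE schedule API, CUDA-enabled TVM build assumed; the kernel and names are just illustrative) of why overriding the dims on an already-compiled kernel goes wrong: the grid/block sizes come from the loop extents bound at schedule time and are baked into the generated host launch code, and the kernel body indexes buffers assuming exactly those dims.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=128)
s[B].bind(bx, te.thread_axis("blockIdx.x"))   # gridDim.x  = 1024 // 128 = 8
s[B].bind(tx, te.thread_axis("threadIdx.x"))  # blockDim.x = 128

mod = tvm.build(s, [A, B], target="cuda")
# The generated host stub launches with exactly these dims; the kernel body
# computes indices assuming them, so forcing a different grid/block size at
# launch time yields wrong results instead of an "elastic" kernel.
```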

@masahi @junrushao @kparzysz @areusch @BruceDai003 @puddingfjz

@junrushao We discussed binding symbolic-extent loops to physical threads before, and there should be no technical blockers.
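
For reference, a minimal sketch (TE schedule API; names, the split factor, and the shapes are illustrative) of what binding a symbolic-extent loop to physical threads looks like: the grid size is then computed from the actual shape on every call, which is the behavior the question is asking for.

```python
import tvm
from tvm import te
import numpy as np

n = te.var("n")                               # symbolic extent
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=128)
s[B].bind(bx, te.thread_axis("blockIdx.x"))   # gridDim.x = ceil(n / 128), evaluated per call
s[B].bind(tx, te.thread_axis("threadIdx.x"))  # blockDim.x = 128 (fixed)

mod = tvm.build(s, [A, B], target="cuda")
dev = tvm.cuda(0)
for size in (1000, 4096):                     # different grid sizes, same compiled module
    a = tvm.nd.array(np.random.rand(size).astype("float32"), dev)
    b = tvm.nd.empty((size,), "float32", dev)
    mod(a, b)
```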

I’ll try creating a PR to fix it.