How to bind the 'L' dim to blockIdx in matmul

maidabu · January 20, 2021, 8:51am

In the matmul example from the auto-tuning tutorial, the N and M dims can be bound to blockIdx and threadIdx, but the L dim can only be bound to threadIdx, which means the reduction over L can only be computed within a single block. If I want to split the L-dim reduction across multiple blocks, how can I do that?
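To make the goal concrete, here is a minimal pure-Python sketch of the computation pattern being asked about (often called split-K). It is not TVM schedule code. The function name `matmul_split_l` and the parameter `num_blocks` are hypothetical, chosen only for illustration. Each simulated "block" reduces its own slice of L into a partial result, and a second stage sums the partial results across blocks.

```python
def matmul_split_l(A, B, num_blocks):
    """Compute C[n][m] = sum over l of A[n][l] * B[l][m],
    with the L reduction split across `num_blocks` simulated blocks.

    A is N x L and B is L x M, both given as lists of lists.
    `num_blocks` is a hypothetical stand-in for the blockIdx extent
    that the L dim would be bound to.
    """
    N, L, M = len(A), len(B), len(B[0])
    chunk = (L + num_blocks - 1) // num_blocks  # ceil(L / num_blocks)

    # Stage 1: each "block" reduces only its own slice [lo, hi) of L.
    partials = []
    for block in range(num_blocks):
        lo = block * chunk
        hi = min(lo + chunk, L)  # the last slices may be short or empty
        partials.append(
            [[sum(A[n][l] * B[l][m] for l in range(lo, hi)) for m in range(M)]
             for n in range(N)]
        )

    # Stage 2: cross-block reduction that combines the partial results.
    return [[sum(p[n][m] for p in partials) for m in range(M)]
            for n in range(N)]
```

On a GPU, stage 2 would need either a second kernel or atomic adds into the output, since blocks cannot synchronize with each other inside one kernel. That is the part I do not know how to express in the schedule.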