Hi all,
The existing TensorIntrin supports “reduce_init” and “reduce_update”, which cover most cases very well. However, when I tried to implement a tensor intrinsic like “matmul_with_relu”, the current TensorIntrin was not sufficient to describe it: the ReLU must be applied exactly once, after the last reduction step, and neither of those two statements runs only on that last step.
The TIR I’m looking for is something like:
```
if (k == K - 1) {
  // call "matmul_with_relu" kernel; this part is currently MISSING
} else if (k == 0) {
  // call "matmul_beta_0" kernel, which is exactly what "reduce_init" does
} else {
  // call "matmul_beta_1" kernel, which is exactly what "reduce_update" does
}
```
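To make the requested semantics concrete, here is a minimal plain-Python sketch of the three-way dispatch in the pseudocode above. The kernel functions are hypothetical stand-ins named after the kernels in the post; they are not TVM or vendor APIs and only model the arithmetic (beta = 0 overwrite, beta = 1 accumulate, accumulate-then-ReLU). The sketch assumes the tile size `kt` divides the reduction length, and it also includes the "fold ReLU into every step" alternative to show why init/update alone is not enough.

```python
# Hypothetical plain-Python stand-ins for the kernels named in the post
# ("matmul_beta_0", "matmul_beta_1", "matmul_with_relu"). They are NOT
# TVM or vendor APIs; they only model each kernel's arithmetic.


def _tile_product(a, b, k0, kt):
    """Return A[:, k0:k0+kt] @ B[k0:k0+kt, :] as a new list of lists."""
    n, m = len(a), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(k0, k0 + kt))
             for j in range(m)] for i in range(n)]


def matmul_beta_0(a, b, c, k0, kt):
    """C = A_tile @ B_tile (overwrite, beta = 0)."""
    for i, row in enumerate(_tile_product(a, b, k0, kt)):
        c[i][:] = row


def matmul_beta_1(a, b, c, k0, kt):
    """C += A_tile @ B_tile (accumulate, beta = 1)."""
    for i, row in enumerate(_tile_product(a, b, k0, kt)):
        for j, v in enumerate(row):
            c[i][j] += v


def matmul_with_relu(a, b, c, k0, kt, beta):
    """C = relu(beta * C + A_tile @ B_tile)."""
    for i, row in enumerate(_tile_product(a, b, k0, kt)):
        for j, v in enumerate(row):
            c[i][j] = max(beta * c[i][j] + v, 0)


def tiled_matmul_relu(a, b, kt):
    """relu(A @ B) with the reduction axis split into tiles of size kt."""
    kdim, m = len(b), len(b[0])
    assert kdim % kt == 0, "sketch assumes kt divides the reduction length"
    num_k = kdim // kt  # K in the pseudocode
    c = [[0] * m for _ in range(len(a))]
    for k in range(num_k):
        k0 = k * kt
        if k == num_k - 1:
            # Proposed "reduce_last" step: finish the sum, then apply ReLU.
            # beta is 0 only when a single tile is both first and last.
            matmul_with_relu(a, b, c, k0, kt, beta=0 if k == 0 else 1)
        elif k == 0:
            matmul_beta_0(a, b, c, k0, kt)  # role of "reduce_init"
        else:
            matmul_beta_1(a, b, c, k0, kt)  # role of "reduce_update"
    return c


def relu_in_every_update(a, b, kt):
    """Wrong alternative: apply ReLU on every step (init/update only)."""
    kdim, m = len(b), len(b[0])
    c = [[0] * m for _ in range(len(a))]
    for k in range(kdim // kt):
        matmul_with_relu(a, b, c, k * kt, kt, beta=0 if k == 0 else 1)
    return c


def reference(a, b):
    """relu(A @ B) computed directly, without tiling."""
    return [[max(v, 0) for v in row] for row in _tile_product(a, b, 0, len(b))]


if __name__ == "__main__":
    a = [[1, 1]]
    b = [[-2], [3]]
    print(tiled_matmul_relu(a, b, kt=1))     # [[1]]
    print(relu_in_every_update(a, b, kt=1))  # [[3]]
    print(reference(a, b))                   # [[1]]
```

For the tiny example, applying ReLU on every step yields relu(relu(-2) + 3) = 3 instead of relu(-2 + 3) = 1, which is why a separate statement that runs only on the last reduction iteration is needed.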
Is there a plan to support a “reduce_last” attribute for TensorIntrin?