Hi, All.
The existing TensorIntrin supports “reduce_init” and “reduce_update”, which covers most cases and works well. However, when I tried to implement a tensor intrinsic like “matmul_with_relu”, I found that the current TensorIntrin is not sufficient to describe it, because the ReLU must be applied only once, on the last step of the reduction.
The TIR I’m looking for is something like:
if (k == K - 1) {
    # call "matmul_with_relu" kernel, currently this part is MISSING.
} else if (k == 0) {
    # call "matmul_beta_0" kernel, which is exactly what "reduce_init" is doing.
} else {
    # call "matmul_beta_1" kernel, which is exactly what "reduce_update" is doing.
}
Is there a plan to support a “reduce_last” attribute for TensorIntrin?
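To make the requested semantics concrete, the dispatch above can be modelled in plain Python. This is only a sketch under stated assumptions: the three kernel functions and `tiled_matmul_relu` are hypothetical stand-ins written for illustration, not TVM APIs. It also assumes K >= 2 and that the “matmul_with_relu” kernel accumulates into C before applying ReLU.

```python
# Hypothetical reference model of the three-way dispatch.
# None of these names are TVM APIs; matrices are nested Python lists.

def matmul_beta_0(c, a, b):
    """C = A @ B (overwrite C): the role "reduce_init" plays."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            c[i][j] = sum(a[i][p] * b[p][j] for p in range(len(b)))


def matmul_beta_1(c, a, b):
    """C += A @ B (accumulate): the role "reduce_update" plays."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            c[i][j] += sum(a[i][p] * b[p][j] for p in range(len(b)))


def matmul_with_relu(c, a, b):
    """C = relu(C + A @ B): the proposed "reduce_last" step (assumed semantics)."""
    matmul_beta_1(c, a, b)
    for row in c:
        for j in range(len(row)):
            row[j] = max(row[j], 0)


def tiled_matmul_relu(a_tiles, b_tiles):
    """Reduce over K tiles, choosing a kernel per step as in the pseudocode."""
    K = len(a_tiles)
    assert K >= 2, "sketch assumes at least two reduction steps"
    m, n = len(a_tiles[0]), len(b_tiles[0][0])
    c = [[0] * n for _ in range(m)]
    for k in range(K):
        if k == K - 1:
            matmul_with_relu(c, a_tiles[k], b_tiles[k])  # last step
        elif k == 0:
            matmul_beta_0(c, a_tiles[k], b_tiles[k])     # first step
        else:
            matmul_beta_1(c, a_tiles[k], b_tiles[k])     # middle steps
    return c


identity = [[1, 0], [0, 1]]
result = tiled_matmul_relu([[[-5, 1]], [[6, 1]]], [identity, identity])
print(result)
```

The example input shows why the ReLU cannot be folded into every update: the partial sum after the first tile is negative in one position, and clamping it early would change the final result.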