Describing Tensorization Intrins in TIR

There are two problems:

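  1. Please call decompose_reduction before tensorize. Your tensor intrinsic only performs the accumulation and does no initialization. decompose_reduction splits the reduction block into a separate init block and an update block, so the update block can then match the intrinsic.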

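  2. Your tensor intrinsic only supports C += A * B. It does not support C += A * B + bias. To be honest, it is very strange that you define the computation as sum(inp[i, rk] * wght[rk, j] + bias[i, j]), because that adds bias[i, j] once per iteration of rk, i.e. K times if rk has extent K. If the intent is to add the bias once, compute the product as a pure reduction and add bias[i, j] afterwards in a separate step.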
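To illustrate point 1, here is a minimal plain-Python sketch (no TVM involved) of what decompose_reduction does conceptually. All function names are hypothetical. The first version fuses initialization into the reduction loop, the way a TIR block with T.init() behaves. The second version has a separate init loop followed by an update loop whose body is an accumulate-only "intrinsic" like the one in the question.

```python
def matmul_reduction(A, B):
    """Reference C[i][j] = sum_k A[i][k] * B[k][j], with init fused into the block."""
    n, k_ext, m = len(A), len(B), len(B[0])
    C = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(k_ext):
                if k == 0:
                    C[i][j] = 0  # init runs on the first reduction iteration
                C[i][j] += A[i][k] * B[k][j]
    return C


def accumulate_intrin(C, A, B, i, j):
    """Hypothetical accumulate-only intrinsic: C[i][j] += sum_k A[i][k] * B[k][j].
    It never initializes C[i][j], just like the intrinsic in the question."""
    for k in range(len(B)):
        C[i][j] += A[i][k] * B[k][j]


def matmul_decomposed(A, B):
    """Shape of the loops after decompose_reduction: init first, then update."""
    n, m = len(A), len(B[0])
    C = [[None] * m for _ in range(n)]
    # Init block: a separate loop nest that only writes zeros.
    for i in range(n):
        for j in range(m):
            C[i][j] = 0
    # Update block: its body is now exactly the accumulate-only intrinsic.
    for i in range(n):
        for j in range(m):
            accumulate_intrin(C, A, B, i, j)
    return C
```

Calling accumulate_intrin on the fused version's output buffer before any init would fail here (None += ...). That is the plain-Python analogue of why tensorize cannot match a block that still carries its own initialization.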
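To illustrate point 2, this plain-Python sketch (hypothetical names, no TVM) compares the computation as written in the question with a matmul reduction followed by a single bias add. The only structural difference is whether bias[i][j] sits inside or outside the rk loop. The second form's reduction is exactly C += A * B.

```python
def buggy_bias_matmul(inp, wght, bias):
    """sum over rk of (inp[i][rk] * wght[rk][j] + bias[i][j]): bias added K times."""
    n, K, m = len(inp), len(wght), len(wght[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for rk in range(K):
                # bias is inside the reduction, so it is accumulated K times
                out[i][j] += inp[i][rk] * wght[rk][j] + bias[i][j]
    return out


def matmul_then_bias(inp, wght, bias):
    """Pure C += A * B reduction, then bias added once in a separate step."""
    n, K, m = len(inp), len(wght), len(wght[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for rk in range(K):
                out[i][j] += inp[i][rk] * wght[rk][j]
    # Separate elementwise step: bias added exactly once per output element
    for i in range(n):
        for j in range(m):
            out[i][j] += bias[i][j]
    return out
```

For example, with inp = [[1, 2, 3]], wght = [[1], [1], [1]] and bias = [[10]] (so K = 3), the first function gives [[36]] and the second gives [[16]]. The difference, 20, is the (K - 1) * bias extra copies.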