There are two problems:
- Please call `decompose_reduction` before tensorize, since your tensor intrinsic only does the accumulation and no initialization.
- Your tensor intrinsic only supports `C += A * B`, but does not support `C += A * B + bias`. To be honest, it’s super strange that you define the computation as `sum(inp[i, rk] * wght[rk, j] + bias[i, j])`, which means you add `bias[i, j]` `k` times.
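
To make both points concrete, here is a minimal NumPy sketch. It is not TVM code. The shapes `n`, `m`, `k` and the helper `gemm_update` are made up for illustration, and `gemm_update` only stands in for an accumulate-only intrinsic like yours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 5, 3  # hypothetical sizes, chosen only for this example
inp = rng.standard_normal((n, k))
wght = rng.standard_normal((k, m))
bias = rng.standard_normal((n, m))

# Point 2: bias inside the reduction is added once per rk, i.e. k times.
inside = np.zeros((n, m))
for rk in range(k):
    inside += np.outer(inp[:, rk], wght[rk, :]) + bias

# Probably what was intended: reduce first, then add bias once.
outside = inp @ wght + bias

assert np.allclose(inside, inp @ wght + k * bias)
assert np.allclose(inside - outside, (k - 1) * bias)


# Point 1: an accumulate-only update (hypothetical stand-in for the intrinsic).
def gemm_update(C, A, B):
    C += A @ B  # in-place accumulation, no initialization


# Without a separate init stage, whatever was in C leaks into the result.
C_dirty = np.full((n, m), 7.0)  # stand-in for uninitialized memory
gemm_update(C_dirty, inp, wght)
assert not np.allclose(C_dirty, inp @ wght)

# With the init split out (the role decompose_reduction plays), it is correct.
C_clean = np.zeros((n, m))  # init stage
gemm_update(C_clean, inp, wght)  # update stage
assert np.allclose(C_clean, inp @ wght)
```

So the bias add should live in its own stage after the reduction, and the reduction itself should be split into init and update so that the update part matches `C += A * B`.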