Non top-level reductions in compute statements

Hi all, I have a question about the definition of tvm.compute() statements. Can anyone explain why reductions are only allowed at the top level of a compute?

Simple example:

C = tvm.compute([N, M], lambda i, j: tvm.max(tvm.sum(A[i, k] * B[k, j], axis=k), 0), name='C')

This results in the following error:

tvm.ffi.base.TVMError: [09:05:34] […]/tvm/src/op/ Check failed: 0 == level Reductions are only allowed at the top level of compute. Please create another tensor for further composition.

Composing the operation with a second tensor is an option, but it is not possible to tensorize such a composed workload (see the thread "[Tensorize] how to use tensorize for composition op"). Adding information about these limitations to the documentation would be very useful.

This is a restriction of the current tensor expression language: reductions in nested form are quite complicated to process.

There is an ongoing effort to enhance the low-level IR passes to enable more powerful tensorization, which will hopefully resolve the issue you raised.