Tensorization with reduction intrinsic

I declared a tensor intrinsic whose body is only a reduction: it sums over the input tensors and outputs a tensor of shape (1,). This means it has a single normal (non-reduction) axis, with min=0 and extent=1. Since that axis is a trivial loop, I don’t use it when declaring the compute_op.

However, tensorization fails with an unexpected error. It says that the computation body does not match the intrinsic, and the unused axis appears in the computation body.

I posted the code at https://gist.github.com/vinx13/08e7ea4cc0cca2a24c633a4697929157 and the error message is attached below.

I did some investigation in the TensorIntrinMatcher class and found the following mappings:
in axis_remap_: iter_var(xx.c, Range(min=0, extent=7)) -> iter_var(NOTUSED, Range(min=0, extent=1))
in var_remap_: xx.c -> (NOTUSED + (xx.c + (threadIdx.x*8)))

Also, I wonder whether it is correct to declare a tensor intrinsic that reduces to shape (1,). Thanks.

Traceback:

stmt = schedule.ScheduleOps(sch, bounds)

tvm/src/op/tensorize.cc:347: Check failed: Equal(lhs, rhs) Failed to match the compute with TensorIntrin tensor_intrin’s declaration
provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0.000000f]), source=[(x(rc, rh, (rw + NOTUSED))*y(rc, rh, rw))], axis=[iter_var(rc, Range(min=0, extent=8)), iter_var(rh, Range(min=0, extent=7)), iter_var(rw, Range(min=0, extent=7))], where=(uint1)1, value_index=0),
intrin= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0.000000f]), source=[(x(rc, rh, rw)*y(rc, rh, rw))], axis=[iter_var(rc, Range(min=0, extent=8)), iter_var(rh, Range(min=0, extent=7)), iter_var(rw, Range(min=0, extent=7))], where=(uint1)1, value_index=0)

Replacing the compute op of the tensor intrinsic with
z = tvm.compute((1,), lambda i: tvm.sum(x[rc,rh,rw+i] * y[rc,rh,rw], axis=[rc,rh,rw]))
works, but I’m still confused.

The original code is z = tvm.compute((1,), lambda NOTUSED: tvm.sum(x[rc,rh,rw] * y[rc,rh,rw], axis=[rc,rh,rw])). Why does it not work?
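
To make the source of my confusion concrete, here is a plain-Python sketch (no TVM involved) of the two intrinsic bodies. The extents 8, 7, 7 come from the error message; the random inputs and the function names are only for illustration. Since the extra axis has extent 1, its index is always 0, so both forms should compute the same value. If that is right, the mismatch would seem to come from how the matcher compares the two expressions structurally, not from the arithmetic.

```python
import random

# Reduction extents taken from the error message: rc=8, rh=7, rw=7.
C, H, W = 8, 7, 7

random.seed(0)
# Illustrative inputs only; nested lists stand in for the x and y tensors.
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
y = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]


def reduce_unused_axis(x, y):
    """Original body: the extent-1 output axis never appears in an index."""
    out = [0.0]
    for notused in range(1):  # trivial loop, min=0, extent=1
        for rc in range(C):
            for rh in range(H):
                for rw in range(W):
                    out[notused] += x[rc][rh][rw] * y[rc][rh][rw]
    return out


def reduce_used_axis(x, y):
    """Workaround body: the output axis i is added to the last index of x."""
    out = [0.0]
    for i in range(1):  # i is always 0, so rw + i == rw
        for rc in range(C):
            for rh in range(H):
                for rw in range(W):
                    out[i] += x[rc][rh][rw + i] * y[rc][rh][rw]
    return out


a = reduce_unused_axis(x, y)
b = reduce_used_axis(x, y)
print(a == b)  # True: same products, accumulated in the same order
```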