I declared a tensor intrinsic whose body is only a reduction: it sums over the input tensors and outputs a (1,) tensor. This means it has only one normal axis, with min=0 and extent=1. When declaring the compute_op, I don’t need to use this axis, since it is a trivial loop.
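To make the intended semantics concrete, here is a plain-Python reference for what I expect the intrinsic to compute. This is only a sketch, not the TVM declaration from the gist: the function name reduce_to_one is made up for illustration, and the shape (8, 7, 7) is taken from the reduce axes rc, rh, rw in the error message below.

```python
# Plain-Python reference of the reduction; NOT TVM code.
# Extents of the reduce axes rc, rh, rw, as printed in the error message.
RC, RH, RW = 8, 7, 7


def reduce_to_one(x, y):
    """Sum x[rc][rh][rw] * y[rc][rh][rw] over all three reduce axes.

    The result has shape (1,): its only normal axis has min=0, extent=1,
    so there is nothing to index on the output side.
    """
    acc = 0.0  # identity element of the sum combiner
    for rc in range(RC):
        for rh in range(RH):
            for rw in range(RW):
                acc += x[rc][rh][rw] * y[rc][rh][rw]
    return [acc]


ones = [[[1.0] * RW for _ in range(RH)] for _ in range(RC)]
twos = [[[2.0] * RW for _ in range(RH)] for _ in range(RC)]
out = reduce_to_one(ones, twos)
print(len(out), out[0])  # 1 784.0
```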
However, I got an unexpected error. It says that the computation body does not match the intrinsic, and the unused axis appears in the computation body.
I posted the code at https://gist.github.com/vinx13/08e7ea4cc0cca2a24c633a4697929157 and the error message is attached below.
I did some investigation in the TensorIntrinMatcher class and found the following mappings:
in axis_remap_: iter_var(xx.c, Range(min=0, extent=7)) -> iter_var(NOTUSED, Range(min=0, extent=1))
in var_remap_: xx.c -> (NOTUSED + (xx.c + (threadIdx.x*8)))
Also, I wonder whether it is correct to declare a tensor intrinsic that reduces to (1,). Thanks.
Traceback:
…
stmt = schedule.ScheduleOps(sch, bounds)
…
tvm/src/op/tensorize.cc:347: Check failed: Equal(lhs, rhs) Failed to match the compute with TensorIntrin tensor_intrin’s declaration
provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0.000000f]), source=[(x(rc, rh, (rw + NOTUSED))*y(rc, rh, rw))], axis=[iter_var(rc, Range(min=0, extent=8)), iter_var(rh, Range(min=0, extent=7)), iter_var(rw, Range(min=0, extent=7))], where=(uint1)1, value_index=0),
intrin= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0.000000f]), source=[(x(rc, rh, rw)*y(rc, rh, rw))], axis=[iter_var(rc, Range(min=0, extent=8)), iter_var(rh, Range(min=0, extent=7)), iter_var(rw, Range(min=0, extent=7))], where=(uint1)1, value_index=0)
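The only difference between the two sources is the index (rw + NOTUSED) versus rw. My guess, which I have not confirmed, is that the check compares the expressions structurally, so the leftover NOTUSED breaks the match even though an axis with extent=1 can only take the value 0. The toy model below shows that idea in plain Python. It is not TVM's implementation: the nested-tuple encoding and the helpers substitute and simplify are made up for illustration.

```python
# Toy model of a structural match; NOT how TVM implements Equal().
# Expressions are nested tuples (head, *args); variables are strings.

def substitute(expr, mapping):
    """Replace variables according to `mapping`, leaving heads untouched."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    if isinstance(expr, tuple):
        head, *args = expr
        return (head, *(substitute(a, mapping) for a in args))
    return expr  # integer constant


def simplify(expr):
    """Fold `a + 0` and `0 + a`; everything else is left as it is."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(simplify(e) for e in expr)
    if expr[0] == "+":
        _, a, b = expr
        if a == 0:
            return b
        if b == 0:
            return a
    return expr


# Sources of the two reduce expressions from the error message.
intrin = ("*", ("x", "rc", "rh", "rw"), ("y", "rc", "rh", "rw"))
provided = ("*", ("x", "rc", "rh", ("+", "rw", "NOTUSED")),
            ("y", "rc", "rh", "rw"))

print(provided == intrin)  # False: NOTUSED is still in the index

# NOTUSED ranges over [0, 1), so it can be replaced by the constant 0.
fixed = simplify(substitute(provided, {"NOTUSED": 0}))
print(fixed == intrin)  # True
```

If this reading is right, the two bodies would compare equal once the extent-1 axis is replaced by 0 and the expression is simplified before the comparison.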