How would you do that? I tried the following:
```python
import tvm
from tvm import te
import topi

bsz = te.var('bsz')
d1 = te.var('d1')
d2 = te.var('d2')
d3 = te.var('d3')
A = te.placeholder((bsz, d1, d3), name='A', dtype='float32')  # first tensor
B = te.placeholder((bsz, d2, d3), name='B', dtype='float32')  # second tensor
R = topi.nn.batch_matmul(A, B)
with tvm.target.cuda():
    s = topi.cuda.schedule_reduce(R)
```
but got:
```
Traceback (most recent call last):
  File "scripts/band_mm_minimal.py", line 94, in <module>
    mm_fun_topi = _compile_function_topi()
  File "scripts/band_mm_minimal.py", line 63, in _compile_function_topi
    s = topi.cuda.schedule_reduce(R)
  File "/usr/tvm/topi/python/topi/cuda/reduction.py", line 147, in schedule_reduce
    traverse_after_reduce(outs[0].op)
  File "/usr/tvm/topi/python/topi/cuda/reduction.py", line 143, in traverse_after_reduce
    raise RuntimeError("Unsupported operator: %s" % operator.tag)
RuntimeError: Unsupported operator: batch_matmul
```
Which schedule should I use for `batch_matmul` on CUDA?