I’m now testing a simple matrix multiplication by using TensorIR. I want to bind the axis to the GPU thread. But there is an error saying that the child block is neither a complete block nor a reduction block. Would you please help me address this issue?
My code is shown below.
@tvm.script.ir_module
class MyModule:
@T.prim_func
def main(a: T.handle, b: T.handle, c: T.handle):
T.func_attr({"global_symbol": "main", "tir.noalias": True})
A = T.match_buffer(a, [128, 128])
B = T.match_buffer(b, [128, 128])
C = T.match_buffer(c, [128, 128])
for i, j, k in T.grid(128, 128, 128):
with T.block("update"):
vi, vj, vk= T.axis.remap("SSR", [i, j, k])
C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vj, vk]
ir_module = MyModule
sch = tvm.tir.Schedule(ir_module)
block_b = sch.get_block("update")
(i,j,k) = sch.get_loops(block_b)
sch.bind(i,"threadIdx.x")
Thanks.