How to bind a loop to a GPU thread in TensorIR

I'm testing a simple matrix multiplication written in TensorIR, and I want to bind a loop axis to a GPU thread. However, the bind call fails with an error saying that the child block is neither a complete block nor a reduction block. Could you please help me resolve this issue?

My code is shown below.

import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class MyModule:
    @T.prim_func
    def main(a: T.handle, b: T.handle, c: T.handle):
        T.func_attr({"global_symbol": "main", "tir.noalias": True})
        A = T.match_buffer(a, [128, 128])
        B = T.match_buffer(b, [128, 128])
        C = T.match_buffer(c, [128, 128])
        
        # C = A * B^T: i and j are spatial loops, k is the reduction loop
        for i, j, k in T.grid(128, 128, 128):
            with T.block("update"):
                vi, vj, vk = T.axis.remap("SSR", [i, j, k])
                C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vj, vk]

ir_module = MyModule
sch = tvm.tir.Schedule(ir_module)
block_b = sch.get_block("update")
i, j, k = sch.get_loops(block_b)
# This is the call that raises the error
sch.bind(i, "threadIdx.x")
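
For reference, here is a plain-Python sketch of what I expect the "update" block to compute. It is only a model of the loop nest, not TVM code: the helper name matmul_bt and the small size N are made up for illustration. One thing that may be relevant is that the sketch needs an explicit initialisation of C before the reduction over k. In TVMScript I believe that would be written as a `with T.init():` statement inside the block, which my code above does not have.

```python
# Plain-Python model of the "update" block; hypothetical names, not TVM API.
N = 4  # small stand-in for the 128 used in the TensorIR version


def matmul_bt(A, B, n=N):
    """Compute C[i][j] = sum over k of A[i][k] * B[j][k], i.e. C = A * B^T."""
    C = [[None] * n for _ in range(n)]  # None marks "not yet written"
    for i in range(n):  # the loop I would like to bind to threadIdx.x
        for j in range(n):
            C[i][j] = 0.0  # explicit initialisation before the reduction
            for k in range(n):  # reduction axis
                C[i][j] = C[i][j] + A[i][k] * B[j][k]
    return C


# Multiplying by the identity should give back A unchanged.
A = [[float(i + k) for k in range(N)] for i in range(N)]
identity = [[1.0 if r == c else 0.0 for c in range(N)] for r in range(N)]
C = matmul_bt(A, identity)
```

Without the `C[i][j] = 0.0` line, the first accumulation would read a value that nothing has written yet, so I suspect the missing initialisation is related to the error.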

Thanks.