Relax script fails to compile to CUDA with a memory verification error

Hi,

# Imports needed to run the snippet (assuming TVM's IRBuilder API;
# `relax_builder` and `R` both refer to the IRBuilder relax namespace here).
import tvm
from tvm import relax
from tvm.script.ir_builder import IRBuilder
from tvm.script.ir_builder import relax as relax_builder
from tvm.script.ir_builder import relax as R

def f(dtype, d0, d1):
    with IRBuilder() as builder:
        with relax_builder.function():
            R.func_name("main")
            x1 = R.arg("d0", R.Tensor(d0, dtype))
            x2 = R.arg("d1", R.Tensor(d1, dtype))

            with R.dataflow() as frame:
                output = R.emit(x1+x2)
                R.output(output)

            R.func_ret_value(frame.output_vars[0])

    func = builder.get()
    return tvm.IRModule({"main": func})

A = tvm.tir.expr.Var("a", "int64")
B = tvm.tir.expr.Var("b", "int64")
mod = f("float32", (1, 64, A, B), (1, 64, A, B))
mod = relax.transform.LegalizeOps()(mod)
target = tvm.target.Target("cuda")
ex = relax.build(mod, target)

I tried the simple Relax add script above and compiled it into a CUDA executable, but got the error below. How can I make this script work correctly?

  0: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::IRModule, tvm::transform::PassContext)>::AssignTypedLambda<tvm::tir::transform::VerifyMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1}>(tvm::tir::transform::VerifyMemory()::{lambda(tvm::IRModule, tvm::transform::PassContext)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  Did you forget to bind?
    Variable `B` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `A` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `T_add` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
  File "/data/aigc/workset/tvm_upstream/src/tir/analysis/verify_memory.cc", line 205
RuntimeError: Memory verification failed with the following errors:
# from tvm.script import tir as T

@T.prim_func
def add(var_A: T.handle, var_B: T.handle, var_T_add: T.handle):
    T.func_attr({"target": T.target({"arch": "sm_86", "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""}, "keys": ["cuda", "gpu"], "kind": "cuda", "max_num_threads": 1024, "tag": "", "thread_warp_size": 32}), "tir.noalias": T.bool(True)})
    a, b = T.int64(), T.int64()
    A = T.match_buffer(var_A, (T.int64(1), T.int64(64), a, b))
    B = T.match_buffer(var_B, (T.int64(1), T.int64(64), a, b))
    T_add = T.match_buffer(var_T_add, (T.int64(1), T.int64(64), a, b))
    for ax1, ax2, ax3 in T.grid(T.int64(64), a, b):
        T_add_1 = T.Buffer((a * b * T.int64(64),), data=T_add.data)
        A_1 = T.Buffer((a * b * T.int64(64),), data=A.data)
        B_1 = T.Buffer((a * b * T.int64(64),), data=B.data)
        T_add_1[b * (ax1 * a + ax2) + ax3] = A_1[b * (ax1 * a + ax2) + ax3] + B_1[b * (ax1 * a + ax2) + ax3]
free(): invalid pointer

This may be your answer.
