from tvm import te
from tvm.ir import IRModule

A = te.placeholder(shape=(128,), dtype="float32", name="A")
B = te.placeholder(shape=(128,), dtype="float32", name="B")
C = te.compute((128,), lambda i: A[i] + B[i], name="C")
func = te.create_prim_func([A, B, C])
func = func.with_attr("global_symbol", "main")
ir_module = IRModule({"main": func})
ir_module.optimize()  # does this exist?
So that I only need to specify the computation itself and don’t need to know anything about the schedule at all. I was thinking of something conceptually equivalent to this:
@tvm.compile
def vec_add(a, b):
    return a + b
And somehow TVM will optimize the function without the user specifying any schedule. Thanks!
The tvm.build(sch.mod, target="nvidia/tesla-p100") line is throwing an error:
Did you forget to bind?
Variable `B` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `A` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `C` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "/home/tzhou80/projects/tvm/src/tir/analysis/verify_memory.cc", line 214
RuntimeError: Memory verification failed with the following errors:
PrimFunc([var_A, var_B, var_C]) attrs={"tir.noalias": (bool)1, "global_symbol": "main", "target": cuda -keys=cuda,gpu -arch=sm_60 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32} {
parallel (i0_fused, 0, 128) {
C[i0_fused] = (A[i0_fused] + B[i0_fused])
}
}
It looks like tvm.build is somehow accessing some data that’s not on the device?
@twmht Hi, if you want to run on CUDA, you need to do tune_tir with the target being "cuda", and also do compile_tir with the target being "cuda" (or substitute "cuda" with a specific target string from the list here: https://github.com/apache/tvm/blob/main/src/target/tag.cc#L126-L378, e.g., target="nvidia/geforce-rtx-3090-ti").
The code snippet you shared is specific to LLVM and is not applicable to CUDA. Hope this helps.
Hi, I also encountered this error message. Did you happen to solve it?
Thank you.
Did you forget to bind?
Variable `B` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `A` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `C` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `C` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `C` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "/workspace/tvm/src/tir/analysis/verify_memory.cc", line 205
RuntimeError: Memory verification failed with the following errors:
# from tvm.script import tir as T
@T.prim_func
def main(A: T.Buffer((1024, 1024), "float32"), B: T.Buffer((1024, 1024), "float32"), C: T.Buffer((1024, 1024), "float32")):
    T.func_attr({"target": T.target({"arch": "sm_86", "host": {"keys": ["cpu"], "kind": "llvm", "tag": ""}, "keys": ["cuda", "gpu"], "kind": "cuda", "max_num_threads": 1024, "tag": "", "thread_warp_size": 32}), "tir.noalias": T.bool(True)})
    for i, j, k in T.grid(1024, 1024, 1024):
        cse_var_2: T.int32 = i * 1024
        cse_var_1: T.int32 = cse_var_2 + j
        C_1 = T.Buffer((1048576,), data=C.data)
        if k == 0:
            C_1[cse_var_1] = T.float32(0)
        A_1 = T.Buffer((1048576,), data=A.data)
        B_1 = T.Buffer((1048576,), data=B.data)
        C_1[cse_var_1] = C_1[cse_var_1] + A_1[cse_var_2 + k] * B_1[j * 1024 + k]
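As an aside, reading off the flat indices in this dump: `C_1[i*1024 + j]` accumulates `A_1[i*1024 + k] * B_1[j*1024 + k]`, i.e. B is read as `B[j, k]`, so the kernel computes `C = A @ B.T`. A small NumPy sketch (size reduced from 1024 to 4 for illustration) that mimics the loop nest and checks the index mapping:

```python
import numpy as np

# Mimic the lowered loop nest above, with n = 4 instead of 1024.
n = 4
A = np.random.rand(n, n).astype("float32")
B = np.random.rand(n, n).astype("float32")
A_1, B_1 = A.ravel(), B.ravel()          # flat views, like A_1 / B_1 in the dump
C_1 = np.empty(n * n, dtype="float32")   # flat output buffer, like C_1

for i in range(n):
    for j in range(n):
        for k in range(n):
            cse_var_2 = i * n
            cse_var_1 = cse_var_2 + j
            if k == 0:
                C_1[cse_var_1] = np.float32(0)
            C_1[cse_var_1] += A_1[cse_var_2 + k] * B_1[j * n + k]

# B_1[j*n + k] reads B[j, k], so the loop computes A @ B.T.
assert np.allclose(C_1.reshape(n, n), A @ B.T, atol=1e-5)
```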
I encountered this error message while doing meta schedule tuning and compiling. Meta schedule tuning needs xgboost to run, and when I traced the code, the xgboost version turned out to be the main problem in my case.
I am not sure what the main problem is in your case.