Actually, I found an example in "6.1. Part 1 — Machine Learning Compilation 0.0.1 documentation", but I get an error with the following code:
database = ms.tune_tir(
    mod=ir_module,
    target="nvidia/tesla-p100",
    max_trials_global=64,
    num_trials_per_iter=64,
    work_dir="./tune_tmp",
    task_name="main",
)
sch = ms.tir_integration.compile_tir(database, ir_module, "nvidia/tesla-p100")
rt_mod = tvm.build(sch.mod, target="nvidia/tesla-p100")
dev = tvm.cuda(0)
evaluator = rt_mod.time_evaluator("main", dev, number=10)
A_np = np.random.uniform(size=(1024, 1024)).astype("float32")
B_np = np.random.uniform(size=(1024, 1024)).astype("float32")
A_nd = tvm.nd.array(A_np, dev)
B_nd = tvm.nd.array(B_np, dev)
C_nd = tvm.nd.array(np.zeros((1024, 1024), dtype="float32"), dev)
print("MetaSchedule: %f GFLOPS" % (num_flop / evaluator(A_nd, B_nd, C_nd).mean / 1e9))
The tvm.build(sch.mod, target="nvidia/tesla-p100") line is throwing an error:
Did you forget to bind?
Variable `B` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `A` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `C` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "/home/tzhou80/projects/tvm/src/tir/analysis/verify_memory.cc", line 214
RuntimeError: Memory verification failed with the following errors:
PrimFunc([var_A, var_B, var_C]) attrs={"tir.noalias": (bool)1, "global_symbol": "main", "target": cuda -keys=cuda,gpu -arch=sm_60 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32} {
  parallel (i0_fused, 0, 128) {
    C[i0_fused] = (A[i0_fused] + B[i0_fused])
  }
}
It looks like tvm.build is somehow accessing data that’s not on the device?