Cannot build a GPU module for matrix multiplication made with BlockBuilder + LegalizeOps + dlight

Hello! I’m new to TVM and am trying to learn Relax and get it running, but I’m running into problems.

The error says: 64 errors detected in the compilation of "/tmp/tmpl_7hk6ox/tvm_kernels.cu".

I was doing a matrix multiplication with the BlockBuilder. These were the variables:

import tvm
from tvm import relax
from tvm import dlight as dl
from tvm.script import tir as T

target = "cuda"
dev = tvm.device(target, 0)

shape = relax.expr.ShapeExpr([T.int64(), T.int64()])
x = relax.Var("x", relax.TensorStructInfo(shape))
y = relax.Var("y", relax.TensorStructInfo(shape))

Then this is the function:

bb = relax.BlockBuilder()
with bb.function("matmul", [x, y]):
    lv0 = bb.emit(relax.op.matmul(x, y))  # need thread env for gpu
    bb.emit_func_output(lv0)

mod = bb.get()
mod = relax.transform.LegalizeOps()(mod)

I use the following to apply GPU schedules with dlight:

with tvm.target.Target(target):
    mod = dl.ApplyDefaultSchedule(
        dl.gpu.Matmul(),
        dl.gpu.Fallback(),
    )(mod)

Finally, the following line fails with the error above:

ex = relax.build(mod, target=target)

Am I missing some steps?

Additional information about my setup:

  • TVM version 0.19.dev0
  • CUDA 12.6
  • tvm.cuda().exist returns True
  • USE_CUDA ON, USE_CUDNN ON, USE_CUBLAS ON, USE_CUTLASS OFF
  • The MNIST sample from the cuDNN verification step in NVIDIA's installation guide passes
  • Ubuntu 22.04
  • With the llvm target and without the dlight (gpu_mod) step, everything works fine.

Thank you!

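Try substituting the variable definitions with named symbolic dimensions, so that x has shape (m, k) and y has shape (k, n):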

from tvm import relax, tir

M, N, K = (
    tir.Var("m", dtype="int64"),
    tir.Var("n", dtype="int64"),
    tir.Var("k", dtype="int64"),
)
x_shape, y_shape = relax.expr.ShapeExpr([M, K]), relax.expr.ShapeExpr([K, N])
x = relax.Var("x", relax.TensorStructInfo(x_shape))
y = relax.Var("y", relax.TensorStructInfo(y_shape))