I don’t think there is a plan to add a GPU schedule for matmul. But there is the batched_matmul schedule for GPU. You can use that from our onnx frontend.
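If it helps, here is a minimal sketch of calling the batched matmul compute and its GPU schedule directly from topi, outside the onnx frontend. It assumes your TVM/topi build exposes topi.nn.batch_matmul and a CUDA schedule reachable through topi.generic.schedule_batch_matmul (names can differ between releases), and the shapes are made up for illustration:

    import tvm
    import topi

    # Hypothetical shapes: batch 1, (M, K) x (N, K) -> (M, N).
    # topi.nn.batch_matmul expects the second operand already transposed,
    # i.e. laid out as (batch, N, K).
    A = tvm.placeholder((1, 128, 64), name="A")
    B = tvm.placeholder((1, 256, 64), name="B")

    with tvm.target.create("cuda"):
        C = topi.nn.batch_matmul(A, B)                 # assumed available in your topi version
        s = topi.generic.schedule_batch_matmul([C])    # dispatches to the CUDA schedule
    f = tvm.build(s, [A, B, C], "cuda", name="batch_matmul")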
Thanks @masahi, could you guide me on writing a GPU schedule for matmul, similar to the one that exists for CPU?
Hi @masahi @srkreddy1238 @FrozenGene @tqchen
I am facing the same error mentioned above ("Direct host side access to device memory is detected in fuse_reshape_broadcast_mul_conv2d_broadcast_mul_broadcast_add_elemwise_add. Did you forget to bind?"). Which line did you change to make it work?
I’m getting the below error:
nnvm._base.NNVMError: ValueError: Direct host side access to device memory is detected in fuse_matmul_relu. Did you forget to bind?
I did not understand how to use batched_matmul for this.
Any help here?
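In case it is useful, below is a rough, standalone sketch of a hand-written GPU schedule for a plain matmul with the thread binding that the error is complaining about. This is not the schedule NNVM registers for fuse_matmul_relu, just an illustration with the low-level tensor expression API; the shapes and split factors are arbitrary:

    import tvm

    M, K, N = 1024, 1024, 1024   # hypothetical fixed sizes
    A = tvm.placeholder((M, K), name="A")
    B = tvm.placeholder((K, N), name="B")
    k = tvm.reduce_axis((0, K), name="k")
    C = tvm.compute((M, N), lambda i, j: tvm.sum(A[i, k] * B[k, j], axis=k), name="C")

    s = tvm.create_schedule(C.op)
    # Every remaining spatial axis must be bound to a GPU block/thread index,
    # otherwise tvm.build raises the "Did you forget to bind?" error.
    bx, tx = s[C].split(C.op.axis[0], factor=32)
    by, ty = s[C].split(C.op.axis[1], factor=32)
    s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
    s[C].bind(by, tvm.thread_axis("blockIdx.y"))
    s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
    s[C].bind(ty, tvm.thread_axis("threadIdx.y"))

    f = tvm.build(s, [A, B, C], "cuda", name="matmul")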
Hi @masahi @Hkathuria @yqwang, I am facing the same issue on a Metal GPU.
Error: ValueError: Direct host side access to device memory is detected in addone. Did you forget to bind?
My Python script is:
    import tvm
    import os

    def prepare_test_libs(base_path):
        n = tvm.var("n")
        A = tvm.placeholder((n,), name="A")
        B = tvm.compute(A.shape, lambda *i: A(*i) + 1.0, name="B")
        s = tvm.create_schedule(B.op)
        # Compile library as dynamic library
        fadd_dylib = tvm.build(s, [A, B], "metal", name="addone")
        dylib_path = os.path.join(base_path, "test_addone_dll.dylib")
        fadd_dylib.export_library(dylib_path)

    if __name__ == "__main__":
        curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
        prepare_test_libs(os.path.join(curr_path, "./lib"))
I am building it for a Metal device.
You need to add a proper schedule, such as binding the axes to GPU threads.
Hi @vinx13, can you please suggest a sample for this?
see https://docs.tvm.ai/tutorials/tensor_expr_get_started.html#schedule-the-computation
something like
    s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
    s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
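For reference, the earlier addone script with the missing binding added might look like the sketch below; the split factor 64 is an arbitrary choice (keep it within the device's thread limit):

    import os
    import tvm

    def prepare_test_libs(base_path):
        n = tvm.var("n")
        A = tvm.placeholder((n,), name="A")
        B = tvm.compute(A.shape, lambda *i: A(*i) + 1.0, name="B")
        s = tvm.create_schedule(B.op)
        # Bind the output axis to GPU blocks/threads so the Metal kernel is valid.
        bx, tx = s[B].split(B.op.axis[0], factor=64)
        s[B].bind(bx, tvm.thread_axis("blockIdx.x"))
        s[B].bind(tx, tvm.thread_axis("threadIdx.x"))
        fadd_dylib = tvm.build(s, [A, B], "metal", name="addone")
        dylib_path = os.path.join(base_path, "test_addone_dll.dylib")
        fadd_dylib.export_library(dylib_path)

    if __name__ == "__main__":
        curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
        prepare_test_libs(os.path.join(curr_path, "./lib"))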