Running ONNX model in Relay - CUDA_ERROR_INVALID_VALUE thrown

shingjan · April 11, 2020, 5:46am

I am trying to run an ONNX AlexNet model, built from Keras/TF, on TVM with Relay. The training and test data are CIFAR-10. The code snippet is as follows:

import tvm
from tvm import relay

# onnx_model and X_test are defined earlier (not shown in this post)
target = tvm.target.cuda()
input_name = 'conv2d_11_input'
shape_dict = {input_name: X_test.shape}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with relay.build_config(opt_level=1):
    intrp = relay.build_module.create_executor('graph', mod, tvm.gpu(), target)
dtype = 'float32'
tvm_output = intrp.evaluate()(tvm.nd.array(X_test.astype(dtype)), **params).asnumpy()

This code runs fine on the CPU with the target set to "llvm", but when I try to run it with CUDA, this error is thrown:

TVMError Traceback (most recent call last)
in
1 # LLVM EXECUTE SUCCEEDED
2 dtype = 'float32'
----> 3 tvm_output = intrp.evaluate()(tvm.nd.array(X_test.astype(dtype)), **params).asnumpy()

~/tvm/python/tvm/relay/build_module.py in _graph_wrapper(*args, **kwargs)
336 gmodule.set_input(i, arg)
337 # Run the module, and fetch the output.
--> 338 gmodule.run()
339 # make a copy so multiple invocation won't hurt perf.
340 if num_outputs == 1:

~/tvm/python/tvm/contrib/graph_runtime.py in run(self, **input_dict)
166 if input_dict:
167 self.set_input(**input_dict)
--> 168 self._run()
169
170 def get_num_outputs(self):

~/tvm/python/tvm/_ffi/_ctypes/function.py in __call__(self, *args)
205 self.handle, values, tcodes, ctypes.c_int(num_args),
206 ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
--> 207 raise get_last_ffi_error()
208 _ = temp_args
209 _ = args

TVMError: Traceback (most recent call last):
[bt] (3) /home/xxx/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7f35b552b441]
[bt] (2) /home/xxx/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xbc) [0x7f35b559469c]
[bt] (1) /home/xxx/tvm/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x665) [0x7f35b5594145]
[bt] (0) /home/xxx/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f35b4d29142]
File "/home/xxx/tvm/src/runtime/cuda/cuda_module.cc", line 214
File "/home/xxx/tvm/src/runtime/module_util.cc", line 72
TVMError: Check failed: ret == 0 (-1 vs. 0) : CUDALaunch Error: CUDA_ERROR_INVALID_VALUE
grid=(32,32,640000), block=(1,1,1)
// func_name=fused_nn_conv2d_nn_bias_add_tanh_4_kernel0
// CUDA Source
//…

It seems to me that TVM/Relay has already compiled the model down to CUDA source, but the generated kernel can't be executed for some reason. I couldn't find a more specific error code from either CUDA or TVM. Any help is appreciated! BTW, I am building TVM 0.6.0 from source with LLVM and CUDA set to ON. @merrymercy @vinx13
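
One way to narrow this down may be to check the launch configuration printed in the error against CUDA's documented per-dimension launch limits. The following is a minimal sketch, assuming a device of compute capability 3.0 or later (grid limits of 2^31-1, 65535, 65535; block limits of 1024, 1024, 64; at most 1024 threads per block). `parse_launch` and `check_launch` are hypothetical helper names for illustration, not TVM APIs:

```python
import re

# Assumed limits for compute capability 3.0 and later.
MAX_GRID = (2**31 - 1, 65535, 65535)
MAX_BLOCK = (1024, 1024, 64)
MAX_THREADS_PER_BLOCK = 1024

LAUNCH_RE = re.compile(
    r"grid=\((\d+),\s*(\d+),\s*(\d+)\),\s*block=\((\d+),\s*(\d+),\s*(\d+)\)"
)


def parse_launch(line):
    """Extract (grid, block) tuples from a 'grid=(x,y,z), block=(x,y,z)' log line."""
    match = LAUNCH_RE.search(line)
    if match is None:
        raise ValueError("no launch configuration found in: %r" % line)
    nums = tuple(int(group) for group in match.groups())
    return nums[:3], nums[3:]


def check_launch(grid, block):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    for axis, value, limit in zip("xyz", grid, MAX_GRID):
        if not 1 <= value <= limit:
            problems.append("grid.%s=%d exceeds limit %d" % (axis, value, limit))
    for axis, value, limit in zip("xyz", block, MAX_BLOCK):
        if not 1 <= value <= limit:
            problems.append("block.%s=%d exceeds limit %d" % (axis, value, limit))
    threads = block[0] * block[1] * block[2]
    if threads > MAX_THREADS_PER_BLOCK:
        problems.append("threads per block %d exceeds limit %d"
                        % (threads, MAX_THREADS_PER_BLOCK))
    return problems


# The launch line copied from the error message above.
grid, block = parse_launch("grid=(32,32,640000), block=(1,1,1)")
problems = check_launch(grid, block)
for problem in problems:
    print(problem)
```

For the launch line in this error, the only violation reported is the grid's z dimension, which suggests the launch is rejected because of the grid size rather than the block size.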

vinx13 · April 18, 2020, 5:12am

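Looks like the launch configuration is invalid: the block is only (1,1,1), but the grid's z dimension (640000) exceeds CUDA's limit of 65535. The default schedule can be invalid in some cases; you can try using AutoTVM to find valid schedules.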
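
Besides tuning, it may be worth feeding the model in smaller batches. This rests on an assumption that is not confirmed in this thread: that the grid's z extent of 640000 scales with the batch dimension of `X_test.shape` (CIFAR-10's test split has 10000 images and 640000 = 10000 × 64, which is consistent with that but does not prove it). A minimal sketch of the index bookkeeping only; `batch_slices` is a hypothetical helper, and the batch size of 64 is an arbitrary illustration:

```python
def batch_slices(num_samples, batch_size):
    """Yield (start, stop) index pairs that cover range(num_samples) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


# Illustration: 10000 samples split into batches of at most 64.
slices = list(batch_slices(10000, 64))
print(len(slices), slices[0], slices[-1])
```

Each pair would select `X_test[start:stop]`, and the shape passed in `shape_dict` would be the batch shape rather than `X_test.shape`. Because the graph is compiled for a fixed input shape, a final partial batch would presumably need padding or a separate build.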

wwwwcu · July 1, 2020, 7:28am

Hi, have you solved the problem? I may be hitting the same error when running DeepLabV3 (a TF model). I did tune the model with AutoTVM, but it still does not work. Could you give me some advice? Thanks!!

shingjan · July 26, 2022, 6:30pm

@wwwwcu Hi, can you provide more context so I can take a look? For example, how your TVM was built and how one can reproduce your error. The stack trace would be very helpful as well.