Referring to the "Auto-scheduling a Neural Network for NVIDIA GPU" tutorial (tvm 0.9.dev0 documentation): I run this tutorial on an NVIDIA GTX 1060, but my target device is an NVIDIA RTX 2080 Ti, so I define `target = tvm.target.Target("cuda -arch=sm_75", host="llvm")` for the target GPU. However, the GTX 1060 has sm_arch=61, and running the auto-scheduler tuning raises an error:
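My guess is that the problem is the arch mismatch between the tuning machine and the target. The sketch below is only an illustration of my understanding, not TVM API; the device names and the binary-compatibility rule (same major compute-capability version, device minor >= binary minor) are my assumptions:

```python
# Illustrative sketch: why a sm_75 cubin cannot run on a sm_61 GPU.
# Assumed compute capabilities for the two cards involved:
COMPUTE_CAPABILITY = {
    "GTX 1060": 61,     # Pascal, sm_61
    "RTX 2080 Ti": 75,  # Turing, sm_75
}

def can_load_cubin(binary_arch: int, device: str) -> bool:
    # A cubin built for sm_75 contains no machine code a sm_61 device can
    # execute, so loading it would fail (as in the error log below).
    cc = COMPUTE_CAPABILITY[device]
    # Assumed rule: same major version, and device minor >= binary minor.
    return cc // 10 == binary_arch // 10 and cc % 10 >= binary_arch % 10

print(can_load_cubin(75, "GTX 1060"))     # False: measurement GPU rejects it
print(can_load_cubin(75, "RTX 2080 Ti"))  # True: deployment GPU accepts it
```

If this picture is right, every measurement the auto-scheduler tries on the 1060 fails to load, which would explain the 0.00 GFLOPS results.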
No: 191 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:RuntimeDeviceError, error_msg:Traceback (most recent call last):
File "/home/zhouyuangan/.local/lib/python3.8/site-packages/tvm-0.9.dev830+g2fc7d16e9-py3.8-linux-x86_64.egg/tvm/auto_scheduler/measure.py", line 1144, in _rpc_run
func.entry_func(*loc_args)
File "tvm/_ffi/_cython/
...
https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: ret == 0 (-1 vs. 0) : CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_NO_BINARY_FOR_GPU
, all_cost:4.04, Tstamp:1648606175.53)
==================================================
Placeholder: placeholder, placeholder, placeholder
blockIdx.x b.0@c.0@i.0@j.0@ (0,68)
threadIdx.x b.2@c.2@i.2@j.2@ (0,288)
DepthwiseConv2d.local auto_unroll: 1024
for di.0 (0,5)
threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,288)
placeholder.shared = ...
threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,288)
vectorize ax0@ax1@ax2@ax3@.1 (0,45)
PaddedInput.shared = ...
for dj.1 (0,5)
for c_c.4 (0,3)
DepthwiseConv2d.local = ...
for c.3 (0,3)
DepthwiseConv2d = ...
blockIdx.x ax0@ax1@ax2@ax3@.0 (0,1836)
threadIdx.x ax0@ax1@ax2@ax3@.1 (0,32)
T_multiply = ...
==================================================
What does `Check failed: ret == 0 (-1 vs. 0) : CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_NO_BINARY_FOR_GPU` mean?
I then exported the model library and ran it on the RTX 2080 Ti. It runs without any error, but it is slower than onnxruntime-gpu. What is happening here?
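For context, I am comparing the two runtimes with a simple warm-up-then-average loop like the sketch below. The workload lambda is only a stand-in, not my real model; in practice I substitute the TVM module's run call and the onnxruntime session's run call:

```python
import time

def avg_latency_ms(run, warmup=10, repeats=100):
    # Warm up first so one-time costs (module loading, caches, lazy
    # initialization) are excluded from the measurement.
    for _ in range(warmup):
        run()
    t0 = time.perf_counter()
    for _ in range(repeats):
        run()
    # Average wall-clock time per call, in milliseconds.
    return (time.perf_counter() - t0) / repeats * 1e3

# Stand-in workload; replace with module.run / session.run in practice.
latency = avg_latency_ms(lambda: sum(range(1000)))
print(f"{latency:.4f} ms per call")
```

Even with this warm-up, the TVM library tuned with measurements from the 1060 comes out slower than onnxruntime-gpu on the 2080 Ti.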
Thanks