Auto-scheduling for NVIDIA GPU raises CUDA_ERROR_NO_BINARY_FOR_GPU

Referring to the tutorial "Auto-scheduling a Neural Network for NVIDIA GPU" (tvm 0.9.dev0 documentation): I run this tutorial on an NVIDIA GTX 1060, while my target device is an RTX 2080 Ti, so I define target = tvm.target.Target("cuda -arch=sm_75", host="llvm") for the target GPU. However, the GTX 1060 has sm_arch=61, and running the auto-scheduler tuning raises this error:

No: 191	GFLOPS: 0.00 / 0.00	results: MeasureResult(error_type:RuntimeDeviceError, error_msg:Traceback (most recent call last):
  File "/home/zhouyuangan/.local/lib/python3.8/site-packages/tvm-0.9.dev830+g2fc7d16e9-py3.8-linux-x86_64.egg/tvm/auto_scheduler/measure.py", line 1144, in _rpc_run
    func.entry_func(*loc_args)
  File "tvm/_ffi/_cython/
...
tps://tvm.apache.org/docs/errors.html
---------------------------------------------------------------

  Check failed: ret == 0 (-1 vs. 0) : CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_NO_BINARY_FOR_GPU

, all_cost:4.04, Tstamp:1648606175.53)
==================================================
Placeholder: placeholder, placeholder, placeholder
blockIdx.x b.0@c.0@i.0@j.0@ (0,68)
  threadIdx.x b.2@c.2@i.2@j.2@ (0,288)
    DepthwiseConv2d.local auto_unroll: 1024
    for di.0 (0,5)
      threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,288)
        placeholder.shared = ...
      threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,288)
        vectorize ax0@ax1@ax2@ax3@.1 (0,45)
          PaddedInput.shared = ...
      for dj.1 (0,5)
        for c_c.4 (0,3)
          DepthwiseConv2d.local = ...
    for c.3 (0,3)
      DepthwiseConv2d = ...
blockIdx.x ax0@ax1@ax2@ax3@.0 (0,1836)
  threadIdx.x ax0@ax1@ax2@ax3@.1 (0,32)
    T_multiply = ...

==================================================

What does "Check failed: ret == 0 (-1 vs. 0) : CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_NO_BINARY_FOR_GPU" mean?

I went ahead and exported the model library and ran it on the RTX 2080 Ti. It does not raise any error, but it is slower than onnxruntime-gpu. What happened here?

Thanks

It seems that we must use the correct sm_xy for the GPU device that actually runs the tuning measurements. Code built with -arch=sm_75 cannot be loaded on the sm_61 GTX 1060, so every measurement fails with CUDA_ERROR_NO_BINARY_FOR_GPU; with no valid measurement records, the exported library likely falls back to untuned schedules, which would explain why it is slower than onnxruntime-gpu.
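A minimal sketch (assuming the tuning measurements run on the local GPU, device 0) that derives the -arch flag from that device instead of hard-coding sm_75:

```python
import tvm

# Build the -arch flag from the GPU that actually runs the measurements
# (the GTX 1060 here), instead of hard-coding sm_75.
dev = tvm.cuda(0)
major, minor = dev.compute_version.split(".")  # e.g. "6.1" on a GTX 1060
target = tvm.target.Target(f"cuda -arch=sm_{major}{minor}", host="llvm")
```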

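Alternatively, you can keep -arch=sm_75 and have the auto-scheduler measure on the 2080 Ti itself. A sketch, assuming an RPC tracker at 0.0.0.0:9190 with the 2080 Ti registered under the hypothetical device key "2080ti":

```python
import tvm
from tvm import auto_scheduler

target = tvm.target.Target("cuda -arch=sm_75", host="llvm")

# Run candidate programs on the remote RTX 2080 Ti instead of the local 1060.
runner = auto_scheduler.RPCRunner(
    key="2080ti",        # hypothetical device key registered with the RPC tracker
    host="0.0.0.0",      # tracker address; adjust to your setup
    port=9190,
    repeat=3,
    min_repeat_ms=300,   # long enough runs for stable GPU timings
    timeout=10,
)

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,
    runner=runner,
    measure_callbacks=[auto_scheduler.RecordToFile("network_tuning.json")],
)
# Pass tune_option to the TaskScheduler's tune() as in the tutorial.
```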