'sm_72' is not a recognized processor for this target (ignoring processor)

I’m trying to use the CUDA backend on an NVIDIA AGX Xavier device.

I successfully built the library with CUDA enabled; however, running a model fails with Segmentation fault (core dumped).

Investigating further by running the test tests/python/topi/python/test_topi_conv2d_nchw.py, I get the error:

'sm_72' is not a recognized processor for this target (ignoring processor)
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(TVMFuncCall+0x70) [0x7f72616f00]
  [bt] (2) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<4, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0xe8) [0x7f726b0020]
  [bt] (1) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x658) [0x7f726afee8]
  [bt] (0) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x10ecfb4) [0x7f726acfb4]
  File "/home/user/tools/tvm/src/runtime/cuda/cuda_module.cc", line 105
  File "/home/user/tools/tvm/src/runtime/library_module.cc", line 78
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

I tried forcing an older architecture (sm_50), though this forum post confirms that sm_72 is correct for the Xavier.

from tvm.autotvm.measure.measure_methods import set_cuda_target_arch
set_cuda_target_arch('sm_50')
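For reference, the sm_XY string is just the CUDA compute capability with the dot removed, and the AGX Xavier's Volta iGPU is compute capability 7.2, hence sm_72. A small sanity-check helper (hypothetical, not part of TVM; the table values come from NVIDIA's published compute-capability lists):

```python
# Hypothetical helper: map a CUDA compute capability to the "sm_XY" string.
def arch_from_capability(major: int, minor: int) -> str:
    """Return the 'sm_XY' architecture string for capability X.Y."""
    return f"sm_{major}{minor}"

# Reference table for common Jetson boards (per NVIDIA's capability tables).
JETSON_ARCHS = {
    "Jetson Nano": (5, 3),        # Maxwell
    "Jetson TX2": (6, 2),         # Pascal
    "Jetson AGX Xavier": (7, 2),  # Volta
    "Jetson Xavier NX": (7, 2),   # Volta
}

print(arch_from_capability(*JETSON_ARCHS["Jetson AGX Xavier"]))  # sm_72
```

So sm_72 is indeed the right string for this board, and forcing sm_50 targets a different (Maxwell-era) architecture.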

Is the CUDA library on the Xavier different enough from the regular x86 GPU drivers to be causing these issues?

I am working on the same board and using sm_72 as the CUDA target arch, but everything seems fine in my case.

More details on what I am configuring right now:

  1. using llvm -mcpu=carmel -mtriple=aarch64-linux-gnu as target_host
  2. compile on the host machine, and send the artifact to TVM RPC server running on Xavier
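That host-compile-plus-RPC flow can be sketched roughly as below, assuming TVM 0.7/0.8-era APIs; the device IP, port, and module name are placeholders, and the TVM imports live inside the function so the snippet can be read (or imported) without a device attached:

```python
# Sketch: cross-compile a Relay module on an x86 host, run it on the Xavier
# over TVM RPC. The device address and file names are hypothetical.
TARGET = "cuda -arch=sm_72"                      # device GPU target
TARGET_HOST = "llvm -mtriple=aarch64-linux-gnu"  # aarch64 host-side code

def build_and_run_remote(mod, params, device_ip="192.168.1.10", port=9090):
    """Cross-compile on the host, then upload and load the module via RPC."""
    import tvm
    from tvm import relay, rpc
    from tvm.contrib import utils

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=TARGET, target_host=TARGET_HOST,
                          params=params)

    tmp = utils.tempdir()
    path = tmp.relpath("net.tar")
    lib.export_library(path)               # package for the aarch64 target

    remote = rpc.connect(device_ip, port)  # TVM RPC server on the Xavier
    remote.upload(path)
    return remote.load_module("net.tar")
```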

I’m running locally on the Xavier, without RPC.

For the target host I was using llvm -mtriple=aarch64-linux-gnu. I added the -mcpu=carmel flag, but it raised the error 'carmel' is not a recognized processor for this target (ignoring processor).

Other than enabling CUDA in config.cmake, were there any other things you did?

I’m also using Xavier without RPC and everything works fine. My target is simply cuda with the default target host (btw, your target host is also fine. No need to add mcpu).

The process of building TVM should be the same as other platforms. I didn’t do anything special on Xavier.

The test case you tried might not be the right candidate. A PTX error is different from a segmentation fault, and it could have many causes. For example, it’s possible that the default schedule doesn’t fit the Xavier GPU in terms of thread count and memory blocks. It would be more useful to run tuning on a workload (e.g., Tuning High Performance Convolution on NVIDIA GPUs — tvm 0.8.dev0 documentation). If the tuner can find a valid schedule, then the setting should be fine.
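One way to check programmatically whether tuning found any valid schedule is to scan the autotvm log file: each line is a JSON record whose "r" field holds [costs, error_no, all_cost, timestamp], and error_no == 0 means the measurement succeeded. A small sketch, assuming that JSON log format:

```python
import json

def count_valid_records(log_path: str) -> int:
    """Count autotvm log records whose measurement succeeded (error_no == 0)."""
    valid = 0
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # "r" is [costs, error_no, all_cost, timestamp]
            _costs, error_no, _all_cost, _timestamp = record["r"]
            if error_no == 0:
                valid += 1
    return valid
```

If this returns zero for the whole log, no configuration compiled and ran successfully, which would point at an environment problem rather than a bad schedule.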


Running the tune_conv2d_cuda.py script appeared to succeed, returning:

Finish loading 40 records
Time cost of this operator: 0.004278

However, scrolling back through the output showed a series of errors:

No: 1   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  [bt] (4) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(TVMFuncCall+0x70) [0x7faaa6bf00]\n  [bt] (3) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x5fea80) [0x7faa013a80]\n  [bt] (2) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2b0) [0x7faa012970]\n  [bt] (1) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x9c4) [0x7faa26e214]\n  [bt] (0) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x1053fb0) [0x7faaa68fb0]\n  File "/home/user/tools/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/home/user/tools/tvm/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.1579885482788086, timestamp=1612291653.9391131)       [('tile_f', [-1, 8, 2, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,5915333
No: 2   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  [bt] (4) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(TVMFuncCall+0x70) [0x7faaa6bf00]\n  [bt] (3) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x5fea80) [0x7faa013a80]\n  [bt] (2) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2b0) [0x7faa012970]\n  [bt] (1) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x9c4) [0x7faa26e214]\n  [bt] (0) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x1053fb0) [0x7faaa68fb0]\n  File "/home/user/tools/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/home/user/tools/tvm/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.1726396083831787, timestamp=1612291653.9654841)       [('tile_f', [-1, 4, 2, 2]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 8, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,8497126
No: 3   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  [bt] (4) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(TVMFuncCall+0x70) [0x7faaa6bf00]\n  [bt] (3) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x5fea80) [0x7faa013a80]\n  [bt] (2) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2b0) [0x7faa012970]\n  [bt] (1) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x9c4) [0x7faa26e214]\n  [bt] (0) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x1053fb0) [0x7faaa68fb0]\n  File "/home/user/tools/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/home/user/tools/tvm/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.1727280616760254, timestamp=1612291653.9658356)       [('tile_f', [-1, 4, 8, 8]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 1, 512]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10257876
No: 4   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  [bt] (4) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(TVMFuncCall+0x70) [0x7faaa6bf00]\n  [bt] (3) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x5fea80) [0x7faa013a80]\n  [bt] (2) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2b0) [0x7faa012970]\n  [bt] (1) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x9c4) [0x7faa26e214]\n  [bt] (0) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x1053fb0) [0x7faaa68fb0]\n  File "/home/user/tools/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/home/user/tools/tvm/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.1717667579650879, timestamp=1612291653.9876938)       [('tile_f', [-1, 2, 1, 2]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 64, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7447936
No: 5   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  [bt] (4) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(TVMFuncCall+0x70) [0x7faaa6bf00]\n  [bt] (3) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x5fea80) [0x7faa013a80]\n  [bt] (2) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2b0) [0x7faa012970]\n  [bt] (1) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x9c4) [0x7faa26e214]\n  [bt] (0) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x1053fb0) [0x7faaa68fb0]\n  File "/home/user/tools/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/home/user/tools/tvm/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.14327168464660645, timestamp=1612291654.3898673)      [('tile_f', [-1, 64, 8, 1]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 256, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2161093
No: 6   GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  [bt] (4) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(TVMFuncCall+0x70) [0x7faaa6bf00]\n  [bt] (3) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x5fea80) [0x7faa013a80]\n  [bt] (2) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x2b0) [0x7faa012970]\n  [bt] (1) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x9c4) [0x7faa26e214]\n  [bt] (0) /home/user/tools/incubator-tvm-wheest/build/libtvm.so(+0x1053fb0) [0x7faaa68fb0]\n  File "/home/user/tools/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/home/user/tools/tvm/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.14082765579223633, timestamp=1612291654.3903658)      [('tile_f', [-1, 64, 2, 4]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,462994
....
Best config:
[('tile_f', [-1, 4, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 2]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,5475910
Finish loading 40 records
Time cost of this operator: 0.004278

I am able to use this GPU with TensorRT, and didn’t have any issues building TVM with CUDA.

I get a Segmentation fault (core dumped) with no other information for my full ONNX model.

It will have some errors for sure. The one you posted is reasonable, and checking whether there is any valid schedule is exactly the purpose of running a tuning job. Based on your log, the tuned conv2d reaches a cost of 0.004278, which means the environment is good.
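For a sense of scale: assuming the tuned workload is the tutorial's default conv2d (1x512x7x7 input, 512 output channels, 3x3 kernel, padding 1 so the 7x7 spatial size is preserved; this shape is my assumption, not stated in the thread), a cost of 0.004278 s works out to roughly 54 GFLOPS:

```python
# Rough FLOP count for the assumed conv2d workload:
# N=1, C=512 input channels, H=W=7, 512 output channels, 3x3 kernel, pad 1.
N, C, H, W = 1, 512, 7, 7
out_channels, kh, kw = 512, 3, 3

macs = N * out_channels * H * W * C * kh * kw  # multiply-accumulates
flops = 2 * macs                               # one mul + one add per MAC

cost_s = 0.004278                              # measured operator time
gflops = flops / cost_s / 1e9
print(f"{gflops:.1f} GFLOPS")                  # approximately 54 GFLOPS
```

A throughput in that range is a plausible number for the Xavier's iGPU, which is another hint that the CUDA environment itself is working.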

To debug the segmentation fault, you could maybe use gdb to catch the fault point. Something like:

gdb python3
(gdb) run your_script.py
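To avoid the interactive session entirely, gdb can also be driven in batch mode so the backtrace prints automatically at the crash. A sketch of building that invocation (the script name is a placeholder, and actually launching it is left to the reader):

```python
import shlex

script = "your_script.py"  # placeholder for the failing script

# -batch runs non-interactively; the -ex commands run the program and
# print a backtrace when it stops (e.g. on SIGSEGV).
cmd = [
    "gdb", "-batch",
    "-ex", "run",
    "-ex", "bt",
    "--args", "python3", script,
]
print(shlex.join(cmd))
# One could then launch it with subprocess.run(cmd).
```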

Ah right, thanks. I didn’t realise exactly what the test was for.

Good to know that side of things is working.

Running gdb, I get the following information:

[Thread 0x7fa4c6b170 (LWP 7872) exited]
[Thread 0x7fa546c170 (LWP 7871) exited]
[Thread 0x7fa4186170 (LWP 7873) exited]
[New Thread 0x7fa4186170 (LWP 7905)]

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000007fa71d2b7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so

Running tegrastats during this run doesn’t yield any additional information. Running my script with cuda-memcheck yields:

========= Error: process didn't terminate successfully
=========        The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= No CUDA-MEMCHECK results found

Searching with the term “segmentation fault /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so” led me to this NVIDIA forums comment, but unfortunately it didn’t help.

Hmm, maybe you can try the up command in gdb to see if you can locate the part inside TVM.

Running up, I hit bedrock at /usr/local/cuda-10.2/lib64/libcudart.so.10.2.

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000007fa71d2b7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
(gdb) up
#1  0x0000007fa74d16dc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
(gdb) up
#2  0x0000007fa744f4dc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
(gdb) up
#3  0x0000007fa7350b0c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
(gdb) up
#4  0x0000007fa7350b7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
(gdb) up
#5  0x0000007fa74967ec in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
(gdb) up 20
#12 0x0000007fa81c2528 in ?? () from /usr/local/cuda-10.2/lib64/libcudart.so.10.2
(gdb) up 1000
#12 0x0000007fa81c2528 in ?? () from /usr/local/cuda-10.2/lib64/libcudart.so.10.2
...

Running backtrace I get this information:

(gdb) backtrace
#0  0x0000007fa71d2b7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
#1  0x0000007fa74d16dc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#2  0x0000007fa744f4dc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#3  0x0000007fa7350b0c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#4  0x0000007fa7350b7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#5  0x0000007fa74967ec in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#6  0x0000007fa7496cb4 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7  0x0000007fa7353ae0 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8  0x0000007fa7397814 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#9  0x0000007fa7399b8c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#10 0x0000007fa732a994 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#11 0x0000007fa7425804 in cuDevicePrimaryCtxRetain () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#12 0x0000007fa81c2528 in ?? () from /usr/local/cuda-10.2/lib64/libcudart.so.10.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

If I saw any calls into libtvm, I would try compiling with debug symbols, but this all seems to be within the CUDA libraries.

I haven’t experienced any issues with other CUDA-based deep learning software, though I could be hitting some unexpected issue. I flashed this Xavier a couple of months ago with JetPack 4.4.1.

Hmm, the backtrace didn’t really provide any meaningful info, and I don’t have a clue in this case. BTW, would you mind using the RPC-based runner instead? I am using it personally and everything seems smooth :slight_smile:

Thanks for the advice, but is there any update on this issue? I’m suffering from the same segfault error that @Wheest mentioned when using an NVIDIA Jetson Xavier AGX board locally.

A weird thing is that, as already mentioned by @junrushao1994, compiling on the host machine (x86_64-linux-gnu with the same CUDA version in my case) and running on the AGX board with the TVM RPC server works as expected.

In addition, the Xavier NX board has no issue when running locally in the same way. Could the sm_72 architecture of the AGX be the cause of the issue? Or do I have to tune the model for the AGX?

Thanks in advance :slight_smile: