How to use Nsight Compute CLI to analyze a TVM program

Hi:

I am trying to use Nsight Compute to analyze a matmul schedule built for the cuda target.

When I run a command like this to create the profiling file:

/usr/local/NVIDIA-Nsight-Compute/ncu -o profile python matmul.py

it seems that TVM cannot find the GPU while running under ncu.

The error logs:

Traceback (most recent call last):
  File "matmul.py", line 117, in <module>
    lab1()
  File "matmul.py", line 102, in lab1
    lab.run_perf_analysis(sch, args)
  File "matmul.py", line 48, in run_perf_analysis
    matmul_func = tvm.build(sch, args, target)
  File "/root/Codes/lsy_tvm/python/tvm/driver/build_module.py", line 422, in build
    mod_host, mdev = _build_for_device(input_mod, tar, target_host)
  File "/root/Codes/lsy_tvm/python/tvm/driver/build_module.py", line 296, in _build_for_device
    mod_dev.functions) != 0 else None
  File "/root/Codes/lsy_tvm/python/tvm/target/codegen.py", line 40, in build_module
    return _ffi_api.Build(mod, target)
  File "/root/Codes/lsy_tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 225, in __call__
    raise get_last_ffi_error()
ValueError: Traceback (most recent call last):
  [bt] (5) /root/Codes/lsy_tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7f0ec1d791a1]
  [bt] (4) /root/Codes/lsy_tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target const&)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::IRModule, tvm::Target const&)>(tvm::runtime::Module (*)(tvm::IRModule, tvm::Target const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x98b) [0x7f0ec178947b]
  [bt] (3) /root/Codes/lsy_tvm/build/libtvm.so(tvm::codegen::Build(tvm::IRModule, tvm::Target const&)+0x661) [0x7f0ec1784fd1]
  [bt] (2) /root/Codes/lsy_tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::IRModule, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(tvm::runtime::Module (*)(tvm::IRModule, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x2e0) [0x7f0ec17d1440]
  [bt] (1) /root/Codes/lsy_tvm/build/libtvm.so(tvm::codegen::BuildCUDA(tvm::IRModule, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x9b4) [0x7f0ec1cf5d84]
  [bt] (0) /root/Codes/lsy_tvm/build/libtvm.so(+0x18d8b9b) [0x7f0ec1d75b9b]
  File "/root/Codes/lsy_tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
    rv = local_pyfunc(*pyargs)
  File "/root/Codes/lsy_tvm/python/tvm/autotvm/measure/measure_methods.py", line 629, in tvm_callback_cuda_compile
    ptx = nvcc.compile_cuda(code, target=target, arch=AutotvmGlobalScope.current.cuda_target_arch)
  File "/root/Codes/lsy_tvm/python/tvm/contrib/nvcc.py", line 74, in compile_cuda
    raise ValueError("arch(sm_xy) is not passed, and we cannot detect it from env")
ValueError: arch(sm_xy) is not passed, and we cannot detect it from env
==PROF== Disconnected from process 19204
==ERROR== The application returned an error code (1).

Any suggestions? Thanks a lot.

Typically, this error means the GPU is not detected. You may try "nvidia-smi" from the shell to make sure the GPU device is up.
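
It can also help to check whether TVM itself sees the device from inside the profiled process. A minimal check, assuming the pre-0.8 tvm.gpu API that matches the paths in your traceback:

import tvm

# True if TVM's CUDA runtime can see device 0
print(tvm.gpu(0).exist)

Running that small script (saved as, say, check_gpu.py) both directly and under ncu would show whether the profiler is hiding the device from TVM.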

I have tried that, and the command nvidia-smi works fine.

If I just run python matmul.py, it works fine. The error only happens when I use the Nsight Compute CLI to analyze the program, so I assume that the Nsight Compute CLI somehow prevents TVM from detecting the GPU.
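
Since the error text itself says the arch was not passed, one workaround I plan to try is pinning the CUDA arch explicitly, so that nvcc.compile_cuda never has to query the device. A minimal sketch, assuming the set_cuda_target_arch helper from the autotvm measure_methods module that appears in the traceback, with sm_70 as a placeholder for the actual compute capability:

import tvm
from tvm.autotvm.measure.measure_methods import set_cuda_target_arch

# Placeholder: replace sm_70 with your GPU's compute capability
# (e.g. sm_75 for a Turing card), so compile_cuda gets it directly
# instead of detecting it from the environment.
set_cuda_target_arch("sm_70")

# ... then build as before:
# matmul_func = tvm.build(sch, args, target)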