How can I use a CPU machine (host machine) and multiple GPU devices for auto-tuning via the RPC Tracker?

How can I use a CPU machine (host machine) and multiple GPU devices for auto-tuning via the RPC Tracker? Is it necessary to compile TVM with the option USE_CUDA ON on the CPU machine?
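For context, the setup I am trying follows the usual AutoTVM RPC-tracker pattern (a sketch; the device key "1080ti", tracker address, and measurement counts are placeholders, not my exact values):

```python
# On the host (CPU) machine, start the tracker:
#   python -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190
# On each GPU machine, register an RPC server under a device key:
#   python -m tvm.exec.rpc_server --tracker=<host-ip>:9190 --key=1080ti

from tvm import autotvm

# Kernels are built locally on the host; measurements are dispatched to
# whichever GPU devices registered with the tracker under the given key.
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.RPCRunner(
        "1080ti",        # device key used by the remote RPC servers
        host="0.0.0.0",  # tracker address (placeholder)
        port=9190,
        number=20,
        repeat=3,
        timeout=4,
    ),
)
```

Since the builder runs on the host, this is where my question comes from: the host has no GPU, but it seems to need CUDA support to compile the kernels it sends out.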

If I compile TVM with the option USE_CUDA OFF and run the auto-tuning script, the error is:

/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/target/target.py:454: UserWarning: tvm.target.create() is being deprecated. Please use tvm.target.Target() instead
  warnings.warn("tvm.target.create() is being deprecated. Please use tvm.target.Target() instead")
Extract tasks...
Get errors with GraphRuntimeCodegen for task extraction. Fallback to VMCompiler. Error details:
Traceback (most recent call last):
  [bt] (8) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::backend::MemoizedExprTranslator<tvm::runtime::Array<tvm::te::Tensor, void> >::VisitExpr(tvm::RelayExpr const&)+0xa9) [0x7fecc74c5219]
  [bt] (7) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x82) [0x7fecc74c4fe2]
  [bt] (6) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>*)#6}::_FUN(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>*)+0x27) [0x7fecc74b80f7]
  [bt] (5) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*)+0x14f) [0x7fecc74bd84f]
  [bt] (4) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::backend::MemoizedExprTranslator<tvm::runtime::Array<tvm::te::Tensor, void> >::VisitExpr(tvm::RelayExpr const&)+0xa9) [0x7fecc74c5219]
  [bt] (3) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x82) [0x7fecc74c4fe2]
  [bt] (2) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>*)#6}::_FUN(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>*)+0x27) [0x7fecc74b80f7]
  [bt] (1) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*)+0x68c) [0x7fecc74bdd8c]
  [bt] (0) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x1896c6b) [0x7fecc7657c6b]
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/relay/backend/compile_engine.py", line 284, in lower_call
    best_impl, outputs = select_implementation(op, call.attrs, inputs, ret_type, target)
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/relay/backend/compile_engine.py", line 186, in select_implementation
    all_impls = get_valid_implementations(op, attrs, inputs, out_type, target)
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/relay/backend/compile_engine.py", line 127, in get_valid_implementations
    strategy = fstrategy(attrs, inputs, out_type, target)
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/target/generic_func.py", line 46, in __call__
    return _ffi_api.GenericFuncCallFunc(self, *args)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 322, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 267, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 160, in tvm._ffi._cy3.core.CALL
  [bt] (3) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7fecc765afc5]
  [bt] (2) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x129cff7) [0x7fecc705dff7]
  [bt] (1) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::GenericFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x1ab) [0x7fecc705dd4b]
  [bt] (0) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x1896c6b) [0x7fecc7657c6b]
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/relay/op/strategy/cuda.py", line 581, in dense_strategy_cuda
    if nvcc.have_tensorcore(tvm.gpu(0).compute_version):
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/_ffi/runtime_ctypes.py", line 237, in compute_version
    return self._GetDeviceAttr(self.device_type, self.device_id, 4)
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/_ffi/runtime_ctypes.py", line 204, in _GetDeviceAttr
    return tvm.runtime._ffi_api.GetDeviceAttr(device_type, device_id, attr_id)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 322, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 257, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 246, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 160, in tvm._ffi._cy3.core.CALL
  [bt] (4) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7fecc765afc5]
  [bt] (3) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x1898615) [0x7fecc7659615]
  [bt] (2) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::DeviceAPIManager::GetAPI(int, bool)+0x144) [0x7fecc765d4e4]
  [bt] (1) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::DeviceAPIManager::GetAPI(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)+0x2ee) [0x7fecc765d2ee]
  [bt] (0) /usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/libtvm.so(+0x1896da2) [0x7fecc7657da2]
  File "/opt/tvm-v0.7.0-cdb00da/src/runtime/c_runtime_api.cc", line 131
TVMError: Check failed: allow_missing: Device API gpu is not enabled.

If I compile TVM with the option USE_CUDA ON and run the script python -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190, the error is:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/__init__.py", line 25, in <module>
    from ._ffi.base import TVMError, __version__
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/_ffi/base.py", line 65, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/usr/local/lib/python3.6/dist-packages/tvm-0.7.0-py3.6-linux-x86_64.egg/tvm/_ffi/base.py", line 52, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short

If I copy libcuda to the CPU machine (host machine), autotvm.task.extract_from_program() raises a CUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected error. However, the tuning that follows runs correctly, and model compilation afterwards raises the same no CUDA-capable device is detected error.
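One workaround I have seen suggested for hosts without a GPU is to pin the CUDA target architecture explicitly, so that code generation does not have to query a local device for its compute capability. I am not sure whether this also covers the tvm.gpu(0).compute_version query during task extraction, and "sm_70" below is just a placeholder for the remote GPU's actual compute capability:

```python
# Sketch of the workaround: tell AutoTVM which compute capability to
# compile for, so nvcc on the host does not probe a local GPU.
from tvm.autotvm.measure.measure_methods import set_cuda_target_arch

set_cuda_target_arch("sm_70")  # placeholder: match the remote GPU
```

Would this be the recommended way to run the host-side build, or is there another supported configuration for a CPU-only host?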