I’m looking at using RPC to cross compile and run on a Jetson TX2.
I first looked at the RPC tutorial, and I can cross compile for llvm and run remotely on the CPU.
But when I follow the GPU tutorial, switching the remote connection to my device and the target to cuda, I keep getting the following error, and it’s not clear what it means:
Check failed: f != nullptr Cannot find function fuse_pad_kernel0 in the imported modules or global registry
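For reference, here is a minimal sketch of the flow I am trying to follow, adapted from the tutorial. The board address, port, kernel, and the target_host string are placeholders from my setup, and the exact module/flag names (tvm.contrib.util vs. utils, -target= vs. -mtriple=) may differ depending on the TVM version:

import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import util

# Simple elementwise kernel scheduled for the GPU, as in the tutorial.
n = tvm.var("n")
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, tvm.thread_axis("blockIdx.x"))
s[B].bind(tx, tvm.thread_axis("threadIdx.x"))

# Cross compile: device code for CUDA, host code for the TX2's aarch64 CPU.
func = tvm.build(s, [A, B], target="cuda",
                 target_host="llvm -target=aarch64-linux-gnu",
                 name="add_one")

# Pack the module, upload it over RPC, and let the remote server link and load it.
temp = util.tempdir()
path = temp.relpath("add_one.tar")
func.export_library(path)

remote = rpc.connect("192.168.1.100", 9090)  # TX2 address/port (placeholders)
remote.upload(path)
rfunc = remote.load_module("add_one.tar")

# Run on the board's GPU and check the result.
ctx = remote.gpu(0)
a = tvm.nd.array(np.random.uniform(size=1024).astype("float32"), ctx)
b = tvm.nd.array(np.zeros(1024, dtype="float32"), ctx)
rfunc(a, b)
np.testing.assert_allclose(b.asnumpy(), a.asnumpy() + 1.0)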
I encountered problems when using RPC on my TX2 board, too. It works fine for the llvm target but fails for cuda. Here are the errors:
File "tvm/_ffi/_cython/./function.pxi", line 267, in tvm._ffi._cy3.core.FunctionBase.__call__
File "tvm/_ffi/_cython/./function.pxi", line 216, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./function.pxi", line 208, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 132, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: Except caught from RPC call: TVMCall CFunc Error:
Traceback (most recent call last):
File "/home/nvidia/tvm/python/tvm/_ffi/_ctypes/function.py", line 54, in cfun
rv = local_pyfunc(*pyargs)
File "/home/nvidia/tvm/python/tvm/rpc/server.py", line 47, in load_module
m = _load_module(path)
File "/home/nvidia/tvm/python/tvm/module.py", line 219, in load
_cc.create_shared(path + ".so", files)
File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 33, in create_shared
_linux_shared(output, objects, options, cc)
File "/home/nvidia/tvm/python/tvm/contrib/cc.py", line 58, in _linux_shared
raise RuntimeError(msg)
RuntimeError: Compilation error:
/usr/bin/ld: /tmp/tmp9sycy8oy/lib.o: Relocations in generic ELF (EM: 62)
/usr/bin/ld: /tmp/tmp9sycy8oy/lib.o: Relocations in generic ELF (EM: 62)
/tmp/tmp9sycy8oy/lib.o: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
It is good that we have a generic solution for that. Maybe it makes sense to create a complete RPC deployment guide on how to configure common boards and targets, so that the next time we get such a question we can directly give links to the doc.
To get better performance, I think you should set the cross-compilation target correctly.
By default, TVM will use this function to compile the CUDA code to PTX.
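As a sketch of what I mean, you can override that hook so the kernels are compiled for the TX2's GPU (compute capability 6.2). The callback name tvm_callback_cuda_compile and the nvcc.compile_cuda signature are from the TVM version I have, so they may differ slightly in other builds:

import tvm
from tvm.contrib import nvcc

# Override the hook TVM consults when turning generated CUDA source into PTX,
# pinning the architecture to the TX2's compute capability (sm_62). Otherwise
# the local nvcc's default arch is used, which can cost performance on the board.
@tvm.register_func("tvm_callback_cuda_compile", override=True)
def tvm_callback_cuda_compile(code):
    return nvcc.compile_cuda(code, target="ptx", arch="sm_62")

Register this before calling tvm.build (or nnvm.compiler.build) so the CUDA kernels in the exported module already target the board.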