I am trying to run a model consisting of a single conv2d layer on a Jetson TX2. The layer is as follows:
input data shape = (1, 512, 16, 20)
num output channels = 256
kernel size = 5
stride = 1
padding = 2
bias = False
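Here is a quick sanity check of the layer's tensor shapes. It assumes NCHW input layout, OIHW weight layout and dilation 1, none of which the description above states. With a 5x5 kernel, padding 2 and stride 1, the spatial size is preserved.

```python
def conv2d_output_hw(h, w, kernel, stride, padding):
    """Output height/width of a 2-D convolution (dilation assumed to be 1)."""
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    return out_h, out_w

# Layer from the question: input (1, 512, 16, 20), 256 output channels, 5x5 kernel.
batch, in_ch, h, w = 1, 512, 16, 20
out_ch, kernel, stride, padding = 256, 5, 1, 2

out_h, out_w = conv2d_output_hw(h, w, kernel, stride, padding)
output_shape = (batch, out_ch, out_h, out_w)    # NCHW layout assumed
weight_shape = (out_ch, in_ch, kernel, kernel)  # OIHW layout assumed
print(output_shape)  # (1, 256, 16, 20)
print(weight_shape)  # (256, 512, 5, 5)
```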
I am cross-compiling with target="cuda" and target_host="llvm -target=aarch64-linux-gnu". Both the host machine on which I'm compiling and the TX2 are running CUDA 8.0 and LLVM 4.0.
Running the compiled layer on the TX2 over RPC fails with the following error:
tvm._ffi.base.TVMError: Except caught from RPC call: [20:21:10] /home/nvidia/tvm/src/runtime/module_util.cc:52: Check failed: ret == 0 (-1 vs. 0) [20:21:10] /home/nvidia/tvm/src/runtime/cuda/cuda_module.cc:91: CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX
If I reduce the number of input channels to 256 and the number of output channels to 128, the error does not appear and the layer runs successfully on the TX2.
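To put the failing and working configurations side by side, the sketch below computes the weight element count and multiply-accumulate (MAC) count of each bias-free conv2d. The float32 byte size is an assumption, since the post does not state the dtype. This arithmetic does not by itself explain CUDA_ERROR_INVALID_PTX. It only quantifies how much larger the failing layer is.

```python
def conv2d_workload(in_ch, out_ch, h_out, w_out, kernel):
    """Weight element count and multiply-accumulate count of a bias-free conv2d."""
    weights = out_ch * in_ch * kernel * kernel
    macs = weights * h_out * w_out  # one MAC per weight per output pixel
    return weights, macs

# Output spatial size is 16x20 in both cases (5x5 kernel, padding 2, stride 1).
failing = conv2d_workload(512, 256, 16, 20, 5)
passing = conv2d_workload(256, 128, 16, 20, 5)

for name, (weights, macs) in (("512 -> 256 (fails)", failing),
                              ("256 -> 128 (works)", passing)):
    # float32 (4 bytes per weight) is an assumption, not stated in the question
    print(f"{name}: {weights:,} weights "
          f"({weights * 4 / 2**20:.1f} MiB as float32), {macs:,} MACs")

print(failing[0] // passing[0])  # 4
```

Halving both channel counts shrinks the weights and the MAC count by a factor of 4.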
I reproduced the same error with CUDA on my host machine, so I do not think this is a TX2-specific problem. Does anyone have suggestions on how I could debug this further?