CUDA inference simple model segmentation fault

I am running CNN models on CUDA (without cuDNN) on an AGX Xavier, and I am getting a strange error that occurs with dense layers.

I am unable to run ResNet18-CIFAR10 without getting a segmentation fault.

I’ve managed to identify that the issue appears to be related to how the dense layer interacts with some of the convolutional layers.

If we look at the PyTorch definition of the model, after removing most of the layers, this is the minimal network that still crashes:

def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))

    out = F.relu(self.layer1(out))
    
    out = out.view(out.size(0), -1)
    out = self.linear(out)
    return out

If I remove `out = self.layer1(out)`, it works. If I remove `out = self.linear(out)`, it works.

I’ve got both the model and the TVM inference in this single script.

I am using CUDA without cuDNN. It does not seem to be related to things like bias or kernel size or padding.

The two conv2d layers look pretty standard:

self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.layer1 = nn.Conv2d(
    self.in_planes, 64, kernel_size=3, stride=1, padding=1, bias=False
)
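Since both conv layers use kernel_size=3, stride=1, padding=1, the spatial dimensions are preserved, so for 32×32 CIFAR-10 inputs the flattened tensor fed into the dense layer has 64 × 32 × 32 = 65536 features. A quick sanity check of the shapes, in plain Python (no framework; this assumes self.in_planes is 64 and a CIFAR-10 input size, which matches the model above but isn’t stated explicitly in my snippet):

```python
# Conv2d output size: floor((n + 2*pad - k) / stride) + 1
def conv2d_out(n, k=3, stride=1, pad=1):
    return (n + 2 * pad - k) // stride + 1

h = w = 32          # CIFAR-10 spatial dimensions
h = conv2d_out(h)   # conv1: 3 -> 64 channels, spatial size unchanged
w = conv2d_out(w)
h = conv2d_out(h)   # layer1: 64 -> 64 channels, spatial size unchanged
w = conv2d_out(w)

flat = 64 * h * w   # input features of the dense layer
print(h, w, flat)   # 32 32 65536
```

So the linear layer itself is operating on an ordinary, correctly sized tensor; nothing about the shapes looks suspicious.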

Does anyone have any suggestions as to how to identify the root cause here? Again, I have the code here.

This is the case on several versions of TVM, including the latest one (4087e72b657eae484bb647cbd8ef86b9acf11748).

I tried your script on an RTX 3070 but couldn’t reproduce the segfault.

Hi there, many thanks.

To clarify, I do not get the segfault when running on desktop/HPC-class NVIDIA GPUs, such as the TITAN V.

The problem seems to be confined to NVIDIA’s Jetson class of devices: the AGX Xavier and the Jetson Nano.

I guess it could be an error in how some aspect of TVM interacts with the Jetson versions of the NVIDIA libraries.

Some more debugging info:

When I run the basic_model.py script with gdb, it traces the segmentation fault to:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000007f5b9fdb44 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so

Running gdb backtrace doesn’t give much more information, as I don’t have debug symbols for the NVidia Jetson libraries.

(gdb) backtrace
#0  0x0000007f5b9fdb44 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
#1  0x0000007f5bce655c in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#2  0x0000007f5bc64054 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#3  0x0000007f5bb655b4 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#4  0x0000007f5bb65624 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#5  0x0000007f5bcab30c in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#6  0x0000007f5bcab7d4 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#7  0x0000007f5bb68588 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#8  0x0000007f5bbac254 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#9  0x0000007f5bbae5cc in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#10 0x0000007f5bb3fb34 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#11 0x0000007f5bc3a37c in cuDevicePrimaryCtxRetain () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#12 0x0000007f9d6b3528 in ?? () from /usr/local/cuda/lib64/libcudart.so.10.2

Running the script with the Python faulthandler tool (i.e. adding the lines import faulthandler; faulthandler.enable()), I can see that the model compiles with relay.build, but the segmentation fault occurs during m = graph_executor.GraphModule(lib["default"](dev)).

The actual line in Python that crashes is tvm/python/tvm/_ffi/_ctypes/packed_func.py, line 233, in __call__.
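For anyone hitting a similar native-code crash, the faulthandler setup mentioned above is just two lines at the top of the script, before any TVM or CUDA work happens (a minimal sketch; the trailing comment stands in for the original script’s TVM calls):

```python
import faulthandler

# Enable the fault handler as early as possible, before importing tvm
# or touching CUDA, so that a crash in native code (SIGSEGV) still
# dumps the Python-level traceback to stderr.
faulthandler.enable()

# ... rest of the script (relay.build, graph_executor.GraphModule, ...)
print(faulthandler.is_enabled())  # True once enabled
```

This is how I narrowed the crash down from "somewhere in libcuda" to the specific Python call that triggers it.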

An update: it appears it was a bug in one of the NVIDIA libraries for Jetson.

I reflashed my AGX Xavier from JetPack 4.4 to the latest version, JetPack 4.6, and my standalone example worked without a problem. Hooray.
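A quick way to confirm which L4T release a Jetson ends up on after reflashing is to read /etc/nv_tegra_release. A sketch of parsing it (the sample line and regex here are assumptions based on the usual format of that file, not taken from my script):

```python
import re

# Sample first line of /etc/nv_tegra_release; on a real device, read
# the file instead. Format assumed from a typical L4T 32.6.x install.
sample = "# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t186ref"

m = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", sample)
l4t_version = f"{m.group(1)}.{m.group(2)}"  # e.g. "32.6.1"
print(l4t_version)
```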

I still need to test it on an updated Jetson Nano; if it still doesn’t work there, I can raise the issue with the NVIDIA developers.

It looks like the Jetson devices are not in the current CI/CD pipeline. CUDA is already covered, so I guess the added value would be minimal: just catching bugs in the Jetson CUDA libraries.