Segmentation fault (core dumped) when compiling an ONNX model on GPU with the latest version of TVM

Evans · May 16, 2022, 6:36am

TVM crashes with Segmentation fault (core dumped) at relay.build() when compiling an ONNX model for the CUDA target.

Please note that:

the script runs fine when ‘cuda’ is replaced with ‘llvm’
the script runs fine with TVM 0.8 (commit-id: ef6e52f191888ee2a5f2221bde3b69391766903f)
the script runs fine when using relay.create_executor()

I am curious about this crash and would welcome your comments. Thanks!

The script to reproduce the crash:

import tvm
import tvm.relay as relay
import onnx


model_path = 'lenet5-fashion-mnist_origin.onnx'
batch_size = 1
target = 'cuda'
predict_model = onnx.load(model_path)

input_shape = (batch_size, 28, 28, 1)

shape_dict = {'conv2d_1_input': input_shape}
print("shape_dict", shape_dict)

irmod, params = relay.frontend.from_onnx(predict_model, shape_dict, freeze_params=True)
# irmod = relay.transform.DynamicToStatic()(irmod)
print('TVM/Relay import model successfully!')

# -----------------------Compile the RelayIR--------------------------
graph, lib, params = relay.build_module.build(irmod, target=target, params=params)

The model is available at this link.
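
Because a segmentation fault kills the whole Python process, it can help to run each configuration in its own child process when comparing targets such as 'llvm' and 'cuda'. The following is a minimal sketch using only the Python standard library, not anything provided by TVM. It assumes a POSIX system, where a negative return code means the child was killed by a signal. The file name repro.py and its command-line target argument are hypothetical, since the script above hardcodes the target.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


def run_isolated(argv):
    """Run a command in a child process so a crash cannot take down the caller."""
    completed = subprocess.run(argv, check=False)
    return describe_exit(completed.returncode)


def compare_targets(script, targets=("llvm", "cuda")):
    """Run a (hypothetical) repro script once per target and collect the outcomes.

    Assumes the script reads the target name from its first command-line argument.
    """
    return {t: run_isolated([sys.executable, script, t]) for t in targets}
```

Calling compare_targets("repro.py") would return a dictionary mapping each target to its outcome, with "killed by SIGSEGV" marking a run that hit the segmentation fault.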

Evans · May 20, 2022, 2:07pm

I updated TVM to the newest version (2022-05-20). The script now also crashes when using relay.create_executor().

@FrozenGene @comaniac, could you help me look into this crash? Thanks!

masahi · May 20, 2022, 7:49pm

It doesn’t reproduce in my environment.

Can you try gdb --args python your_script.py and show the backtrace?
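
For a non-interactive run, the same gdb invocation can be scripted. The sketch below is a convenience wrapper and not part of TVM; the function names are made up for illustration. It relies on gdb's -batch, -ex, and --args options and assumes gdb is installed and on PATH.

```python
import subprocess
import sys


def gdb_backtrace_command(script, python=sys.executable):
    """Build a gdb command line that runs `script` and prints a backtrace when it stops."""
    return [
        "gdb",
        "-batch",      # exit gdb once the commands below have run
        "-ex", "run",  # start the program
        "-ex", "bt",   # print the backtrace of the thread that stopped
        "--args", python, script,
    ]


def capture_backtrace(script):
    """Run the script under gdb and return everything gdb printed (needs gdb on PATH)."""
    result = subprocess.run(
        gdb_backtrace_command(script),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

For example, print(capture_backtrace("your_script.py")) would show the SIGSEGV report followed by the stack frames, which can then be pasted into the thread.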

Evans · May 24, 2022, 6:52am

@masahi Thanks for your reply!

The backtrace is as follows:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
malloc_consolidate (av=av@entry=0x7ffff7409c40 <main_arena>) at malloc.c:4469
4469	malloc.c: No such file or directory