The lib module must be kept alive when loading a compiled TVM library on the Nano GPU

Hi, I just found a very strange bug when loading a compiled TVM library file. Here is the demo code:

import sys

import numpy as np
import onnx
import tvm
import tvm.relay as relay
from tvm.contrib.download import download_testdata
from tvm.contrib.graph_runtime import GraphModule


def save_and_load(lib, device):
    lib.export_library('./tmp.so')
    lib = tvm.runtime.load_module('./tmp.so')
    ctx = tvm.gpu() if device == 'gpu' else tvm.cpu()
    gmod = GraphModule(lib['default'](ctx))
    # return lib, ctx, gmod    # explicitly returning the lib module could work
    return None, ctx, gmod  # not working on cuda


if __name__ == '__main__':
    device = sys.argv[1]
    assert device in ['gpu', 'cpu']

    model_url = "".join(
        [
            "https://github.com/onnx/models/raw/",
            "master/vision/classification/mobilenet/model/",
            "mobilenetv2-7.onnx",
        ]
    )
    # https://github.com/onnx/models/blob/master/vision/classification/mobilenet/model/mobilenetv2-7.onnx
    model_path = download_testdata(model_url, "mobilenetv2-7.onnx", module="onnx")
    onnx_model = onnx.load(model_path)
    if device == 'gpu':
        target = 'cuda'
    else:
        target = 'llvm'
    input_name = "input"
    shape_dict = {input_name: [1, 3, 224, 224]}
    relay_module, params = tvm.relay.frontend.from_onnx(onnx_model, shape=shape_dict)
    with tvm.transform.PassContext(opt_level=4):
        lib = tvm.relay.build(relay_module, target, params=params)
    lib, ctx, gmod = save_and_load(lib, device)
    x = tvm.nd.array(np.ones([1, 3, 224, 224], dtype=np.float32), ctx=ctx)
    gmod.set_input(input_name, x)
    gmod.run()

I found that when save_and_load returns lib, ctx, gmod, this script works well on both GPU and CPU. However, when it returns None, ctx, gmod instead, i.e. I do not return the lib module (which is not used anywhere else in my script), the script crashes with a segmentation fault when running on GPU:

[1]    10796 segmentation fault (core dumped)  python3 reproduce_tvm.py gpu

This means the lib module must be kept alive when the target device is cuda. When the target is llvm, it still works well. I wonder if this is a bug in TVM?

I think gmod uses the resources of lib. So if you do not explicitly return it, the resources of lib will be released. Maybe we need to add a reference to lib in gmod so that lib is not released prematurely.
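To illustrate the suspected lifetime issue, here is a hypothetical pure-Python analogy (not TVM code): a weak reference stands in for gmod's implicit dependency on lib, showing how the underlying object can be freed once the last strong reference leaves scope.

```python
import weakref


class Lib:
    """Stand-in for the loaded TVM module (hypothetical analogy, not TVM code)."""
    pass


def make_runner():
    lib = Lib()
    ref = weakref.ref(lib)  # weak reference: does not keep lib alive
    # Returning only `ref` is like returning gmod without a strong
    # reference back to lib: nothing keeps the object alive afterwards.
    return ref


ref = make_runner()
print(ref() is None)  # True: the Lib instance was released when make_runner returned
```

If gmod only holds such an implicit (non-owning) link to lib's resources, then dropping the lib return value would explain the crash on cuda; making gmod hold a strong reference to lib would fix it.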

This should have been fixed in the latest main.

OK, thank you very much!