Hi, I just found a very strange bug when loading a compiled TVM library file. Here’s the demo code:
import sys
import numpy as np
import onnx
import tvm
import tvm.relay as relay
from tvm.contrib.download import download_testdata
from tvm.contrib.graph_runtime import GraphModule
def save_and_load(lib, device):
    lib.export_library('./tmp.so')
    lib = tvm.runtime.load_module('./tmp.so')
    ctx = tvm.gpu() if device == 'gpu' else tvm.cpu()
    gmod = GraphModule(lib['default'](ctx))
    # return lib, ctx, gmod # explicitly returning the lib module could work
    return None, ctx, gmod # not working on cuda
if __name__ == '__main__':
    device = sys.argv[1]
    assert device in ['gpu', 'cpu']
    model_url = "".join(
        [
            "https://github.com/onnx/models/raw/",
            "master/vision/classification/mobilenet/model/",
            "mobilenetv2-7.onnx",
        ]
    )
    # https://github.com/onnx/models/blob/master/vision/classification/mobilenet/model/mobilenetv2-7.onnx
    model_path = download_testdata(model_url, "mobilenetv2-7.onnx", module="onnx")
    onnx_model = onnx.load(model_path)
    if device == 'gpu':
        target = 'cuda'
    else:
        target = 'llvm'
    input_name = "input"
    shape_dict = {input_name: [1, 3, 224, 224]}
    relay_module, params = tvm.relay.frontend.from_onnx(onnx_model, shape=shape_dict)
    with tvm.transform.PassContext(opt_level=4):
        lib = tvm.relay.build(relay_module, target, params=params)
    lib, ctx, gmod = save_and_load(lib, device)
    x = tvm.nd.array(np.ones([1, 3, 224, 224], dtype=np.float32), ctx=ctx)
    gmod.set_input(input_name, x)
    gmod.run()
When save_and_load returns lib, ctx, gmod, the script runs correctly on both GPU and CPU. However, when it returns None, ctx, gmod, that is, when I do not return the lib module (which the rest of my script never uses), running on GPU fails with a segmentation fault:
[1] 10796 segmentation fault (core dumped) python3 reproduce_tvm.py gpu
So it seems the lib module must be kept alive by the caller when the target is cuda. When the target is llvm, the script works either way. Is this a bug in TVM?
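
To illustrate what I think is happening on the Python side, here is a small sketch that does not use TVM at all. FakeLib and FakeGraphModule are hypothetical stand-ins for the objects returned by tvm.runtime.load_module and GraphModule, and LoadedModel is a hypothetical holder class, not a TVM API. It only shows that a library object created inside a function is released as soon as the function returns unless something the caller keeps still references it. It does not reproduce the segmentation fault, and whether this is really the cause of the CUDA crash is my assumption.

```python
import gc
import weakref


class FakeLib:
    """Hypothetical stand-in for the module returned by tvm.runtime.load_module."""


class FakeGraphModule:
    """Hypothetical stand-in for GraphModule; it holds no reference to the lib."""


class LoadedModel:
    """Hypothetical holder that bundles a graph module with its library."""

    def __init__(self, lib, gmod):
        self.lib = lib    # strong reference keeps the library object alive
        self.gmod = gmod


def load_without_lib():
    # Mirrors `return None, ctx, gmod`: the lib is not handed back to the caller.
    lib = FakeLib()
    probe = weakref.ref(lib)  # lets us observe whether lib is still alive
    return probe, FakeGraphModule()


def load_with_holder():
    # Mirrors `return lib, ctx, gmod`: the caller keeps a reference to the lib.
    lib = FakeLib()
    probe = weakref.ref(lib)
    return probe, LoadedModel(lib, FakeGraphModule())


probe_a, gmod = load_without_lib()
gc.collect()
lib_dropped = probe_a() is None      # the lib was released after the function returned

probe_b, model = load_with_holder()
gc.collect()
lib_kept = probe_b() is not None     # the holder keeps the lib alive

print("lib released without holder:", lib_dropped)
print("lib alive with holder:", lib_kept)
```

If that assumption holds, a workaround in the real script would be to return something like LoadedModel(lib, gmod) from save_and_load, so that the lib lives at least as long as gmod.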