[TensorRT integration error] tvm._ffi.base.TVMError: TVMError: Driver error:

Hello,

I am trying to use the Relay TensorRT integration to accelerate TensorFlow inference, following the Relay TensorRT integration tutorials. When I run the program, the following error occurs:

Traceback (most recent call last):
  File "test_tvm.py", line 220, in <module>
    test(arg.checkpoint_dir, arg.style_name, arg.test_dir, arg.if_adjust_brightness)
  File "test_tvm.py", line 127, in test
    gen_module.run(data=tvm.nd.array(x.astype(dtype)))
  File "/home/lulin/work/tvm/python/tvm/contrib/graph_runtime.py", line 206, in run
    self._run()
  File "/home/lulin/work/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: TVMError: Driver error:

The failing program is as follows:

    x = np.asarray(load_test_data(test_img, img_size))

    shape_dict = {"generator_input": x.shape}
    mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)

    with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': config}):
        lib = relay.build(mod, target=target, params=params)
    lib.export_library('compiled.so')

    dtype = "float32"
    ctx = tvm.gpu(0)
    loaded_lib = tvm.runtime.load_module('compiled.so')
    gen_module = tvm.contrib.graph_runtime.GraphModule(loaded_lib['default'](ctx))

    # gen_module.set_input("generator_input", tvm.nd.array(x.astype(dtype)))

    gen_module.run(data=tvm.nd.array(x.astype(dtype)))
    tvm_output = gen_module.get_output(0, tvm.nd.empty(x.shape, "float32"))

I also tried compiling the TensorFlow model without the TensorRT integration, following the "compile TensorFlow model" tutorial, and it works fine. Here is the program without the TensorRT integration:

    x = np.asarray(load_test_data(test_img, img_size))

    shape_dict = {"generator_input": x.shape}
    mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, target_host=target_host, params=params)

    dtype = "float32"
    m = graph_runtime.GraphModule(lib["default"](ctx))

    # set inputs
    m.set_input("generator_input", tvm.nd.array(x.astype(dtype)))

    m.run()

    # get outputs
    tvm_output = m.get_output(0, tvm.nd.empty(x.shape, "float32"))

    predictions = tvm_output.asnumpy()

My environment:

  1. Ubuntu 16.04
  2. Python 3.7.10
  3. TensorFlow-gpu 1.15
  4. TensorRT-7.2.3.4
  5. CUDA 11.0 with cudnn 8.1.0

I followed the official NVIDIA documentation to install TensorRT from the .tar file, and added its path to ~/.bashrc.

I built TVM with TensorRT as described in the official documentation, by modifying the config.cmake file and then building:

    set(USE_TENSORRT_CODEGEN ON)
    set(USE_TENSORRT_RUNTIME /home/XXX/TensorRT-7.2.3.4)
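One way to sanity-check the environment described above is to confirm that the TensorRT shared library is actually visible through LD_LIBRARY_PATH in the shell that runs the program. The following is only a minimal sketch: the helper name `find_tensorrt_libs` is made up for illustration, and it assumes the TensorRT core library follows the usual `libnvinfer.so*` naming.

```python
import glob
import os


def find_tensorrt_libs(search_dirs):
    """Return paths of libnvinfer shared objects found in the given directories.

    Hypothetical helper for illustration; not part of TVM or TensorRT.
    """
    found = []
    for directory in search_dirs:
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        found.extend(sorted(glob.glob(os.path.join(directory, "libnvinfer.so*"))))
    return found


if __name__ == "__main__":
    # LD_LIBRARY_PATH is a colon-separated list of directories on Linux.
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    libs = find_tensorrt_libs(dirs)
    if libs:
        for path in libs:
            print("found:", path)
    else:
        print("libnvinfer not found on LD_LIBRARY_PATH")
```

If nothing is found, the ~/.bashrc change may not have been picked up by the current shell (for example, it was not re-sourced after editing).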

Could this error be caused by a TensorRT version issue? Does anyone have ideas about this error?

Deepest thanks for your reply!

Looks like you didn’t partition the graph? Specifically, where did your config come from?

Thanks for your reply! I checked my program, and I did partition the graph for TensorRT, but that part was accidentally deleted when I posted this issue. Sorry for my mistake. The failing program looks like this:

    with tf_compat_v1.gfile.GFile(checkpoint_dir, "rb") as f:
        graph_def = tf_compat_v1.GraphDef()
        graph_def.ParseFromString(f.read(-1))
        graph = tf.import_graph_def(graph_def, name="")
        # Call the utility to import the graph definition into default graph.
        graph_def = tf_testing.ProcessGraphDefParam(graph_def)

    # pre-processing the test_img
    x = np.asarray(load_test_data(test_img, img_size))

    shape_dict = {"generator_input": x.shape}
    print("Tvm frontend processing ... ")
    mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)

    # < ====================== Begin TensorRT integration part ========================== >
    # All ops supported by the TensorRT integration are marked and offloaded to TensorRT. The rest of
    # the ops go through the regular TVM CUDA compilation and code generation.
    print("TensorRT partition processing ... ")
    mod, config = partition_for_tensorrt(mod, params)

    print("TensorRT building processing ... ")
    with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': config}):
        lib = relay.build(mod, target=target, params=params)
    # export & load: for local device & remote device mode.
    # lib.export_library('compiled.so')

    dtype = "float32"
    ctx = tvm.gpu(0)
    # loaded_lib = tvm.runtime.load_module('compiled.so')
    gen_module = tvm.contrib.graph_runtime.GraphModule(lib['default'](ctx))

    # gen_module.set_input("generator_input", tvm.nd.array(x.astype(dtype)))

    gen_module.run(data=tvm.nd.array(x.astype(dtype)))
    tvm_output = gen_module.get_output(0, tvm.nd.empty(x.shape, "float32"))
    # < ====================== End TensorRT integration part ========================== >

The config is the return value of partition_for_tensorrt(mod, params).

Looking forward to your reply! Thanks a lot!

I see. From the error I can think of two possibilities:

  1. TensorRT is incompatible with your CUDA runtime. You can check the CUDA version your driver supports by running nvidia-smi, but since you already posted the CUDA version you used, this might not be the cause.
  2. The TensorRT integration cannot handle your model. Since I don’t know which model you’re using, I don’t have more clues.
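For the first possibility, a rough way to compare versions is sketched below. It relies on two facts: the nvidia-smi banner prints a `CUDA Version: X.Y` field (the highest CUDA version the installed driver supports), and `nvcc --version` prints `release X.Y` for the toolkit on PATH. The function names are made up for illustration, and the exact output layout may differ between driver releases, so treat the parsing as an assumption.

```python
import re
import shutil
import subprocess


def driver_cuda_version(nvidia_smi_text):
    """Extract the 'CUDA Version: X.Y' field from the nvidia-smi banner."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version(nvcc_text):
    """Extract 'release X.Y' from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def run_tool(cmd):
    """Return a command's stdout, or '' if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    driver = driver_cuda_version(run_tool(["nvidia-smi"]))
    toolkit = toolkit_cuda_version(run_tool(["nvcc", "--version"]))
    print("driver supports CUDA up to:", driver)
    print("toolkit on PATH:", toolkit)
    # Version tuples compare element-wise, so (11, 0) > (10, 2).
    if driver and toolkit and toolkit > driver:
        print("toolkit is newer than what the driver supports")
```

Note that this only compares the driver and the toolkit; it does not tell you which CUDA version the TensorRT package itself was built for.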

cc @trevor-m

Thanks for your prompt reply!

I just remembered that the original project needs CUDA 10.0, but when I built TVM with TensorRT, I used the TensorRT build for CUDA 11.0. Maybe that is where the problem comes from?

I think I’ll try rebuilding TVM with the TensorRT build for CUDA 10.0.

Thanks again for your reply!
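Before rebuilding, the suspected mismatch could be checked by looking at which CUDA libraries the TensorRT shared object links against. The sketch below parses `ldd` output for a few common CUDA library names and reports their soname versions. The helper name and the list of library names are assumptions made for illustration, and any library that TensorRT links statically will simply not show up.

```python
import re
import shutil
import subprocess
import sys

# CUDA-related libraries to look for; this list is an assumption, extend as needed.
_CUDA_LIB = re.compile(r"(lib(?:cudart|cublas|cublasLt|cudnn|nvrtc))\.so\.([\d.]+)")


def cuda_lib_versions(ldd_text):
    """Map CUDA library names found in `ldd` output to their soname version."""
    versions = {}
    for line in ldd_text.splitlines():
        match = _CUDA_LIB.search(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


if __name__ == "__main__" and len(sys.argv) > 1 and shutil.which("ldd"):
    # Usage: python check_trt_links.py /path/to/TensorRT/lib/libnvinfer.so
    out = subprocess.run(["ldd", sys.argv[1]], capture_output=True, text=True).stdout
    for name, version in sorted(cuda_lib_versions(out).items()):
        print(name, version)
```

If the reported versions point at CUDA 11 libraries while the rest of the project runs against CUDA 10.0, that would be consistent with the mismatch described above.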