TVM generates wrong input type when using the TensorRT runtime

wesleyhuang · October 9, 2021, 10:51am

I use TVM to import a TensorFlow frozen model, then use the TVM TensorRT runtime to generate a TRT engine, with TVM_TENSORRT_CACHE_DIR pointing to the engine cache directory. The engine is generated, but its input/output tensor names and types differ from those of the source model. Details are below. Any suggestions or hints are warmly welcome. Thanks.

################################################

TVM version: built in the ci_gpu Docker image, with latest/other commits
TensorRT version: 8.0.1.6

Model inputs:

shape_dict = {
     "query_input_ids": [100, 24],
     "doc_input_ids": [100, 128]
}

Brief procedure:

Import the TF frozen model

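import tensorflow as tf
import tvm.relay.testing.tf as tf_testing

tf_compat_v1 = tf.compat.v1

# Read the frozen GraphDef from disk and prepare it for the Relay importer
with tf_compat_v1.gfile.GFile(model_path, "rb") as f:
    graph_def = tf_compat_v1.GraphDef()
    graph_def.ParseFromString(f.read())
    graph_def = tf_testing.ProcessGraphDefParam(graph_def)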

Import the graph to Relay

from tvm import relay
mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)

Build TensorRT Target

import tvm
from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt
gpu_mod, gpu_config = partition_for_tensorrt(mod, params, use_implicit_batch=False)

gpu_target = "cuda"
with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': gpu_config}):
    gpu_lib = relay.build(gpu_mod, target=gpu_target, params=params)

gpu_lib.export_library(gpu_lib_path)

Prepare Data

import numpy as np

batch_size = 100  # assumed to match the first dimension in shape_dict
rng = np.random.default_rng()
q_ids = rng.integers(0, 21127, (batch_size, 24), dtype=np.int32)
d_ids = rng.integers(0, 21127, (batch_size, 128), dtype=np.int32)

Run with TensorRT to generate the TRT engine

from tvm.contrib import graph_executor

trt_dev = tvm.cuda(0)
trt_loaded_lib = gpu_lib
trt_module = graph_executor.GraphModule(trt_loaded_lib['default'](trt_dev))
trt_module.set_input("query_input_ids", q_ids)
trt_module.set_input("doc_input_ids", d_ids)
trt_module.run()
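
For reference, a minimal sketch of how the cache directory mentioned at the top can be wired up from Python. It assumes TVM_TENSORRT_CACHE_DIR is set in the process environment before the graph executor is created and run. The directory path and the helper name `list_cached_engines` are hypothetical; the `.plan` suffix is taken from the engine file name shown in the inspection output further down.

```python
import os
import tempfile
from pathlib import Path


def list_cached_engines(cache_dir):
    """Return the names of serialized engine files (*.plan) in cache_dir, sorted."""
    return sorted(p.name for p in Path(cache_dir).glob("*.plan"))


# Hypothetical cache location; substitute the real directory.
cache_dir = os.path.join(tempfile.gettempdir(), "trt_cache_example")
os.makedirs(cache_dir, exist_ok=True)

# Set the variable before the graph executor is created and run, so the
# TensorRT runtime can see it when it builds and caches engines.
os.environ["TVM_TENSORRT_CACHE_DIR"] = cache_dir

# ... create the GraphModule, set the inputs, and call run() here ...

# After the run, each cached engine should show up as a .plan file.
print(list_cached_engines(cache_dir))
```

Listing the directory after the run is a quick way to see how many engines were cached and what they are called before inspecting any one of them.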

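Here is the model inspection output for both. The mismatch is visible when comparing them: the source model takes int32 inputs of shape (None, 24) and (None, 128), while the generated engine takes float32 inputs of shape (100, 24, 60) and (100, 128, 60) under different names.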

Source TF frozen model info

$ polygraphy inspect  model /bert/search-team/bert/bert-frozed-int32.pb --model-type frozen
2021-10-09 10:41:44.992623: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[W] Package: 'tensorflow' version 2.4.2 is installed, but version <2.0 is recommended.
    Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
[I] Loading GraphDef from /bert/search-team/bert/bert-frozed-int32.pb
[I] ==== TensorFlow Graph ====
    ---- 2 Graph Inputs ----
    {query_input_ids:0 [dtype=int32, shape=(None, 24)],
     doc_input_ids:0 [dtype=int32, shape=(None, 128)]}

    ---- 1 Graph Outputs ----
    {Sigmoid:0 [dtype=float32, shape=(None, 1)]}

    ---- 149 Nodes ----

Generated TRT engine info

$ polygraphy inspect  model /bert/TRT-Cache/tvmgen_default_tensorrt_main_17_fp32.plan
[I] Loading bytes from /bert/TRT-Cache/tvmgen_default_tensorrt_main_17_fp32.plan
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.2.0
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine (1 layers)

    ---- 2 Engine Input(s) ----
    {tensorrt_17_i1_0 [dtype=float32, shape=(100, 24, 60)],
     tensorrt_17_i1_1 [dtype=float32, shape=(100, 128, 60)]}

    ---- 1 Engine Output(s) ----
    {tensorrt_output_0 [dtype=float32, shape=(100, 1)]}

    ---- Memory ----
    Device Memory: 251025920 bytes

    ---- 1 Profile(s) (3 Binding(s) Each) ----
    - Profile: 0
        Binding Index: 0 (Input)  [Name: tensorrt_17_i1_0]  | Shapes: min=(100, 24, 60), opt=(100, 24, 60), max=(100, 24, 60)
        Binding Index: 1 (Input)  [Name: tensorrt_17_i1_1]  | Shapes: min=(100, 128, 60), opt=(100, 128, 60), max=(100, 128, 60)
        Binding Index: 2 (Output) [Name: tensorrt_output_0] | Shape: (100, 1)
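
The mismatch between the two reports above can be made explicit with a small comparison. The sketch below transcribes the tensor names, dtypes, and shapes from the polygraphy output by hand, with `None` standing for the dynamic batch dimension. The lists and the helpers `shapes_compatible` and `compare_io` are illustrative only and are not part of TVM or polygraphy; pairing the tensors by position is also an assumption.

```python
# I/O signatures copied by hand from the two polygraphy reports above.
tf_inputs = [
    ("query_input_ids:0", "int32", (None, 24)),
    ("doc_input_ids:0", "int32", (None, 128)),
]
trt_inputs = [
    ("tensorrt_17_i1_0", "float32", (100, 24, 60)),
    ("tensorrt_17_i1_1", "float32", (100, 128, 60)),
]


def shapes_compatible(expected, actual):
    """True if ranks match and every non-None expected dim equals the actual dim."""
    return len(expected) == len(actual) and all(
        e is None or e == a for e, a in zip(expected, actual)
    )


def compare_io(expected, actual):
    """Pair tensors by position and list every dtype or shape difference."""
    problems = []
    for (e_name, e_dtype, e_shape), (a_name, a_dtype, a_shape) in zip(expected, actual):
        if e_dtype != a_dtype:
            problems.append(f"{e_name} -> {a_name}: dtype {e_dtype} != {a_dtype}")
        if not shapes_compatible(e_shape, a_shape):
            problems.append(f"{e_name} -> {a_name}: shape {e_shape} != {a_shape}")
    return problems


for line in compare_io(tf_inputs, trt_inputs):
    print(line)
```

For the inputs this reports both a dtype difference and a shape (rank) difference for each tensor. The single output, float32 with shape (None, 1) in the source model and (100, 1) in the engine, would compare as compatible under the same check.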