I use TVM to import a TensorFlow frozen model, then use the TVM TensorRT runtime to generate a TRT engine, with TVM_TENSORRT_CACHE_DIR pointing to the engine cache directory. The engine is generated, but its input/output tensor names and its input types differ from those of the original model. The details are below. Any suggestions or hints are warmly welcomed. Thanks.
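A minimal sketch of the cache setup described above. It assumes the variable is set from Python in the same process, before the first `run()` call, because the engine is only built (or picked up from the cache) when the module first executes. The helper name `configure_trt_cache` is hypothetical; only `TVM_TENSORRT_CACHE_DIR` comes from this post.

```python
import os


def configure_trt_cache(cache_dir: str) -> str:
    """Hypothetical helper: set TVM_TENSORRT_CACHE_DIR for this process.

    Call it before the first GraphModule.run(); the TensorRT runtime is
    assumed to read the variable from this process's environment.
    """
    cache_dir = os.path.abspath(cache_dir)
    # Create the directory up front so serialized .plan files have a place to go.
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["TVM_TENSORRT_CACHE_DIR"] = cache_dir
    return cache_dir
```

For this post, that would be `configure_trt_cache("/bert/TRT-Cache")` before creating the `GraphModule`.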
################################################
- TVM version: built in the ci_gpu Docker image, tried with the latest commit and with other commits
- TensorRT version: 8.0.1.6
Model inputs:
shape_dict = {
"query_input_ids": [100, 24],
"doc_input_ids": [100, 128]
}
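Brief outline of the process:
Import the TF frozen model
import numpy as np
import tensorflow as tf
import tvm
import tvm.contrib.graph_executor
import tvm.relay.testing.tf as tf_testing
from tvm import relay

tf_compat_v1 = tf.compat.v1

# Parse the frozen GraphDef and normalize it for the Relay frontend
with tf_compat_v1.gfile.GFile(model_path, "rb") as f:
    graph_def = tf_compat_v1.GraphDef()
    graph_def.ParseFromString(f.read())
    graph_def = tf_testing.ProcessGraphDefParam(graph_def)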
Import the graph into Relay
mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)
Partition for TensorRT and build
from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt
# Offload the regions TensorRT supports; in this TVM version the call
# also returns the TensorRT options (gpu_config) used by the build below
gpu_mod, gpu_config = partition_for_tensorrt(mod, params, use_implicit_batch=False)
gpu_target = "cuda"
with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': gpu_config}):
    gpu_lib = relay.build(gpu_mod, target=gpu_target, params=params)
gpu_lib.export_library(gpu_lib_path)
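One plausible reason an engine's inputs need not match the model's inputs: `partition_for_tensorrt` offloads only the regions of the graph that TensorRT supports, and each offloaded region gets its own engine whose inputs are whatever tensors cross the region boundary. The toy sketch below illustrates that idea on a linear chain of ops. It is deliberately simplified and is not TVM's actual partitioning algorithm; the function `partition_linear`, the op names, and the supported set are all hypothetical, and the post does not confirm which ops actually stayed outside TensorRT.

```python
from typing import Dict, List, Set, Tuple


def partition_linear(
    ops: List[Tuple[str, str]], supported: Set[str], graph_input_dtype: str
) -> List[Dict]:
    """Toy partitioner for a linear op chain (not TVM's real algorithm).

    ops: (op_name, output_dtype) pairs in execution order.
    Returns one dict per region of consecutive supported ops, recording the
    ops it contains and the dtype of the tensor that enters the region.
    """
    regions: List[Dict] = []
    current = None
    incoming = graph_input_dtype  # dtype flowing into the next op
    for name, out_dtype in ops:
        if name in supported:
            if current is None:
                # A new region starts; its input is whatever tensor arrives here.
                current = {"ops": [], "input_dtype": incoming}
                regions.append(current)
            current["ops"].append(name)
        else:
            current = None  # an unsupported op ends the open region
        incoming = out_dtype
    return regions


# Hypothetical chain: int32 ids -> embedding lookup -> float32 layers.
chain = [
    ("embedding_lookup", "float32"),
    ("dense", "float32"),
    ("tanh", "float32"),
    ("dense_out", "float32"),
    ("sigmoid", "float32"),
]
supported = {"dense", "tanh", "dense_out", "sigmoid"}  # assumed, for illustration
print(partition_linear(chain, supported, graph_input_dtype="int32"))
# -> [{'ops': ['dense', 'tanh', 'dense_out', 'sigmoid'], 'input_dtype': 'float32'}]
```

In this toy case the single region starts after the lookup, so its input is a float32 activation even though the graph input is int32.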
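Prepare data
# batch_size must equal the batch dimension in shape_dict (100): the module is compiled for static shapes
rng = np.random.default_rng()
q_ids = rng.integers(0, 21127, (batch_size, 24), dtype=np.int32)
d_ids = rng.integers(0, 21127, (batch_size, 128), dtype=np.int32)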
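Because `shape_dict` fixes the batch dimension at 100, the compiled module expects exactly those shapes, which is also what the TensorRT profile in the inspection output reports (min = opt = max). A small pre-flight check is sketched below; `check_inputs` is a hypothetical helper, the fixed seed is added for reproducibility, and treating 21127 as the vocabulary size is an assumption based on the `rng.integers` bound above (which excludes 21127 itself).

```python
import numpy as np

shape_dict = {
    "query_input_ids": [100, 24],
    "doc_input_ids": [100, 128],
}


def check_inputs(inputs, shape_dict, dtype=np.int32, vocab_size=21127):
    """Raise ValueError if an input's shape, dtype, or id range does not match."""
    for name, expected_shape in shape_dict.items():
        arr = inputs[name]
        if list(arr.shape) != list(expected_shape):
            raise ValueError(f"{name}: shape {arr.shape} != compiled shape {tuple(expected_shape)}")
        if arr.dtype != dtype:
            raise ValueError(f"{name}: dtype {arr.dtype} != {np.dtype(dtype)}")
        if arr.min() < 0 or arr.max() >= vocab_size:
            raise ValueError(f"{name}: token ids outside [0, {vocab_size})")


batch_size = 100  # must equal the batch dimension used in shape_dict
rng = np.random.default_rng(0)  # fixed seed for reproducible runs
inputs = {
    "query_input_ids": rng.integers(0, 21127, (batch_size, 24), dtype=np.int32),
    "doc_input_ids": rng.integers(0, 21127, (batch_size, 128), dtype=np.int32),
}
check_inputs(inputs, shape_dict)  # passes silently when everything matches
```

Running the check before `set_input` surfaces a shape or dtype mismatch with the input name attached.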
Run with TensorRT to generate the TRT engine (the engine is built on the first run)
trt_dev = tvm.cuda(0)
trt_loaded_lib = gpu_lib
trt_module = tvm.contrib.graph_executor.GraphModule(trt_loaded_lib['default'](trt_dev))
trt_module.set_input("query_input_ids", q_ids)
trt_module.set_input("doc_input_ids", d_ids)
trt_module.run()
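After `run()`, the result can be read back with `trt_module.get_output(0).numpy()`. Since the TF graph's only output is `Sigmoid:0` with shape (None, 1), a quick sanity check on that array is sketched below. `check_sigmoid_output` is a hypothetical helper, and the example feeds it a synthetic array rather than real model output.

```python
import numpy as np


def check_sigmoid_output(out, batch_size):
    """Check shape (batch_size, 1), float32 dtype, and values in [0, 1]."""
    if out.shape != (batch_size, 1):
        raise ValueError(f"unexpected output shape {out.shape}")
    if out.dtype != np.float32:
        raise ValueError(f"unexpected output dtype {out.dtype}")
    # NaN compares False on both sides, so NaNs also fail this check.
    if not np.all((out >= 0.0) & (out <= 1.0)):
        raise ValueError("sigmoid output outside [0, 1]")


# Synthetic stand-in for trt_module.get_output(0).numpy()
logits = np.linspace(-5.0, 5.0, 100, dtype=np.float32).reshape(100, 1)
fake_out = (1.0 / (1.0 + np.exp(-logits))).astype(np.float32)
check_sigmoid_output(fake_out, batch_size=100)
```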
Here is the inspection output for both models; comparing them shows the mismatch.
Source TF frozen model info
$ polygraphy inspect model /bert/search-team/bert/bert-frozed-int32.pb --model-type frozen
2021-10-09 10:41:44.992623: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[W] Package: 'tensorflow' version 2.4.2 is installed, but version <2.0 is recommended.
Consider installing the recommended version or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to do so automatically.
[I] Loading GraphDef from /bert/search-team/bert/bert-frozed-int32.pb
[I] ==== TensorFlow Graph ====
---- 2 Graph Inputs ----
{query_input_ids:0 [dtype=int32, shape=(None, 24)],
doc_input_ids:0 [dtype=int32, shape=(None, 128)]}
---- 1 Graph Outputs ----
{Sigmoid:0 [dtype=float32, shape=(None, 1)]}
---- 149 Nodes ----
Generated TRT engine info
$ polygraphy inspect model /bert/TRT-Cache/tvmgen_default_tensorrt_main_17_fp32.plan
[I] Loading bytes from /bert/TRT-Cache/tvmgen_default_tensorrt_main_17_fp32.plan
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.2.0
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine (1 layers)
---- 2 Engine Input(s) ----
{tensorrt_17_i1_0 [dtype=float32, shape=(100, 24, 60)],
tensorrt_17_i1_1 [dtype=float32, shape=(100, 128, 60)]}
---- 1 Engine Output(s) ----
{tensorrt_output_0 [dtype=float32, shape=(100, 1)]}
---- Memory ----
Device Memory: 251025920 bytes
---- 1 Profile(s) (3 Binding(s) Each) ----
- Profile: 0
Binding Index: 0 (Input) [Name: tensorrt_17_i1_0] | Shapes: min=(100, 24, 60), opt=(100, 24, 60), max=(100, 24, 60)
Binding Index: 1 (Input) [Name: tensorrt_17_i1_1] | Shapes: min=(100, 128, 60), opt=(100, 128, 60), max=(100, 128, 60)
Binding Index: 2 (Output) [Name: tensorrt_output_0] | Shape: (100, 1)
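To make the mismatch explicit, the sketch below encodes the two input signatures as Polygraphy printed them above and diffs them. The dictionaries are transcribed by hand from that output, `diff_signatures` is a hypothetical helper, and pairing tensors by position is an assumption (the post does not confirm which engine input corresponds to which TF input). One plausible reading, not confirmed here, is that the `tvmgen_default_tensorrt_main_17` engine covers only one partitioned subgraph, so its inputs are intermediate float32 tensors (rank 3, trailing size 60) produced by ops that stayed outside TensorRT, with names generated by TVM rather than taken from the TF graph.

```python
# Signatures transcribed from the polygraphy output above: name -> (dtype, shape).
# None marks the TF graph's dynamic batch dimension.
tf_inputs = {
    "query_input_ids:0": ("int32", (None, 24)),
    "doc_input_ids:0": ("int32", (None, 128)),
}
trt_inputs = {
    "tensorrt_17_i1_0": ("float32", (100, 24, 60)),
    "tensorrt_17_i1_1": ("float32", (100, 128, 60)),
}


def diff_signatures(expected, actual):
    """Pair tensors by position (an assumption) and list dtype/rank/dim differences."""
    report = []
    for (e_name, (e_dtype, e_shape)), (a_name, (a_dtype, a_shape)) in zip(
        expected.items(), actual.items()
    ):
        notes = []
        if e_dtype != a_dtype:
            notes.append(f"dtype {e_dtype} -> {a_dtype}")
        if len(e_shape) != len(a_shape):
            notes.append(f"rank {len(e_shape)} -> {len(a_shape)}")
        for i, (e_dim, a_dim) in enumerate(zip(e_shape, a_shape)):
            # A dynamic (None) dimension matches any concrete size.
            if e_dim is not None and e_dim != a_dim:
                notes.append(f"dim {i}: {e_dim} -> {a_dim}")
        report.append((e_name, a_name, notes))
    return report


for tf_name, trt_name, notes in diff_signatures(tf_inputs, trt_inputs):
    print(f"{tf_name} vs {trt_name}: {'; '.join(notes) or 'match'}")
```

Both pairs report a dtype change (int32 to float32) and an extra dimension, while the output signature, (None, 1) versus (100, 1) in float32, matches apart from the fixed batch.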