```cpp
#if TRT_VERSION_GE(6, 0, 1)
  if (use_implicit_batch_) {
    ICHECK(context->execute(batch_size, bindings.data())) << "Running TensorRT failed.";
  } else {
    ICHECK(context->executeV2(bindings.data())) << "Running TensorRT failed.";
  }
#else
  ICHECK(context->execute(batch_size, bindings.data())) << "Running TensorRT failed.";
#endif
```
TVM-TRT uses the synchronous interfaces (`execute` / `executeV2`) to run inference. The async version, `enqueueV2`, is supposed to be faster. I guess there is a gap preventing TVM-TRT from using the async version. Can anybody give a hint on why the async interface isn't used?
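For context, here is a rough sketch (not TVM's actual code) of what the async path might look like. It assumes a dedicated CUDA stream and an explicit synchronization point before the outputs are consumed, which is one obvious source of complexity: the caller must decide when and where to sync.

```cpp
// Hypothetical sketch only, assuming a per-module CUDA stream.
// enqueueV2(void** bindings, cudaStream_t stream, cudaEvent_t* inputConsumed)
// queues the inference on the stream and returns immediately.
cudaStream_t stream;
cudaStreamCreate(&stream);

ICHECK(context->enqueueV2(bindings.data(), stream, nullptr))
    << "Running TensorRT failed.";

// Outputs are only valid after the stream has been synchronized
// (or an event recorded on it has completed).
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);
```

With the sync right after the enqueue, this degenerates into the same behavior as `executeV2`; the speedup only materializes if the runtime can overlap the enqueued work with other host-side work or other streams, which may be part of the gap being asked about.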
Friendly ping @trevor-m