I am running inference with a standard MobileNetV1 on HD images with an input size of [1,1200,1920,3] (NHWC), and the performance is as below.
Inference times are in msec on a 1080 Ti GPU with CUDA 10.0, cuDNN 7.4.2, and TensorRT 5.0:
| Standard TF (CUDA + cuDNN) | TF + TensorRT (FP32) | TVM (without autotuning) | Autotuned TVM |
|---|---|---|---|
| 31 | 26 | 101 | 75 |
I let each autotuning task run for 1000 trials with early stopping at 400 trials.
In my view, autotuned TVM should at least match standard TF inference time. Do you have any thoughts on this? One possible reason could be that TVM is not using the cuDNN library. How can I verify that TVM is using the cuDNN backend?
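For reference, my understanding is that cuDNN offload in TVM is requested through the target string rather than picked up automatically, by appending `-libs=cudnn` (this requires TVM to have been built with `USE_CUDNN ON`). A minimal sketch of what I believe the compile step looks like, with a small helper to sanity-check the flag; the `relay.build` call is shown only as a comment since the exact API varies by TVM version:

```python
# Sketch (assumption, not verified on my setup): cuDNN is opted into via
# the target string passed at compile time, e.g.:
target = "cuda -libs=cudnn"

# With TVM installed and built with USE_CUDNN, compilation would look like:
#   graph, lib, params = relay.build(mod, target=target, params=params)

def uses_cudnn(target_str: str) -> bool:
    """Check whether a TVM-style target string requests the cuDNN library."""
    for tok in target_str.split():
        if tok.startswith("-libs="):
            libs = tok.split("=", 1)[1].split(",")
            return "cudnn" in libs
    return False

print(uses_cudnn("cuda -libs=cudnn"))  # True
print(uses_cudnn("cuda"))              # False
```

If that is right, then a plain `target = "cuda"` (which most tutorials use) would never touch cuDNN, and the autotuner would be tuning TVM's own generated kernels instead. I would appreciate confirmation from someone who knows the internals.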
Also, any other input would be appreciated!
Thanks