HD Standard Models [Performance Issues]

Hi @eqy @tqchen @srkreddy1238

I am running inference with a standard MobileNetV1 on HD images with input size [1, 1200, 1920, 3] (NHWC), and the performance is as below.

Inference times are in milliseconds on a 1080 Ti GPU with CUDA 10.0, cuDNN 7.4.2, and TensorRT 5.0:

| Standard TF (CUDA + cuDNN) | TF + TensorRT (FP32) | TVM (without autotuning) | Autotuned TVM |
|---|---|---|---|
| 31 | 26 | 101 | 75 |

I let each autotuning task run for 1000 trials with early stopping after 400 trials.
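For reference, here is a sketch of the tuning loop I am using, following the standard AutoTVM CUDA tutorial. The `tasks` list, the log file name, and the builder/runner numbers are my own setup and may differ from what others use:

```python
from tvm import autotvm

# measure_option controls how candidate kernels are compiled and timed
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=20, repeat=3, timeout=4),
)

# 'tasks' is assumed to come from autotvm.task.extract_from_program
# on the imported MobileNetV1 Relay module
for i, task in enumerate(tasks):
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=1000,          # 1000 trials per task, as stated above
        early_stopping=400,    # stop if no improvement for 400 trials
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("mobilenet_v1_cuda.log")],
    )
```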

In my view, autotuned TVM should at least match the standard TF inference time. Do you have any thoughts on this? One possible reason is that TVM is not using the cuDNN library. How do I verify that TVM is using the cuDNN backend?
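If it helps, this is roughly how I understand cuDNN offload is requested at build time (assuming TVM was compiled with `USE_CUDNN ON` in `config.cmake`; `mod` and `params` here are assumed to come from `relay.frontend.from_tensorflow`):

```python
from tvm import relay

# Appending -libs=cudnn asks TVM to offload supported ops
# (e.g. conv2d) to cuDNN instead of TVM-generated kernels
target = "cuda -libs=cudnn"

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)
```

Note that when `-libs=cudnn` is used, the conv2d ops are dispatched to cuDNN and are not autotuned, so I am unsure whether this target should be combined with the tuning log.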

Also, any other inputs would be appreciated!

Thanks