Are there any benchmarks for ResNet models on GPU?

Hi all, I’m trying out TVM on an NVIDIA T4 with target = 'cuda -libs=cudnn'. The performance is far from what I get with fp32 TensorRT (4 ms vs. 2 ms). Is that normal?

Precision: fp32
Input shape: 2 * 3 * 224 * 224
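
In case the measurement method matters for the comparison, here is a minimal sketch of a timing helper that could be applied to both runtimes in the same way. It is an illustration under assumptions, not my exact script: `measure_latency_ms` and `dummy_inference` are hypothetical names, and the real callable would have to run one forward pass on the 2 * 3 * 224 * 224 input and synchronize the GPU before returning.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=10, repeats=100):
    """Return the median wall-clock latency of run_once() in milliseconds."""
    # Warm-up calls are excluded so one-time costs (lazy initialization,
    # kernel loading) do not skew the reported number.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # For a GPU model, run_once must block until the device has finished;
        # otherwise only the asynchronous launch is timed.
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def dummy_inference():
    # Hypothetical CPU stand-in for one forward pass, used only so that
    # this sketch runs without a GPU or any framework installed.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    latency = measure_latency_ms(dummy_inference)
    print(f"median latency: {latency:.3f} ms")
```

Using the median over many repeats after a warm-up should keep a few slow outliers from dominating a number as small as 2 to 4 ms.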

Are there any benchmarks comparing TVM performance on GPUs against TensorRT or PyTorch?

Is there a big difference between using auto-tuning and using cuDNN? I haven’t tried auto-tuning because it takes too long.