Hi all, I’m trying out TVM on an NVIDIA T4 with target = 'cuda -libs=cudnn'.
The performance is well behind fp32 TensorRT (4 ms vs 2 ms). Is that normal?
Precision: fp32
Input shape: 2 * 3 * 224 * 224
Are there any benchmarks comparing TVM performance on GPUs against TensorRT or PyTorch?
Is there a big difference between auto-tuning and using cuDNN? I haven’t tried auto-tuning because it takes too long.
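
In case the measurement method matters for the 4 ms vs 2 ms numbers: below is a minimal sketch of timing both runtimes the same way, with warm-up runs and a median over repeats. The helper `measure_latency_ms` and the callable `run_once` are hypothetical names, not part of TVM or TensorRT; `run_once` is assumed to run one inference and block until the GPU work has finished.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=10, repeats=100):
    """Return the median wall-clock latency of run_once() in milliseconds.

    run_once is a hypothetical zero-argument callable that performs one
    inference and returns only after the GPU work is complete.
    """
    # Warm-up iterations absorb one-time costs (lazy initialization, caches).
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)
```

Wrapping the TVM run and the TensorRT run in the same helper, with the same batch of 2 * 3 * 224 * 224 inputs, should at least rule out differences that come from how each number was measured.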