I compared the performance of AutoTVM and cuDNN, and I found that AutoTVM gets better performance than cuDNN when batch size = 1, but cuDNN gets better performance than AutoTVM when batch size = 100.
I'm confused by this; can you explain it? Did you test AutoTVM and cuDNN performance for different batch sizes?
The current conv2d NCHW template on CUDA might not be optimized for large batch sizes. The conv2d_hwcn template can give better performance for large batches: https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_hwcn.py However, this template hasn't been ported to the AutoTVM style yet.
So the NCHW layout template kernel is image-size aware, and the HWCN/CHWN layout template kernel is batch-size aware.
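For reference, here is a rough sketch of how one could time the conv2d_hwcn template directly through TOPI, based on the old tvm/topi Python API that the linked file belongs to. The workload shape, batch size, and number of timing runs are just illustrative assumptions, not a tuned benchmark:

```python
import numpy as np
import tvm
import topi
from topi.util import get_const_tuple

# Illustrative workload: batch=100, 56x56 feature map, 64->64 channels, 3x3 kernel.
batch, in_c, out_c = 100, 64, 64
size, kernel, stride, padding = 56, 3, 1, 1

# HWCN layout puts the batch dimension innermost, which the
# conv2d_hwcn schedule exploits for large batch sizes.
A = tvm.placeholder((size, size, in_c, batch), name="A")
W = tvm.placeholder((kernel, kernel, in_c, out_c), name="W")

with tvm.target.cuda():
    B = topi.nn.conv2d_hwcn(A, W, stride, padding)
    s = topi.cuda.schedule_conv2d_hwcn([B])

func = tvm.build(s, [A, W, B], "cuda")

ctx = tvm.gpu(0)
a = tvm.nd.array(np.random.uniform(size=get_const_tuple(A.shape)).astype("float32"), ctx)
w = tvm.nd.array(np.random.uniform(size=get_const_tuple(W.shape)).astype("float32"), ctx)
b = tvm.nd.array(np.zeros(get_const_tuple(B.shape), dtype="float32"), ctx)

# Average the kernel time over several runs.
evaluator = func.time_evaluator(func.entry_name, ctx, number=20)
print("conv2d_hwcn: %.3f ms" % (evaluator(a, w, b).mean * 1e3))
```

Running the same measurement with batch = 1 vs batch = 100, and against the NCHW template or the cuDNN path, may show the batch-size effect described above.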