cuDNN and AutoTVM benchmark

I compared the performance of AutoTVM and cuDNN, and found that AutoTVM gets better performance than cuDNN when the batch size is 1, but cuDNN gets better performance than AutoTVM when the batch size is 100.
I'm confused by this; can you explain it? Did you test AutoTVM and cuDNN performance at different batch sizes?

@eqy @thierry, can you help me? Thank you.

The current conv2d NCHW template on CUDA might not be optimized for large batch sizes. The conv2d_hwcn template can give better performance for large batches: https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_hwcn.py However, this template doesn't have an AutoTVM-style tuning space yet.
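To see why the layout matters, here is a small NumPy sketch (not TVM code, just an illustration with made-up sizes) of the difference between the NCHW and HWCN layouts. In HWCN the batch axis N is innermost, so when the batch is large, adjacent GPU threads can read adjacent batch elements with coalesced memory accesses.

```python
import numpy as np

# Hypothetical conv2d input sizes, chosen only for illustration.
N, C, H, W = 100, 64, 56, 56

nchw = np.zeros((N, C, H, W), dtype=np.float32)
hwcn = np.transpose(nchw, (2, 3, 1, 0)).copy()  # repack into H, W, C, N layout

# In NCHW the innermost (unit-stride) axis is W, the image width;
# in HWCN it is N, the batch, so a large batch gives consecutive
# threads consecutive addresses to load.
print(nchw.shape, nchw.strides[-1])  # innermost axis is W, stride = 4 bytes
print(hwcn.shape, hwcn.strides[-1])  # innermost axis is N, stride = 4 bytes
```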

So the NCHW-layout template produces an image-size-aware kernel, while the HWCN/CHWN-layout template produces a batch-size-aware kernel.
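A rough way to picture "image-size aware" vs. "batch-size aware" is to count where the parallel work comes from in each schedule. The helper below is purely illustrative (the function and the tiling assumption are mine, not TVM's): it assumes an NCHW-style schedule draws its parallelism from the output image plane (H x W), while an HWCN-style schedule additionally parallelizes over the batch axis N.

```python
# Hypothetical illustration: sources of parallelism for each layout's
# template, assuming NCHW-style tiling parallelizes over the output
# image plane and HWCN-style tiling parallelizes over the batch axis.
def output_parallelism(n, h, w):
    image_parallel = h * w  # work available per (batch, channel) in an NCHW-style tiling
    batch_parallel = n      # extra axis an HWCN-style tiling can exploit
    return image_parallel, batch_parallel

# Batch size 1, large feature map: nearly all parallelism is in the image plane.
print(output_parallelism(1, 56, 56))   # (3136, 1)
# Batch size 100, small feature map: the batch axis dominates.
print(output_parallelism(100, 7, 7))   # (49, 100)
```

This matches the observation above: at batch size 1 the image-plane parallelism of the NCHW template is enough, but at batch size 100 a template that can also partition work over N has far more to exploit.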