I am hoping that the upcoming AutoTVM integration for the CUDA backend will remove the batch 1 limitation. Winograd convolution is coming as well. For Winograd, supporting batch > 1 is trivial, and the larger the batch size, the better.
@merrymercy Can you comment?
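
To illustrate why batch > 1 is cheap for Winograd, here is a minimal 1-D sketch of F(2,3) in plain Python. It is not the TVM implementation; the function names (`direct_conv1d`, `transform_filter`, `winograd_batch`) are made up for this example, and it assumes each signal has an even length of at least 4. The batch only increases the number of input tiles, while the filter transform is computed once and reused, which is roughly why a larger batch tends to amortize better.

```python
def direct_conv1d(x, g):
    """Reference: valid 1-D correlation of signal x with a 3-tap filter g."""
    return [sum(x[i + k] * g[k] for k in range(3)) for i in range(len(x) - 2)]


def transform_filter(g):
    """F(2,3) filter transform; done once, independent of the batch size."""
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def winograd_batch(batch, g):
    """Apply F(2,3) to every signal in `batch` (even length >= 4 assumed)."""
    u = transform_filter(g)

    # Collect 4-element tiles (stride 2) from all samples into one flat list.
    # The batch dimension simply folds into the tile count.
    tiles = []
    for n, x in enumerate(batch):
        for t in range(0, len(x) - 2, 2):
            tiles.append((n, t, x[t:t + 4]))

    out = [[0.0] * (len(x) - 2) for x in batch]
    for n, t, d in tiles:
        # Input transform, element-wise product, then output transform.
        v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
        m = [vi * ui for vi, ui in zip(v, u)]
        out[n][t] = m[0] + m[1] + m[2]
        out[n][t + 1] = m[1] - m[2] - m[3]
    return out, len(tiles)


if __name__ == "__main__":
    g = [1.0, 2.0, 3.0]
    batch = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
             [0.0, -1.0, 2.0, -3.0, 4.0, -5.0, 6.0, -7.0]]
    out, num_tiles = winograd_batch(batch, g)
    print(num_tiles)
    print(out == [direct_conv1d(x, g) for x in batch])
```

In the 2-D case the same idea applies: the tiles from all images in the batch form one larger dimension of the batched matrix multiply, so no separate code path for batch > 1 is needed.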