Skip padding calculations in bitserial convolution

The definition of bitserial_conv2d expresses the convolution elegantly, without any explicit for loops. Since the padding value is always zero, I wonder whether there is a way to skip the dot products over the padded area and so reduce runtime further. There is also a correctness reason to skip them: in XNOR-Net, a 0 bit represents -1, so there is no way to encode a real 0. If the padded area is included in the computation, its 0 bits are treated as -1 and the results are wrong. I find this difficult because skipping the padding would require a reduce_axis whose extent varies with the output position during the convolution, and I don't know how to implement that. Any help is very much appreciated. Thanks!
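
To make the correctness issue concrete, here is a minimal NumPy sketch (not TVM code, and not how bitserial_conv2d is implemented). It assumes a single channel, stride 1, and the encoding bit 0 = -1, bit 1 = +1. The helper names `conv2d_valid`, `xnor_conv2d_padded` and `padding_correction` are hypothetical. Rather than skipping the padded positions, it keeps the reduction over the full window and then adds back what the padding wrongly contributed. Each padded position adds (-1) * w, so the correction is the sum of the weights that overlap padding. That sum depends only on the weights and the input shape, so it could in principle be precomputed.

```python
import numpy as np

def conv2d_valid(x, w):
    """Plain 'valid' cross-correlation of a 2-D input with a 2-D kernel."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.int64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def xnor_conv2d_padded(a_bits, w_bits, pad):
    """XNOR/popcount convolution on 0/1 bits with zero-bit padding.

    The padded 0 bits are read as -1, which is exactly the problem
    described in the post.
    """
    a_pad = np.pad(a_bits, pad, constant_values=0)
    kh, kw = w_bits.shape
    n = kh * kw
    oh, ow = a_pad.shape[0] - kh + 1, a_pad.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.int64)
    for i in range(oh):
        for j in range(ow):
            window = a_pad[i:i + kh, j:j + kw]
            matches = np.sum(window == w_bits)  # popcount of XNOR
            out[i, j] = 2 * matches - n         # dot product in {-1, +1}
    return out

def padding_correction(input_shape, w_bits, pad):
    """Sum of the +/-1 weights that overlap padding at each output position."""
    w = 2 * w_bits - 1
    # 1 where real data lives, 0 in the padded border.
    mask = np.pad(np.ones(input_shape, dtype=np.int64), pad, constant_values=0)
    # (sum of all weights) - (sum of weights over valid positions)
    return w.sum() - conv2d_valid(mask, w)

rng = np.random.default_rng(0)
a_bits = rng.integers(0, 2, size=(5, 5))
w_bits = rng.integers(0, 2, size=(3, 3))
pad = 1

raw = xnor_conv2d_padded(a_bits, w_bits, pad)
fixed = raw + padding_correction(a_bits.shape, w_bits, pad)

# Reference: real-valued convolution with true zero padding.
reference = conv2d_valid(np.pad(2 * a_bits - 1, pad, constant_values=0),
                         2 * w_bits - 1)
assert np.array_equal(fixed, reference)
```

This only addresses correctness, not the runtime saving: the padded positions are still computed. Whether the same correction term can be expressed efficiently inside a TVM schedule is an open question here.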