@tqchen I have found the answer. Like you, I first suspected floating-point truncation, and I spent a lot of time investigating that. The actual cause is that we don’t support CoreML’s SAME padding correctly: when the stride is 2, our padding computation is wrong. The spec is here: https://apple.github.io/coremltools/coremlspecification/sections/NeuralNetwork.html#samepadding
Our output height / width is also wrong. Currently our C++ implementation in convolution.cc and Python’s conv2d_nchw just use the formula for CoreML’s VALID padding. For example, in C++:
oshape[2] = (dshape[2] + param.padding[0] * 2 - dilated_ksize_y) / param.strides[0] + 1;
We don’t handle CoreML’s SAME padding at all, but SAME padding is very commonly used and we need to support it. I am trying to implement it now and have found that passing only padding_top / padding_left is not enough: we need to pass padding_top, padding_bottom, padding_left and padding_right to Symbol.Conv2D, and NNVM needs to know whether the padding is SAME or VALID.
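To make the difference concrete, here is a rough Python sketch for one spatial dimension, following the SAME padding formulas in the CoreML spec linked above. The function names (valid_out_size, same_padding) are made up for illustration and are not existing NNVM or CoreML APIs.

```python
def valid_out_size(in_size, kernel, stride, pad_begin, pad_end, dilation=1):
    """Output size with explicit padding (the formula we use today,
    generalized to different begin/end amounts)."""
    dilated_kernel = (kernel - 1) * dilation + 1
    return (in_size + pad_begin + pad_end - dilated_kernel) // stride + 1


def same_padding(in_size, kernel, stride, dilation=1, bottom_right_heavy=True):
    """SAME padding for one dimension.

    Returns (out_size, pad_begin, pad_end), where begin is top/left and
    end is bottom/right.
    """
    dilated_kernel = (kernel - 1) * dilation + 1
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    total = max(0, (out_size - 1) * stride + dilated_kernel - in_size)
    small = total // 2
    large = total - small
    if bottom_right_heavy:
        return out_size, small, large  # extra pixel goes to bottom/right
    return out_size, large, small      # extra pixel goes to top/left


# 224 input, 3x3 kernel, stride 2: total padding is 1, so it cannot be
# split evenly and a single symmetric padding value cannot express it.
print(same_padding(224, 3, 2))                            # (112, 0, 1)
print(same_padding(224, 3, 2, bottom_right_heavy=False))  # (112, 1, 0)
```

With total padding of 1, the two SAME variants pad different sides, which is why Conv2D would need all four padding values rather than only top / left.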
We had a similar PR: https://github.com/dmlc/nnvm/pull/260. However, it is not enough for our purpose. We need to distinguish SAME padding with BOTTOM_RIGHT_HEAVY, SAME padding with TOP_LEFT_HEAVY, and VALID padding. I suggest we add one param, named padding_mode as in that PR, with four accepted values: NONE (default), SAME_TOP_LEFT_HEAVY, SAME_BOTTOM_RIGHT_HEAVY, VALID. If padding_mode is NONE, we keep the current behaviour but let the user provide 4 values (top, left, bottom, right). This is easy: just extend padding’s default shape from {0, 0} to {0, 0, 0, 0}, which is what ONNX does. If padding_mode has a non-default value, we handle it ourselves, for example by calculating the padding and ignoring the padding params. How best to do this is open for discussion. I am also in the middle of implementing it.
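A minimal Python sketch of how the padding_mode dispatch could look. Everything here is hypothetical: resolve_padding and _same_split are illustrative names, not existing NNVM functions. I also assume that VALID means no implicit padding and that a legacy 2-element padding is still accepted under NONE; neither is decided yet.

```python
def _same_split(in_size, kernel, stride, dilation, bottom_right_heavy):
    """Split the total SAME padding of one dimension into (begin, end)."""
    dilated_kernel = (kernel - 1) * dilation + 1
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    total = max(0, (out_size - 1) * stride + dilated_kernel - in_size)
    small = total // 2
    large = total - small
    return (small, large) if bottom_right_heavy else (large, small)


def resolve_padding(padding_mode, padding, in_hw, kernel_hw, strides,
                    dilation=(1, 1)):
    """Return the effective padding as (top, left, bottom, right)."""
    if padding_mode == "NONE":
        if len(padding) == 4:
            return tuple(padding)
        if len(padding) == 2:
            # Assumption: legacy symmetric (pad_h, pad_w) is still accepted.
            return (padding[0], padding[1], padding[0], padding[1])
        raise ValueError("padding must have 2 or 4 elements")
    if padding_mode == "VALID":
        # Assumption: VALID ignores the padding param and pads nothing.
        return (0, 0, 0, 0)
    if padding_mode in ("SAME_TOP_LEFT_HEAVY", "SAME_BOTTOM_RIGHT_HEAVY"):
        heavy_br = padding_mode == "SAME_BOTTOM_RIGHT_HEAVY"
        top, bottom = _same_split(in_hw[0], kernel_hw[0], strides[0],
                                  dilation[0], heavy_br)
        left, right = _same_split(in_hw[1], kernel_hw[1], strides[1],
                                  dilation[1], heavy_br)
        return (top, left, bottom, right)
    raise ValueError("unknown padding_mode: %s" % padding_mode)


print(resolve_padding("SAME_BOTTOM_RIGHT_HEAVY", (0, 0),
                      (224, 224), (3, 3), (2, 2)))  # (0, 0, 1, 1)
```

The shape inference would then use top + bottom and left + right in place of padding * 2, so one formula covers every mode.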
I think we would like Theano’s way (http://deeplearning.net/software/theano/library/tensor/nnet/conv.html?highlight=conv2d#theano.tensor.nnet.conv.conv2d):
mode (int or tuple) – One of “valid”, “full”, “half”, an integer, or a tuple where each member is either an integer or a tuple of 2 positive integers.
However, our C++ code checks the shape, and padding’s type is TShape, so we cannot do it Theano’s way. If you have any good ideas or suggestions, I would like to hear them.