Thank you for your comment and suggestions!
-
Currently, I only consider the case where the post-op can be done in place. I simply bind the entry IDs of src[3] and dst[0] to the same memory; the corresponding code is in dnnl_json_runtime.cc:282-286. This solution is not robust enough. I originally planned to check in run() whether the tensor can be handled in place and then do the memory binding, but that may not preserve zero copy. I think a modification at the TVM memory allocation level could be a general solution.
-
I had not noticed that “list of available primitive descriptors for convolution and convolution+attributes is not identical”. This finding could change our solution. I will look into it.
-
I am also confused that there is no pass to merge two consecutive linear operators. The pattern looks like this:
%0 = nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
%1 = add(%0, meta[relay.Constant][1] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
%2 = add(%1, meta[relay.Constant][2] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
%3 = nn.relu(%2) /* ty=Tensor[(1, 64, 224, 224), float32] */;
It only works when I first convert the pattern into:
%0 = nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
%1 = add(meta[relay.Constant][1] /* ty=Tensor[(64, 1, 1), float32] */, meta[relay.Constant][2] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(64, 1, 1), float32] */;
%2 = add(%0, %1) /* ty=Tensor[(1, 64, 224, 224), float32] */;
%3 = nn.relu(%2) /* ty=Tensor[(1, 64, 224, 224), float32] */;
Then I can apply constant_folding to remove the %1 add, and the result becomes:
%0 = nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
%1 = add(%0, meta[relay.Constant][1] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
%2 = nn.relu(%1) /* ty=Tensor[(1, 64, 224, 224), float32] */;
If you know of any pass that can handle this case, please share it with me!
BTW, this case only occurs in the VGG_BN series of models.
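To make the rewrite I did by hand concrete, here is a minimal sketch of the reassociation (x + c1) + c2 -> x + (c1 + c2) on a toy expression tree. This is not TVM code: Var, Const, Add and merge_constant_adds are hypothetical names for illustration only, and I assume both constants have the same shape (as with the two (64, 1, 1) constants above). In TVM the same idea would presumably be written as a pattern rewrite followed by constant folding.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A non-constant input, e.g. the conv2d output (hypothetical toy node)."""
    name: str


@dataclass(frozen=True)
class Const:
    """A constant tensor, flattened to a tuple of per-channel values."""
    values: Tuple[float, ...]


@dataclass(frozen=True)
class Add:
    """Elementwise add of two sub-expressions."""
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Const, Add]


def merge_constant_adds(expr: Expr) -> Expr:
    """Rewrite add(add(x, c1), c2) into add(x, c1 + c2), bottom-up."""
    if not isinstance(expr, Add):
        return expr
    lhs = merge_constant_adds(expr.lhs)
    rhs = merge_constant_adds(expr.rhs)
    if isinstance(lhs, Add) and isinstance(lhs.rhs, Const) and isinstance(rhs, Const):
        # Assumption: both constants have identical shapes, so no broadcasting.
        if len(lhs.rhs.values) != len(rhs.values):
            raise ValueError("constants must have the same shape")
        folded = Const(tuple(a + b for a, b in zip(lhs.rhs.values, rhs.values)))
        return Add(lhs.lhs, folded)
    return Add(lhs, rhs)


# Toy version of: %1 = add(%0, c1); %2 = add(%1, c2)
conv_out = Var("conv_out")
before = Add(Add(conv_out, Const((1.0, 2.0))), Const((0.5, 0.25)))
after = merge_constant_adds(before)
```

Note that reassociating floating-point additions can change the result in the last bits, so a real pass would probably need to treat this as an approximate rewrite.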