[RFC][BYOC] Intel(R) oneDNN Integration

Thank you for your comments and suggestions!

  • Currently, I have only considered the case where the post-op can be done in place. I simply bind the entryID of src[3] and dst[0] to the same memory; the corresponding code is in dnnl_json_runtime.cc:282-286. This solution is not robust enough. I originally planned to check in run() whether the tensor can be computed in place and then do the memory binding, but that may not preserve zero copy. I think a modification at the TVM memory allocation level could be a more general solution.

  • I had not noticed that “list of available primitive descriptors for convolution and convolution+attributes is not identical”. This finding may change our solution. I will look into it.

  • I am also confused that there is no pass to merge two consecutive linear operators. The pattern looks like this:

  %0 = nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %1 = add(%0, meta[relay.Constant][1] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %2 = add(%1, meta[relay.Constant][2] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %3 = nn.relu(%2) /* ty=Tensor[(1, 64, 224, 224), float32] */;

The two adds only get merged if I first convert the pattern into

  %0 = nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %1 = add(meta[relay.Constant][1] /* ty=Tensor[(64, 1, 1), float32] */, meta[relay.Constant][2] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(64, 1, 1), float32] */;
  %2 = add(%0, %1) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %3 = nn.relu(%2) /* ty=Tensor[(1, 64, 224, 224), float32] */;

Then I can apply constant folding to remove the %1 add, and the result is:

  %0 = nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3]) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %1 = add(%0, meta[relay.Constant][1] /* ty=Tensor[(64, 1, 1), float32] */) /* ty=Tensor[(1, 64, 224, 224), float32] */;
  %2 = nn.relu(%1) /* ty=Tensor[(1, 64, 224, 224), float32] */;
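
To make the manual rewrite above concrete, here is a minimal sketch in plain Python on a toy expression tree. It is not TVM code: Var, Const, Add and fold_chained_const_adds are hypothetical names invented for this illustration, and Const holds a flat tuple of per-channel values as a stand-in for the (64, 1, 1) constants. It re-associates add(add(x, c1), c2) into add(x, c1 + c2) and folds the constant sum in the same step. In TVM the same idea could presumably be expressed as a rewrite in the Relay pattern language followed by constant folding, but I have not checked that against the models here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A non-constant value, e.g. the output of nn.conv2d (hypothetical toy node)."""
    name: str


@dataclass(frozen=True)
class Const:
    """A constant; a flat tuple stands in for a (C, 1, 1) bias tensor."""
    value: Tuple[float, ...]


@dataclass(frozen=True)
class Add:
    """Elementwise add of two sub-expressions."""
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Const, Add]


def fold_chained_const_adds(expr: Expr) -> Expr:
    """Rewrite add(add(x, c1), c2) into add(x, c1 + c2), bottom-up."""
    if not isinstance(expr, Add):
        return expr
    lhs = fold_chained_const_adds(expr.lhs)
    rhs = fold_chained_const_adds(expr.rhs)
    # Match an outer add whose right operand is constant and whose left
    # operand is itself an add with a constant right operand.
    if isinstance(rhs, Const) and isinstance(lhs, Add) and isinstance(lhs.rhs, Const):
        merged = Const(tuple(a + b for a, b in zip(lhs.rhs.value, rhs.value)))
        return Add(lhs.lhs, merged)
    return Add(lhs, rhs)


# conv -> add(c1) -> add(c2), as in the VGG_BN pattern above.
conv_out = Var("conv_out")
before = Add(Add(conv_out, Const((1.0, 2.0))), Const((0.5, 0.25)))
after = fold_chained_const_adds(before)
print(after)
```

Note that re-associating floating-point adds can change the result in the last bits, so a real pass would need to decide whether that is acceptable.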

If you know of any pass that can handle this case, please share it with me!

BTW, this case only happens in the VGG_BN series of models.