One issue we have encountered with the external codegen infrastructure is that it is difficult to express many-to-one relationships between Relay ops and external ops. For example, the TFLite frontend lowers a quantized convolution to four Relay ops:
nn.pad
qnn.conv2d
nn.bias_add
qnn.requantize
However, Arm Compute Library (ACL) directly supports quantized convolution as expressed by TFLite, so we need to map these four Relay operators to a single ACL operator. That means we must annotate this group of four operators as supported, even though individually they are not. This differs from fusion for performance reasons, where operators are combined because executing them together is more efficient; here, the grouping is needed to express the operation in the external library at all.
One approach is to write a custom annotation pass that detects the sequence of operators and annotates it accordingly. However, we would then have to duplicate the sequence-detection logic when it comes to codegen. A generic mechanism would be preferable.
An alternative, and the subject of this RFC, is as follows. Introduce a new pass, MergeComposite, which accepts a dictionary of patterns indexed by name. The pass finds these patterns and wraps each match in a Relay function marked ‘Composite’ with the name given in the dictionary (e.g. acl.qnn_conv2d). Calls to these composite functions can then be treated as the atomic units of annotation.
For example, the pass would transform the following excerpt from this:
%48 = nn.pad(%47, pad_width=[[0, 0], [1, 1], [1, 1], [0, 0]]);
%49 = qnn.conv2d(%48, %acl_input19);
%50 = nn.bias_add(%49, %acl_input20, axis=3);
%51 = qnn.requantize(%50);
%52 = nn.max_pool2d(%51, pool_size=[2, 2], strides=[2, 2], layout="NHWC");
To this:
%51 = fn (%input10, %weight9, %bias9, Composite="acl.qnn_conv2d") {
    %48 = nn.pad(%input10, pad_width=[[0, 0], [1, 1], [1, 1], [0, 0]]);
    %49 = qnn.conv2d(%48, %weight9);
    %50 = nn.bias_add(%49, %bias9, axis=3);
    qnn.requantize(%50)
};
%52 = %51(%47, %acl_input19, %acl_input20);
%53 = nn.max_pool2d(%52, pool_size=[2, 2], strides=[2, 2], layout="NHWC");
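The transformation above can be sketched in plain Python to show the mechanics of matching a named pattern and wrapping it. This is a toy, not TVM code: the `Var`/`Call`/`CompositeCall` classes, the tuple-based pattern format and `merge_composite` are hypothetical stand-ins for Relay expressions and the proposed pass. It assumes a tree-shaped graph (no shared sub-expressions), mirrors the simplified excerpt (e.g. `qnn.requantize` with a single argument), and matches greedily from the output downwards, with dictionary order deciding which pattern wins.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Var:
    name: str


@dataclass
class Call:
    op: str                                   # operator name, e.g. "qnn.conv2d"
    args: List[Expr]
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class CompositeCall:
    """Hypothetical: a call to a function marked Composite=<name>."""
    name: str                                 # key from the pattern dictionary
    params: List[Var]                         # free inputs of the matched sub-graph
    body: Expr                                # matched ops, rewritten over params
    args: List[Expr]                          # original expressions bound to params


Expr = Union[Var, Call, CompositeCall]
# A pattern is WILDCARD (matches anything and becomes a parameter)
# or (op_name, [one sub-pattern per argument]).
WILDCARD = None
Pattern = Optional[Tuple[str, list]]


def match(pattern: Pattern, expr: Expr, inputs: List[Tuple[Var, Expr]]) -> Optional[Expr]:
    """Return the composite body with each wildcard replaced by a fresh Var,
    recording (Var, original expr) pairs in depth-first order; None on failure."""
    if pattern is WILDCARD:
        param = Var(f"input{len(inputs)}")
        inputs.append((param, expr))
        return param
    op, arg_patterns = pattern
    if not isinstance(expr, Call) or expr.op != op or len(expr.args) != len(arg_patterns):
        return None
    new_args = []
    for sub_pattern, arg in zip(arg_patterns, expr.args):
        sub = match(sub_pattern, arg, inputs)
        if sub is None:
            return None
        new_args.append(sub)
    return Call(op, new_args, expr.attrs)


def merge_composite(expr: Expr, patterns: Dict[str, Pattern]) -> Expr:
    """Greedy top-down rewrite: wrap every match of a named pattern in a CompositeCall."""
    if not isinstance(expr, Call):
        return expr
    for name, pattern in patterns.items():    # dict order gives pattern priority
        inputs: List[Tuple[Var, Expr]] = []
        body = match(pattern, expr, inputs)
        if body is not None:
            params = [param for param, _ in inputs]
            args = [merge_composite(arg, patterns) for _, arg in inputs]
            return CompositeCall(name, params, body, args)
    return Call(expr.op, [merge_composite(a, patterns) for a in expr.args], expr.attrs)


# The excerpt: pad -> conv2d -> bias_add -> requantize -> max_pool2d.
x, w, b = Var("x47"), Var("acl_input19"), Var("acl_input20")
pad = Call("nn.pad", [x], {"pad_width": [[0, 0], [1, 1], [1, 1], [0, 0]]})
conv = Call("qnn.conv2d", [pad, w])
bias = Call("nn.bias_add", [conv, b], {"axis": 3})
requant = Call("qnn.requantize", [bias])
graph = Call("nn.max_pool2d", [requant], {"pool_size": [2, 2], "strides": [2, 2]})

patterns = {
    "acl.qnn_conv2d": ("qnn.requantize", [
        ("nn.bias_add", [("qnn.conv2d", [("nn.pad", [WILDCARD]), WILDCARD]), WILDCARD]),
    ]),
}
merged = merge_composite(graph, patterns)
composite = merged.args[0]
print(composite.name, [p.name for p in composite.params], [a.name for a in composite.args])
# prints: acl.qnn_conv2d ['input0', 'input1', 'input2'] ['x47', 'acl_input19', 'acl_input20']
```

Because wildcards are bound depth-first, the parameters line up with the call arguments in the Relay excerpt (%47, %acl_input19, %acl_input20). A real pass would also have to cope with shared sub-expressions and with patterns that overlap.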
This is desirable because it lets us extend a generic operator-by-operator annotation pass to handle the composite case. When the pass reaches a call to a function marked ‘Composite’, the body of that function has a known form, so a simple static traversal is enough to inspect it. That traversal can be used both to check whether the function is supported and to generate the external code.
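To illustrate the "known form" point, here is a minimal sketch of a support check and code generator for an `acl.qnn_conv2d` composite function. Everything in it is hypothetical: the toy `Var`/`Call`/`Function` classes stand in for Relay, the support rule (no padding on the batch or channel dimensions of an NHWC input, bias added on axis 3) is illustrative rather than an actual Arm Compute Library constraint, and the emitted dictionary is not a real ACL interface. Because the body is known to be a pad, conv2d, bias_add, requantize chain, both functions reduce to a fixed sequence of field accesses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Var:
    name: str


@dataclass
class Call:
    op: str
    args: list
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Function:
    params: List[Var]
    body: Call
    attrs: Dict[str, str] = field(default_factory=dict)  # e.g. {"Composite": "acl.qnn_conv2d"}


def unpack_qnn_conv2d(fn: Function) -> Optional[Tuple[Call, Call, Call, Call]]:
    """Walk the fixed chain backwards from the output. Relies on the function
    having been built from the qnn_conv2d pattern; no graph search is needed."""
    requant = fn.body
    bias = requant.args[0]
    conv = bias.args[0]
    pad = conv.args[0]
    if (pad.op, conv.op, bias.op, requant.op) != ("nn.pad", "qnn.conv2d", "nn.bias_add", "qnn.requantize"):
        return None
    return pad, conv, bias, requant


def check_qnn_conv2d(fn: Function) -> bool:
    """Hypothetical rule: pad only H and W (NHWC) and add the bias on axis 3."""
    calls = unpack_qnn_conv2d(fn)
    if calls is None:
        return False
    pad, _, bias, _ = calls
    (n_lo, n_hi), _, _, (c_lo, c_hi) = pad.attrs["pad_width"]
    return (n_lo, n_hi, c_lo, c_hi) == (0, 0, 0, 0) and bias.attrs.get("axis") == 3


def codegen_qnn_conv2d(fn: Function) -> dict:
    """Emit one hypothetical external-op description for the whole composite."""
    pad, _, _, _ = unpack_qnn_conv2d(fn)
    _, (top, bottom), (left, right), _ = pad.attrs["pad_width"]
    return {"op": "qnn_conv2d", "padding": [top, bottom, left, right],
            "inputs": [p.name for p in fn.params]}


# Dispatch on the Composite attribute: one (check, emit) pair per composite name.
HANDLERS = {"acl.qnn_conv2d": (check_qnn_conv2d, codegen_qnn_conv2d)}


def is_supported(fn: Function) -> bool:
    handler = HANDLERS.get(fn.attrs.get("Composite"))
    return handler is not None and handler[0](fn)


def codegen(fn: Function) -> dict:
    check, emit = HANDLERS[fn.attrs["Composite"]]
    if not check(fn):
        raise ValueError("composite function is not supported")
    return emit(fn)


# Example mirroring %51 in the excerpt.
inp, w, b = Var("input10"), Var("weight9"), Var("bias9")
pad = Call("nn.pad", [inp], {"pad_width": [[0, 0], [1, 1], [1, 1], [0, 0]]})
conv = Call("qnn.conv2d", [pad, w])
bias = Call("nn.bias_add", [conv, b], {"axis": 3})
fn = Function([inp, w, b], Call("qnn.requantize", [bias]), {"Composite": "acl.qnn_conv2d"})
print(is_supported(fn), codegen(fn))
# prints: True {'op': 'qnn_conv2d', 'padding': [1, 1, 1, 1], 'inputs': ['input10', 'weight9', 'bias9']}
```

With this design, supporting another composite means adding one entry to `HANDLERS`, while the surrounding annotation pass stays operator-by-operator.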
I’ve deliberately avoided the words ‘fuse’ and ‘fusion’ to distinguish this case from operator fusion. It would be perfectly valid to fuse a ‘Composite’ function with another Relay operator (e.g. the composite qnn_conv2d might be fused with an activation).