[BYOC] Use pattern language to create composite functions

comaniac · May 19, 2020, 5:36pm

Since the pattern language has been merged, we are planning to rewrite MergeComposite pass for BYOC with it.

Brief Introduction to Merge Composite Pass

BYOC is designed to help 3rd party codegen and accelerators integrate into TVM. Since 3rd party accelerator may have a specialized and powerful instruction to deal with multiple operators (e.g., conv2d+add+relu), the first pass in BYOC is to fuse those operators to a separate function and annotate the function with the instruction name. As a result, the external codegen can easily replace the entire function with a single instruction.

Reasons to Use Pattern Language

The current implementation of MergeComposite pass accepts a user-specified map such as {'my-inst': graph}, where graph is a small Relay program used as a pattern. More detail use cases can be found in the unit test:

github.com

apache/incubator-tvm/blob/master/tests/python/relay/test_pass_merge_composite.py

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
"""Unit tests for merge composite."""
import tvm
from tvm import relay
from tvm import tir

This file has been truncated. show original

On the other hand, using a Relay program as a pattern causes lots of limitations. Basically all motivations specified in the pattern language ([RFC][Relay] Program Matching for Relay Pt 1: A pattern language) are applicable.

Proposal and POC

Accordingly, we propose to use pattern language for MergeComposite. It’s simple, robust, and more general. For example, I’ve written a POC of MergeComposite in pattern language. As can be seen, we only need less than 100 lines to achieve the same functionality.

gist.github.com

https://gist.github.com/comaniac/1f399dfdfee05a2f7a087c65c21f550c

pattern_lang_composite_pass.py

import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import *

class ExtractArgs(tvm.relay.ExprMutator):
    # Mutate the graph to replace the matched graph input with new vars.
    def __init__(self, mapped_vars):
        super(ExtractArgs, self).__init__()
        self.mapped_vars = set(mapped_vars)
        self.vars = []

This file has been truncated. show original

The above POC includes two solutions.

Pattern.Partition

Thiis is a builtin functionality of pattern language that partitions the matched subgraph to a separate function. However, this builtin function doesn’t allow users to specify function attributes, so we cannot add Composite = inst.

PatternCallback

Another solution uses callback functions to manually create composite functions. The problem with this solution is that we need an extra visit to mutate the Relay graph in order to create the function.

Discussion

While we prefer the first solution that uses pattern.partition(), we need to figure out how to add the composite attribute.

S.1: Enhance Partition

A straightforward approach is enhancing pattern.partition to accept more configurations.

for pattern, inst in patterns:
  out = pattern.partition(out, {'Composite': inst})

S.2: Post-Processing

If we do not want to change the partition, we could use a set know the new generated functions after each partition, and add the attribute to them:

for pattern, inst in patterns:
  curr_funcs = get_func_as_set(out)
  out = pattern.partition(out)
  for func in out.functions:
     if func not in curr_funcs:
         func.with_attr('Composite', inst)

Any other thoughts and comments are welcome.

cc @mbrookhart @zhiics @masahi @mbaret

mbrookhart · May 19, 2020, 5:54pm

I’d be happy to extend the parition pass to accept an optional map of attributes. I think it’s a relatively small change to the pass, if you guys think that would be the easiest option.

Something I thought of over the last few days: If some patterns in your list are larger (say conv-bias-relu), and later patterns in the list are subgraphs of previous patterns (say conv-bias), you may end up with a case where you have a composite function called from inside another composite function, which is probably not the goal.

It might be a good idea to introduce a default “Partitioned” attribute and not re-partition inside the “Partitioned” functions in later passes.

CC @jroesch @tqchen

comaniac · May 19, 2020, 6:05pm

Thanks for pointing out. I missed this part in the original post. Currently MergeComposite pass doesn’t deal with the overlapping issue but let users decide the pattern priority. It means that when performing pattern matching, it should ignore existing composite functions. Is this functionality can also be achieved using pattern.partition()?

mbrookhart · May 19, 2020, 6:14pm

The currently implementation wont, thinking over it more, but I’ll work on an update that adds that functionality and a user-specified set of attributes

masahi · May 19, 2020, 8:48pm

One thing that’s not obvious to me is how we should interprete the result of optional matching. Ideally there should be an easy way to tell which of optionals are actually matched, for example by coming up with an appropriate value for composite attribute.

Otherwise each backend needs to manually examine a fragment it receives by multiple calls to IsOp(call->args[0], ...).

comaniac · May 19, 2020, 8:52pm

Pattern language infra works based on a pattern, so we will know which subgraph is matched by which pattern.

For example, when we use graph = pattern.partition(graph), the new partitioned functions are based on the pattern. As long as we add the composite attribute with this pattern’s name to those functions, we are at the same point as the current MergeComposite pass. In short, the backend will not see any IsOp kinds of stuffs.

masahi · May 19, 2020, 9:11pm

No, it is more complicated than that.

Currently our DNNL backends has two patterns, one for conv + relu and another for conv + bias + relu:

github.com

apache/incubator-tvm/blob/master/python/tvm/relay/op/contrib/dnnl.py#L84-L86


conv2d_bias_relu_pat = ("dnnl.conv2d_bias_relu", make_pattern(with_bias=True))
conv2d_relu_pat = ("dnnl.conv2d_relu", make_pattern(with_bias=False))
dnnl_patterns = [conv2d_bias_relu_pat, conv2d_relu_pat]

Suppose we use optional to represent the two patterns. We would have one pattern named “fused_conv”, let’s say, that can match both conv + bias and conv + bias + relu.

In the current infra, the DNNL backend would receive a composite function with attribute “fused_conv”, and the subgraph is either conv + bias or conv + bias + relu, depending on whether bias is matched. To tell the difference, we need IsOp(relu->args[0], "nn.bias_add").

In my use case, I have four optionals to match “qconv + hswish”, so I’d need four IsOp(...) calls. It would be really nice if the composite attribute tells me which optionals are matched (“conv_relu” or “conv_bias_relu”, rather than “fused_conv”).

A user can associate a list of optional op names for each pattern, and MergeComposite pass can compare the matched fragement with optional names to come up with a good attribute.

comaniac · May 19, 2020, 9:19pm

I see your point, although I personally think in this case we should either 1) specify 4 patterns, or 2) use pattern language to match fused_conv, and let codegen decide whether it is conv+bias or conv+bias+relu when the codegen is parsing a composite function with fused_conv attribute.

On the other hand, the scenario of associating a list of optional op names for each pattern is unclear to me. I imagine you need to specify “sub-patterns” in a pattern and assign a name to each of the sub-pattern. To me this is similar to specify 4 patterns directly.

Feel free to correct me if I lost the track, since pattern language is a brand new feature and I haven’t fully hands on.

mbrookhart · May 19, 2020, 9:44pm

While I wasn’t paying attention to the discussion, I implemented Cody’s original ask here:

@masahi, what do you think of stringing together the types of all of the nodes/names of all of the called ops in the partitioned function, in topological order, and putting that as the default “Paritioned” attribute? I think that would do what you’re asking for, but it would still allow the user to add their own tag if they want.

mbrookhart · May 19, 2020, 9:48pm

We’d still need something like this:

use pattern language to match fused_conv, and let codegen decide whether it is conv+bias or conv+bias+relu when the codegen is parsing a composite function with fused_conv attribute.

But it would be analyzing strings instead of graphs.

comaniac · May 19, 2020, 10:06pm

@mbrookhart the PR is exactly I’m expecting!

I am also fine with the tags in a pattern. In this case, we can keep the infra simple and transparent tag analysis to external codegens.

masahi · May 19, 2020, 10:11pm

Note that it is not 4 patterns, but currently we need 2^4 patterns with unique names. We also need to order them carefully so that a bigger patterns come first. Optional matching is introduced to mitigate this combinatorial mess.

I also realized that “a list of optional names” solution works only for the simplest case: if the pattern is tree-ish shaped or contains the same op twice and only one of them is optional, it doesn’t work.

mbaret · May 19, 2020, 10:12pm

Thanks for this RFC Cody! I fully agree this pass should be re-implemented using the pattern language, the logic was very much a stop-gap solution until more powerful pattern matching appeared.

Do you envisage this having a similar API to the existing pass? In particular it’s important for us to retain the added ‘check’ function to confirm whether a match is valid.

Looking forward slightly, I do wonder whether it will start making sense to instead of creating composite functions just directly insert the annotations around the match. This would give us an opportunity to link up ‘composite’ matches with single ops as it’s currently quite unintuitive that these two things are treated so differently.

masahi · May 19, 2020, 10:29pm

We should have a utility that converts existing relay op based patterns to the new pattern language. Otherwise it would break existing users code and we need to rewrite merge composite test cases manually.

tqchen · May 19, 2020, 10:35pm

Given that the old composite pattern is not yet part of a release, it might easier to directly migrate to the new one so that we don’t have to maintain two variants of pattern languages in the spirit of reducing techinical debts.

mbrookhart · May 19, 2020, 10:50pm

@mbaret Can you point me at the API? I have something like this internal to the Parition Pass to validated Function creation, but I haven’t exposed it.

mbrookhart · May 19, 2020, 10:54pm

In the PR, we talked about adding a pass that can annotate pattern matches instead of partitioning them. I think it’s easy-ish to do. I can start working on it if other people agree this is a good direction to go.

mbaret · May 19, 2020, 11:07pm

I think it would be good to know the pattern matcher has the flexibility to annotate rather than partition. However, that will be a more significant change to the overall BYOC flow so the priority first should be just replicating existing functionality as described in this RFC.

comaniac · May 19, 2020, 11:23pm

Do you envisage this having a similar API to the existing pass? In particular it’s important for us to retain the added ‘check’ function to confirm whether a match is valid.

Good point. We need to evaluate if the check function can be fully integrated into the pattern language.

@mbrookhart FYI, one use case for the check function we are talking about is:

github.com

apache/incubator-tvm/blob/master/tests/python/relay/test_pass_merge_composite.py#L754




def _check_true(extract):
    conv = extract.args[0].args[0]
    return conv.attrs.data_layout == "NHWC"


def _check_false(extract):
    conv = extract.args[0].args[0]
    return conv.attrs.data_layout == "NCHW"


pattern_table_true = [
    ("conv_bias_relu", make_conv_bias_relu_pattern(), _check_true)
]
pattern_table_false = [
    ("conv_bias_relu", make_conv_bias_relu_pattern(), _check_false)
]


result = run_opt_pass(before(), relay.transform.MergeComposite(pattern_table_false))
expected = run_opt_pass(before(), relay.transform.InferType())
assert tvm.ir.structural_equal(result, expected, map_free_vars=True)


result = run_opt_pass(before(), relay.transform.MergeComposite(pattern_table_true))

In short, it provides a packed function to match ops with specific attributes. I imagine this can be done by something like is_op('nn.conv2d').has_attr(...), but I’m not sure how powerful has_attr can achieve.

Looking forward slightly, I do wonder whether it will start making sense to instead of creating composite functions just directly insert the annotations around the match. This would give us an opportunity to link up ‘composite’ matches with single ops as it’s currently quite unintuitive that these two things are treated so differently.

Actually the plan Zhi and I originally have was similar to the one you are talking about. The concept of composite function should be merged to annotation target and the external codegen is in charge of replacing patterns with accelerator specific instructions.

I think it would be good to know the pattern matcher has the flexibility to annotate rather than partition. However, that will be a more significant change to the overall BYOC flow so the priority first should be just replicating existing functionality as described in this RFC.

I agree. We can first make the composite pass with pattern language anyhow. The composite function is being processed by every passes in BYOC infra already and we need to more time to carefully figure out how to use annotations to replace composite functions.

mbrookhart · May 20, 2020, 3:55pm

I see. This is easily doable with the Rewriter API, since you can do an arbitrary callback, but it doesn’t have the niceties of the Parition pass.

The current AttrPattern is only matching Op attributes, but that’s fairly easy to extend.

I think I prefer doing this with the AttrPattern, it’s a simpler API. That being said, an arbitrary callback will always be more flexible. Do you guys have a strong preference toward a callback mechanism or a usecase with more complex logic?