[Relay] Generate subgraphs from an existing graph

Hello All,

I’m trying to generate a subgraph from an existing graph. After going through the TVM documentation, I found that PartitionGraph() is the recommended way to split a graph.

My goal is to generate a subgraph from an existing graph and run it on a backend. I am able to generate the subgraph using the PartitionGraph() API. However, when I provide the partitioned graph as input to relay.build, I see an error from TVM.

File "tvm/src/relay/backend/graph_executor_codegen.cc", line 198
TVMError: Check failed: count > 0 (0 vs. 0) : Expr is not existing in storage plan

Below is a test case to reproduce the error. Please note that this example is adapted from the test_multiple_outputs function.

    import tvm
    from tvm import relay
    from tvm.relay import transform
    from tvm.relay.op.annotation import compiler_begin, compiler_end

    def create_graph():
        data = relay.var("data", relay.TensorType((1, 3, 224, 224), "float32"))
        weight = relay.var("weight", relay.TensorType((16, 3, 3, 3), "float32"))
        bn_gamma = relay.var("bn_gamma", relay.TensorType((16,), "float32"))
        bn_beta = relay.var("bn_beta", relay.TensorType((16,), "float32"))
        bn_mean = relay.var("bn_mean", relay.TensorType((16,), "float32"))
        bn_var = relay.var("bn_var", relay.TensorType((16,), "float32"))

        data_cb = compiler_begin(data, "test_target")
        bn_gamma_cb = compiler_begin(bn_gamma, "test_target")
        bn_beta_cb = compiler_begin(bn_beta, "test_target")
        bn_mean_cb = compiler_begin(bn_mean, "test_target")
        bn_var_cb = compiler_begin(bn_var, "test_target")

        w0 = relay.var("w0", relay.TensorType((16, 3, 3, 3), "float32"))
        z0 = relay.add(weight, w0)
        z0_cb = compiler_begin(z0, "test_target")

        conv_o = relay.nn.conv2d(
            data=data_cb, weight=z0_cb, kernel_size=(3, 3), channels=16, padding=(1, 1)
        )

        bn_o = relay.nn.batch_norm(conv_o, bn_gamma_cb, bn_beta_cb, bn_mean_cb, bn_var_cb)
        relu_o = relay.nn.relu(bn_o[0])
        relu_o_ce = compiler_end(relu_o, "test_target")

        bn_omean = bn_o[1]
        rebn_omean_ce = compiler_end(bn_omean, "test_target")
        bn_ovar = bn_o[2]
        bn_ovar_ce = compiler_end(bn_ovar, "test_target")

        dummy_mean_abs = relay.abs(rebn_omean_ce)
        dummy_ovar_abs = relay.abs(bn_ovar_ce)
        dummy_tuple = relay.Tuple((relu_o_ce, dummy_mean_abs, dummy_ovar_abs))

        func = relay.Function([data, weight, bn_gamma, bn_beta, bn_mean, bn_var, w0], dummy_tuple)
        return func

    mod = tvm.IRModule()
    # Create the Relay graph
    mod["main"] = create_graph()
    # Partition the graph based on annotations in the create_graph() function
    partitioned = transform.PartitionGraph()(mod)
    # Create a new IRModule to store the subgraph
    new_mod = tvm.IRModule()
    # Pick up the function other than main
    for func in partitioned.functions.keys():
        func_str = str(func.name_hint)
        if func_str != "main":
            new_mod["main"] = partitioned[func_str]
    # Set the global_symbol attribute to main
    new_mod["main"] = new_mod["main"].with_attr("global_symbol", "main")
    json, lib, param = relay.build(new_mod, target="llvm", params=None, mod_name="default")

The last step, i.e. relay.build, is erroring out.

When I try to debug the problem, I see that memory_plan_ is getting executed correctly in the following line.

Any help would be greatly appreciated. @comaniac @csullivan

Thanks in advance! :slight_smile:

I feel assigning the partitioned function directly to a new module could be problematic. At least you should check the function attributes: All partitioned functions are marked with kCompiler, indicating that this function will be offloaded to an external backend. You could compare the original main function with the partitioned function and align their attributes (e.g., kCompiler, kPrimitive).
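
For example, to see what would need to be aligned, you could dump the attributes of every function in the partitioned module. A quick sketch (attribute keys such as Compiler, Primitive, and global_symbol are what PartitionGraph typically attaches; exact keys may differ across TVM versions):

    # Sketch: list each function in the partitioned module with its attributes,
    # so they can be compared/aligned with the original "main".
    for gvar, func in partitioned.functions.items():
        print(gvar.name_hint, func.attrs)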


Thank you @comaniac for your suggestion!

I was able to make some progress and now I can compile the graph using relay.build. Below is the snippet of code I added.

for func in partitioned.functions.keys():
    func_str = str(func.name_hint)
    if func_str != "main":
        p_func = partitioned[func_str]
        new_mod["main"] = relay.Function(p_func.params, p_func.body, p_func.ret_type, p_func.type_params)

lib = relay.build(new_mod, target="llvm", params=None, mod_name="default")

Here, I’m creating a new Relay function from the components of the partitioned function.
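
For completeness, a hedged sketch of how the module built above could then be executed with the graph executor; the input name "data" and its shape are placeholders and should be replaced with the partitioned function's actual parameters:

    # Sketch: execute the rebuilt subgraph with the graph executor.
    # "data" and its shape are placeholders; use the partitioned function's
    # actual parameter names and shapes.
    import numpy as np
    from tvm.contrib import graph_executor

    dev = tvm.cpu(0)
    m = graph_executor.GraphModule(lib["default"](dev))
    m.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
    m.run()
    out = m.get_output(0)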

This compiles fine without any errors. However, I would like your feedback on whether this approach is recommended for splitting the graph.

I’m not sure about your ultimate goal, so it’s hard to say. If you just want to run a subgraph on TVM-supported backends (e.g., CPU/GPU), then this might be the simplest way to do it. The downside I can imagine is that you may not know the inputs of a partitioned subgraph when executing it, because they are supposed to be intermediate tensors in the original graph.

Thank you @comaniac ! I intend to run the subgraph on one of the backends in TVM.

The input to the subgraph can be taken from the debug_get_output function of the debug graph executor. The output of the previous node serves as input to the next node, so for the i'th node we can dump the output of the (i-1)'th node and use it as input to the subgraph.
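
Roughly, the idea looks like this (a sketch only; the debug executor module path and the exact debug_get_output signature are from memory and may differ across TVM versions, and i/data_np/"data" are placeholders):

    # Sketch: run the full model with the debug graph executor and dump the
    # output of node (i - 1) to feed the standalone subgraph.
    import numpy as np
    from tvm.contrib.debugger import debug_executor

    dev = tvm.cpu(0)
    full_lib = relay.build(mod, target="llvm")
    dbg = debug_executor.create(full_lib.get_graph_json(), full_lib.get_lib(), dev)
    data_np = np.random.rand(1, 3, 224, 224).astype("float32")  # placeholder input
    dbg.set_input("data", data_np)
    dbg.run()
    i = 10  # placeholder: index of the node where the subgraph starts
    intermediate = dbg.debug_get_output(i - 1)  # output of the (i-1)'th node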

Using debug_get_output is itself not a recommended approach, but if it is the only approach you can leverage, then your way of splitting graphs seems reasonable.

Hi @comaniac,

Is there an alternative recommended approach that can be taken?

Also, for my curiosity and knowledge, could you please share some details on the recommended approach to get the output of intermediate nodes? Is it the get_node_output function or some other way?

This feature is named “debug” because, in the current TVM use cases, intermediate tensors are only needed for debugging. Your use case is not common, at least to me (i.e., we don’t expect end users to build/run only a subgraph of a model; the upstream BYOC approach still runs the entire graph and feeds intermediate tensors to subgraphs on the fly), so I don’t recall other approaches.

This is what I meant by "it depends on your use case" (i.e., the purpose of running a subgraph alone). If it’s just for debugging, then the current approach is good enough. If it is to integrate with a new backend, then BYOC or AOT/microTVM could be more suitable. If it is to integrate with other frameworks, then, as I mentioned, this is not currently expected behavior, but you are welcome to file a formal RFC to add this feature.


Thank you @comaniac for your detailed response! It helps in clarifying a lot of things for me! :slight_smile:

You are right. We intend to use the approach mentioned here for debugging runtime problems in a graph. Executing the complete graph on a simulator takes a lot of time, so having a subgraph will reduce the time needed to reproduce a problem.

Hi @comaniac ,

I have a follow up question on partitioning the graph.

When I traverse a graph and try to generate a subgraph whose nodes contain free_vars, I’m not able to understand how to set the compiler_begin annotation for parameters. For example, suppose I have the following IR that I want to partition using PartitionGraph():

  %75 = annotation.compiler_begin(%74, compiler="ccompiler") /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %76 = nn.relu(%75) /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %77 = cast(%block5_conv3/convolution/ReadVariableOp:0, dtype="float16") /* ty=Tensor[(512, 512, 3, 3), float16] */;
  %78 = nn.conv2d(%76, %77, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3], out_dtype="float16") /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %79 = cast(%block5_conv3/BiasAdd/ReadVariableOp:0, dtype="float16") /* ty=Tensor[(512), float16] */;
  %80 = nn.bias_add(%78, %79) /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %81 = nn.relu(%80) /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %82 = annotation.compiler_end(%81, compiler="ccompiler") /* ty=Tensor[(1, 512, 14, 14), float16] */;

The two cast operators use params which are free_vars: %block5_conv3/convolution/ReadVariableOp:0 and %block5_conv3/BiasAdd/ReadVariableOp:0 respectively.

If I try to partition this graph, I see the following error:

File "tvm/src/relay/analysis/annotated_region_set.cc", line 124
Check failed: region.defined() == arg_region.defined() (1 vs. 0) : Arg regions are inconsistent

This happens because the compiler_begin annotation isn’t set for the aforementioned free_vars.

Could you please suggest a way to add the compiler_begin annotation for free_vars that appear within the graph?

Below is my annotator, which inserts the compiler_begin and compiler_end annotations based on node number.

# Global node counter used by the annotator (reset it before each invocation)
counter = 0

@transform.function_pass(opt_level=0)
class WhiteListAnnotatorModified:
    def __init__(self, start_node, end_node, compiler):
        assert isinstance(start_node, int)
        assert isinstance(end_node, int)
        self.start_node = start_node
        self.end_node = end_node
        self.compiler = compiler

    def transform_function(self, func, mod, dev):
        annotator = self
        class Annotator(tvm.relay.ExprMutator):
            def visit_call(self, call):
                op_name = call.op.name
                # Count call nodes in visiting order; only the start_node'th and
                # end_node'th calls get annotated
                global counter
                counter = counter + 1
                if counter == annotator.start_node:
                    new_args = []
                    for arg in call.args:
                        ann = compiler_begin(super().visit(arg), annotator.compiler)
                        new_args.append(ann)
                    new_call = relay.Call(call.op, new_args, call.attrs, call.type_args)
                    return new_call
                elif counter == annotator.end_node:
                    new_args = []
                    for arg in call.args:
                        new_args.append(super().visit(arg))
                    new_call = relay.Call(call.op, new_args, call.attrs, call.type_args)
                    return compiler_end(new_call, annotator.compiler)
                else:
                    return super().visit_call(call)
        return Annotator().visit(func)

I do observe that there is a Relay analysis to find free_vars (relay.analysis.free_vars) in an expression, but I’m not able to figure out how to associate the free_vars with their respective nodes (i.e., in the aforementioned example, how to associate %block5_conv3/convolution/ReadVariableOp:0 with the cast node).
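
For reference, listing the free_vars themselves is straightforward (a sketch; it is the association with individual call nodes during mutation that I am missing):

    # Sketch: free_vars of the function body; each entry is a relay.Var such as
    # %block5_conv3/convolution/ReadVariableOp:0 above.
    for v in relay.analysis.free_vars(func.body):
        print(v.name_hint, v.type_annotation)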

As always thank you so much for your help :slight_smile:

The way you insert compiler_begin isn’t correct. Note that compiler_begin should be inserted for ALL inputs of the region, not just the first expression. Specifically, your IR should be:

  %75 = annotation.compiler_begin(%74, compiler="ccompiler");
  %x0 = annotation.compiler_begin(%block5_conv3/convolution/ReadVariableOp:0, compiler="ccompiler");
  %76 = nn.relu(%75);  
  %77 = cast(%x0, dtype="float16");
  %78 = nn.conv2d(%76, %77, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3], out_dtype="float16");
  %x1 = annotation.compiler_begin(%block5_conv3/BiasAdd/ReadVariableOp:0, compiler="ccompiler");
  %79 = cast(%x1, dtype="float16");
  %80 = nn.bias_add(%78, %79);
  %81 = nn.relu(%80);
  %82 = annotation.compiler_end(%81, compiler="ccompiler");

Thank you @comaniac! I intend to do the same thing, i.e., adding compiler_begin for free_vars.

Could you please share a reference example of doing so while annotating a precompiled graph? I’ve shared my reference code above, which adds compiler_begin to the first node, but I'm not sure how to do it for the nodes containing free_vars.

This is not a problem with free_vars but with your algorithm. Your algorithm only checks and annotates the arguments of two call nodes (%76 and %81) in the region. However, it assumes that only the first node in the region accesses outside tensors, which doesn’t hold in your example.

The logic should check every node in the region and annotate any argument that comes from outside the region.
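
For example, something along these lines (a rough, untested sketch; in_region is a hypothetical predicate that tells whether an expression belongs to the region being offloaded):

    # Rough sketch: wrap every argument that comes from outside the region in
    # compiler_begin, no matter which call inside the region consumes it.
    # compiler_end still has to be added at the region output, as in your snippet.
    class RegionAnnotator(tvm.relay.ExprMutator):
        def __init__(self, compiler, in_region):
            super().__init__()
            self.compiler = compiler
            self.in_region = in_region  # hypothetical callable: Expr -> bool

        def visit_call(self, call):
            if not self.in_region(call):
                return super().visit_call(call)
            new_args = []
            for arg in call.args:
                new_arg = self.visit(arg)
                # vars, constants, or calls produced outside the region all need
                # a compiler_begin boundary
                if not self.in_region(arg):
                    new_arg = compiler_begin(new_arg, self.compiler)
                new_args.append(new_arg)
            return relay.Call(call.op, new_args, call.attrs, call.type_args)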

Hello @abhikran-quic,

Thanks for raising this post. I am also interested in generating subgraphs from an existing graph to run on different CPUs/accelerators.

In my previous work, I have followed @hjiang’s old post to split the existing graph into N different subgraphs.

However, as my previous post mentioned, I found out that each subgraph can only have one global output, which is the last operation.

When I check the data dependencies, I notice there are other dependencies besides the last operation:

For example, in my post, %42 of the first subgraph feeds %x1: Tensor[(1, 1, 1, 128), float32] of the second subgraph. This operation is a constant that goes to every layer (e.g., %19 in the second subgraph). However, I cannot send this data dependency to the next subgraph since it is not registered as a global output of the first subgraph.

Thus, I am wondering whether it is possible for the user to register operations in Relay IR as new outputs so they can be read out (or sent to another subgraph, in my case).
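
Conceptually, what I would like is something like the following (a sketch only; intermediate_expr is a hypothetical handle to the operation I want to expose):

    # Sketch: expose an intermediate expression as an extra global output by
    # rebuilding the function with a Tuple body that includes it.
    new_body = relay.Tuple([func.body, intermediate_expr])
    new_func = relay.Function(relay.analysis.free_vars(new_body), new_body)
    mod["main"] = new_func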

Moreover, could @abhikran-quic share more information about which documents you followed and how you use the PartitionGraph function?

Thanks for your help :slight_smile:

cc @comaniac

Hi @popojames ,

To understand the PartitionGraph algorithm, I found the following tests to be useful:

Regarding your question about Relay IR, I would request @comaniac to share some ideas.
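
Regarding how I use PartitionGraph, the flow is roughly the one below (a sketch; I annotate with a custom pass like the one earlier in this thread, start_node/end_node are placeholders, and transform.AnnotateTarget would require the target's operators to be registered first):

    # Sketch of the annotate -> merge -> partition flow; if the function is
    # already annotated manually (as in create_graph above), skip the
    # annotation pass.
    mod = tvm.IRModule.from_expr(create_graph())
    mod = WhiteListAnnotatorModified(start_node, end_node, "ccompiler")(mod)
    # Merging regions is the usual step when multiple annotated regions exist.
    mod = transform.MergeCompilerRegions()(mod)
    mod = transform.PartitionGraph()(mod)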
