[Relay] Generate subgraphs from an existing graph

Hello All,

I’m trying to generate a subgraph from an existing graph. After going through the TVM documentation, I found that PartitionGraph() is the recommended way to split a graph.

My goal is to generate a subgraph from an existing graph and run it on a backend. I am able to generate the subgraph using the PartitionGraph() API. However, when I provide the partitioned graph as input to relay.build, I see an error from TVM.

File "tvm/src/relay/backend/graph_executor_codegen.cc", line 198
TVMError: Check failed: count > 0 (0 vs. 0) : Expr is not existing in storage plan

Below is a test case to reproduce the error. Please note that this example is adapted from the test_multiple_outputs function.

    import tvm
    from tvm import relay
    from tvm.relay import transform
    from tvm.relay.op.annotation import compiler_begin, compiler_end

    def create_graph():
        data = relay.var("data", relay.TensorType((1, 3, 224, 224), "float32"))
        weight = relay.var("weight", relay.TensorType((16, 3, 3, 3), "float32"))
        bn_gamma = relay.var("bn_gamma", relay.TensorType((16,), "float32"))
        bn_beta = relay.var("bn_beta", relay.TensorType((16,), "float32"))
        bn_mean = relay.var("bn_mean", relay.TensorType((16,), "float32"))
        bn_var = relay.var("bn_var", relay.TensorType((16,), "float32"))

        data_cb = compiler_begin(data, "test_target")
        bn_gamma_cb = compiler_begin(bn_gamma, "test_target")
        bn_beta_cb = compiler_begin(bn_beta, "test_target")
        bn_mean_cb = compiler_begin(bn_mean, "test_target")
        bn_var_cb = compiler_begin(bn_var, "test_target")

        w0 = relay.var("w0", relay.TensorType((16, 3, 3, 3), "float32"))
        z0 = relay.add(weight, w0)
        z0_cb = compiler_begin(z0, "test_target")

        conv_o = relay.nn.conv2d(
            data=data_cb, weight=z0_cb, kernel_size=(3, 3), channels=16, padding=(1, 1)
        )

        bn_o = relay.nn.batch_norm(conv_o, bn_gamma_cb, bn_beta_cb, bn_mean_cb, bn_var_cb)
        relu_o = relay.nn.relu(bn_o[0])
        relu_o_ce = compiler_end(relu_o, "test_target")

        bn_omean = bn_o[1]
        rebn_omean_ce = compiler_end(bn_omean, "test_target")
        bn_ovar = bn_o[2]
        bn_ovar_ce = compiler_end(bn_ovar, "test_target")

        dummy_mean_abs = relay.abs(rebn_omean_ce)
        dummy_ovar_abs = relay.abs(bn_ovar_ce)
        dummy_tuple = relay.Tuple((relu_o_ce, dummy_mean_abs, dummy_ovar_abs))

        func = relay.Function([data, weight, bn_gamma, bn_beta, bn_mean, bn_var, w0], dummy_tuple)
        return func

    mod = tvm.IRModule()
    # Create the Relay graph
    mod["main"] = create_graph()
    # Partition the graph based on annotations in the create_graph() function
    partitioned = transform.PartitionGraph()(mod)
    # Create a new IRModule to store the subgraph
    new_mod = tvm.IRModule()
    # Pick up the function other than main
    for func in partitioned.functions.keys():
        func_str = str(func.name_hint)
        if func_str != "main":
            new_mod["main"] = partitioned[func_str]
    # Set the global_symbol attribute to main
    new_mod["main"] = new_mod["main"].with_attr("global_symbol", "main")
    json, lib, param = relay.build(new_mod, target="llvm", params=None, mod_name="default")

The last step, i.e. relay.build, is erroring out.

When I try to debug the problem, I see that memory_plan_ is getting executed correctly in the following line.

Any help would be greatly appreciated. @comaniac @csullivan

Thanks in advance! :slight_smile:

I feel assigning the partitioned function directly to a new module could be problematic. At least you should check the function attributes: All partitioned functions are marked with kCompiler, indicating that this function will be offloaded to an external backend. You could compare the original main function with the partitioned function and align their attributes (e.g., kCompiler, kPrimitive).
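
For example, to see what would need to be aligned, you could dump the attributes of every function in the partitioned module. A quick sketch (attribute keys such as Compiler, Primitive, and global_symbol are what PartitionGraph typically attaches; exact keys may differ across TVM versions):

    # Sketch: list each function in the partitioned module with its attributes,
    # so they can be compared/aligned with the original "main".
    for gvar, func in partitioned.functions.items():
        print(gvar.name_hint, func.attrs)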


Thank you @comaniac for your suggestion!

I was able to make some progress and now I can compile the graph using relay.build. Below is the snippet of code I added.

for func in partitioned.functions.keys():
    func_str = str(func.name_hint)
    if func_str != "main":
        p_func = partitioned[func_str]
        new_mod["main"] = relay.Function(p_func.params, p_func.body, p_func.ret_type, p_func.type_params)

lib = relay.build(new_mod, target="llvm", params=None, mod_name="default")

Here, I’m creating a new Relay function from the components of the partitioned function.
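
For completeness, a hedged sketch of how the module built above could then be executed with the graph executor; the input name "data" and its shape are placeholders and should be replaced with the partitioned function's actual parameters:

    # Sketch: execute the rebuilt subgraph with the graph executor.
    # "data" and its shape are placeholders; use the partitioned function's
    # actual parameter names and shapes.
    import numpy as np
    from tvm.contrib import graph_executor

    dev = tvm.cpu(0)
    m = graph_executor.GraphModule(lib["default"](dev))
    m.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
    m.run()
    out = m.get_output(0)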

This compiles fine without any errors. However, I would like your feedback on whether this approach is recommended for splitting the graph.

I’m not sure about your ultimate goal, so it’s hard to say. If you just want to run a subgraph on TVM-supported backends (e.g., CPU/GPU), then this might be the simplest way to do it. The downside I can imagine is that you may not know the inputs of a partitioned subgraph when executing it, because they are supposed to be intermediate tensors in the original graph.

Thank you @comaniac ! I intend to run the subgraph on one of the backends in TVM.

The input to the subgraph can be taken from the debug_get_output function of the debug graph executor. The output of the previous node serves as input to the next node, so for the i'th node we can dump the output of the (i-1)'th node and use it as input to the subgraph.
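
Roughly, the idea looks like this (a sketch only; the debug executor module path and the exact debug_get_output signature are from memory and may differ across TVM versions, and i/data_np/"data" are placeholders):

    # Sketch: run the full model with the debug graph executor and dump the
    # output of node (i - 1) to feed the standalone subgraph.
    import numpy as np
    from tvm.contrib.debugger import debug_executor

    dev = tvm.cpu(0)
    full_lib = relay.build(mod, target="llvm")
    dbg = debug_executor.create(full_lib.get_graph_json(), full_lib.get_lib(), dev)
    data_np = np.random.rand(1, 3, 224, 224).astype("float32")  # placeholder input
    dbg.set_input("data", data_np)
    dbg.run()
    i = 10  # placeholder: index of the node where the subgraph starts
    intermediate = dbg.debug_get_output(i - 1)  # output of the (i-1)'th node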

Using debug_get_output is itself not a recommended approach, but if it is the only approach you can leverage, then your way of splitting graphs seems reasonable.

Hi @comaniac,

Is there an alternative recommended approach that can be taken?

Also, for my curiosity and knowledge, could you please share some details on the recommended approach to get the output of intermediate nodes? Is it the get_node_output function or some other way?

This feature is named “debug” because, in the current TVM use cases, intermediate tensors are only needed for debugging. Your use case is not common, at least to me (i.e., we don’t expect end users to build/run only a subgraph of a model; the upstream BYOC approach still runs the entire graph and feeds intermediate tensors to subgraphs on the fly), so I don’t recall other approaches.

This is what I meant by "it depends on your use case" (i.e., the purpose of running a subgraph alone). If it’s just for debugging, then the current approach is good enough. If it is to integrate with a new backend, then BYOC or AOT/microTVM could be more suitable. If it is to integrate with other frameworks, then, as I mentioned, this is not currently expected behavior, but you are welcome to file a formal RFC to add this feature.


Thank you @comaniac for your detailed response! It helps in clarifying a lot of things for me! :slight_smile:

You are right. We intend to use the approach mentioned here for debugging runtime problems in a graph. Executing the complete graph on a simulator takes a lot of time, so having a subgraph will reduce the time needed to reproduce a problem.

Hi @comaniac ,

I have a follow up question on partitioning the graph.

When I traverse a graph and try to generate a subgraph whose nodes contain free_vars, I’m not able to understand how to set the compiler_begin annotation for parameters. For example, suppose I have the following IR that I want to partition using PartitionGraph():

  %75 = annotation.compiler_begin(%74, compiler="ccompiler") /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %76 = nn.relu(%75) /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %77 = cast(%block5_conv3/convolution/ReadVariableOp:0, dtype="float16") /* ty=Tensor[(512, 512, 3, 3), float16] */;
  %78 = nn.conv2d(%76, %77, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3], out_dtype="float16") /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %79 = cast(%block5_conv3/BiasAdd/ReadVariableOp:0, dtype="float16") /* ty=Tensor[(512), float16] */;
  %80 = nn.bias_add(%78, %79) /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %81 = nn.relu(%80) /* ty=Tensor[(1, 512, 14, 14), float16] */;
  %82 = annotation.compiler_end(%81, compiler="ccompiler") /* ty=Tensor[(1, 512, 14, 14), float16] */;

The two cast operators use params which are free_vars: %block5_conv3/convolution/ReadVariableOp:0 and %block5_conv3/BiasAdd/ReadVariableOp:0 respectively.

If I try to partition this graph, I see the following error:

File "tvm/src/relay/analysis/annotated_region_set.cc", line 124
Check failed: region.defined() == arg_region.defined() (1 vs. 0) : Arg regions are inconsistent

This happens because the compiler_begin annotation isn’t set for the aforementioned free_vars.

Could you please suggest a way to add the compiler_begin annotation for free_vars that appear within the graph?

Below is my annotator, which inserts the compiler_begin and compiler_end annotations based on node number.

# Global node counter used by the annotator (reset it before each invocation)
counter = 0

@transform.function_pass(opt_level=0)
class WhiteListAnnotatorModified:
    def __init__(self, start_node, end_node, compiler):
        assert isinstance(start_node, int)
        assert isinstance(end_node, int)
        self.start_node = start_node
        self.end_node = end_node
        self.compiler = compiler

    def transform_function(self, func, mod, dev):
        annotator = self
        class Annotator(tvm.relay.ExprMutator):
            def visit_call(self, call):
                op_name = call.op.name
                # Count call nodes in visiting order; only the start_node'th and
                # end_node'th calls get annotated
                global counter
                counter = counter + 1
                if counter == annotator.start_node:
                    new_args = []
                    for arg in call.args:
                        ann = compiler_begin(super().visit(arg), annotator.compiler)
                        new_args.append(ann)
                    new_call = relay.Call(call.op, new_args, call.attrs, call.type_args)
                    return new_call
                elif counter == annotator.end_node:
                    new_args = []
                    for arg in call.args:
                        new_args.append(super().visit(arg))
                    new_call = relay.Call(call.op, new_args, call.attrs, call.type_args)
                    return compiler_end(new_call, annotator.compiler)
                else:
                    return super().visit_call(call)
        return Annotator().visit(func)

I do observe that there is a Relay analysis to find free_vars (relay.analysis.free_vars) in an expression, but I’m not able to figure out how to associate the free_vars with their respective nodes (i.e., in the aforementioned example, how to associate %block5_conv3/convolution/ReadVariableOp:0 with the cast node).
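
For reference, listing the free_vars themselves is straightforward (a sketch; it is the association with individual call nodes during mutation that I am missing):

    # Sketch: free_vars of the function body; each entry is a relay.Var such as
    # %block5_conv3/convolution/ReadVariableOp:0 above.
    for v in relay.analysis.free_vars(func.body):
        print(v.name_hint, v.type_annotation)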

As always thank you so much for your help :slight_smile:

The way you insert compiler_begin isn’t correct. Note that compiler_begin should be inserted for ALL inputs of the region, not just the first expression. Specifically, your IR should be:

  %75 = annotation.compiler_begin(%74, compiler="ccompiler");
  %x0 = annotation.compiler_begin(%block5_conv3/convolution/ReadVariableOp:0, compiler="ccompiler");
  %76 = nn.relu(%75);  
  %77 = cast(%x0, dtype="float16");
  %78 = nn.conv2d(%76, %77, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3], out_dtype="float16");
  %x1 = annotation.compiler_begin(%block5_conv3/BiasAdd/ReadVariableOp:0, compiler="ccompiler");
  %79 = cast(%x1, dtype="float16");
  %80 = nn.bias_add(%78, %79);
  %81 = nn.relu(%80);
  %82 = annotation.compiler_end(%81, compiler="ccompiler");

Thank you @comaniac! I intend to do the same thing, i.e., adding compiler_begin for free_vars.

Could you please share a reference example of doing so while annotating a precompiled graph? I’ve shared my reference code above, which adds compiler_begin to the first node, but I'm not sure how to do it for the nodes containing free_vars.

This is not a problem with free_vars but with your algorithm. Your algorithm only checks and annotates the arguments of two call nodes (%76 and %81) in the region. However, it assumes that only the first node in the region accesses outside tensors, which doesn’t hold in your example.

The logic should check every node in the region and annotate any argument that comes from outside the region.
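
For example, something along these lines (a rough, untested sketch; in_region is a hypothetical predicate that tells whether an expression belongs to the region being offloaded):

    # Rough sketch: wrap every argument that comes from outside the region in
    # compiler_begin, no matter which call inside the region consumes it.
    # compiler_end still has to be added at the region output, as in your snippet.
    class RegionAnnotator(tvm.relay.ExprMutator):
        def __init__(self, compiler, in_region):
            super().__init__()
            self.compiler = compiler
            self.in_region = in_region  # hypothetical callable: Expr -> bool

        def visit_call(self, call):
            if not self.in_region(call):
                return super().visit_call(call)
            new_args = []
            for arg in call.args:
                new_arg = self.visit(arg)
                # vars, constants, or calls produced outside the region all need
                # a compiler_begin boundary
                if not self.in_region(arg):
                    new_arg = compiler_begin(new_arg, self.compiler)
                new_args.append(new_arg)
            return relay.Call(call.op, new_args, call.attrs, call.type_args)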

Hello @abhikran-quic,

Thanks for raising this post. I am also interested in generating subgraphs from an existing graph to run on different CPUs/accelerators.

In my previous work, I have followed @hjiang’s old post to split the existing graph into N different subgraphs.

However, as my previous post mentioned, I found out that each subgraph can only have one global output, which is the last operation.

When I check the data dependencies, I notice there are other dependencies besides the last operation:

For example, in my post, %42 of the first subgraph feeds %x1: Tensor[(1, 1, 1, 128), float32] of the second subgraph. This operation is a constant that goes to every layer (e.g., %19 in the second subgraph). However, I cannot send this data dependency to the next subgraph since it is not registered as a global output of the first subgraph.

Thus, I am wondering whether it is possible for the user to register operations in Relay IR as new outputs so they can be read out (or sent to another subgraph, in my case).
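
Conceptually, what I would like is something like the following (a sketch only; intermediate_expr is a hypothetical handle to the operation I want to expose):

    # Sketch: expose an intermediate expression as an extra global output by
    # rebuilding the function with a Tuple body that includes it.
    new_body = relay.Tuple([func.body, intermediate_expr])
    new_func = relay.Function(relay.analysis.free_vars(new_body), new_body)
    mod["main"] = new_func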

Moreover, could @abhikran-quic share more information about which documents you followed and how you use the PartitionGraph function?

Thanks for your help :slight_smile:

cc @comaniac

Hi @popojames ,

To understand the PartitionGraph algorithm, I found the following tests to be useful:

Regarding your question about Relay IR, I would request @comaniac to share some ideas.
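
Regarding how I use PartitionGraph, the flow is roughly the one below (a sketch; I annotate with a custom pass like the one earlier in this thread, start_node/end_node are placeholders, and transform.AnnotateTarget would require the target's operators to be registered first):

    # Sketch of the annotate -> merge -> partition flow; if the function is
    # already annotated manually (as in create_graph above), skip the
    # annotation pass.
    mod = tvm.IRModule.from_expr(create_graph())
    mod = WhiteListAnnotatorModified(start_node, end_node, "ccompiler")(mod)
    # Merging regions is the usual step when multiple annotated regions exist.
    mod = transform.MergeCompilerRegions()(mod)
    mod = transform.PartitionGraph()(mod)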
