[Relay] Correlation between graph json and relay IR

This is a follow-up to my previous question on graph partitioning.

To generate subgraphs from an existing graph (for debugging purposes), I would like some insight into how fused operators (in graph.json) correlate with the Relay IR.

The output of the relay.build step carries the Relay IR:

lib = relay.build(mod, target, target_host=target, params=params)

If I print lib.ir_mod, it shows Relay IR that has separate nodes for the operators inside each fused function, i.e. if the fused function name is tvmgen_default_fused_nn_conv2d_add_nn_relu, the Relay IR looks like this:

   %3 = nn.conv2d(%1, %2, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3], out_dtype="float16") /* ty=Tensor[(1, 64, 224, 224), float16] */;
   %4 = cast(%block1_conv1/BiasAdd/ReadVariableOp:0, dtype="float16") /* ty=Tensor[(64), float16] */;
   %5 = nn.bias_add(%3, %4) /* ty=Tensor[(1, 64, 224, 224), float16] */;
   %6 = nn.relu(%5) /* ty=Tensor[(1, 64, 224, 224), float16] */;
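
For reference, both artifacts can be dumped from the same build output for a side-by-side comparison (a small sketch, assuming lib is the factory module returned by relay.build above):

    # Relay IR module stored on the factory (what is printed above)
    print(lib.ir_mod)

    # graph.json consumed by the graph executor; its nodes are the fused
    # tvmgen_default_fused_* kernels
    print(lib.get_graph_json())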

I have two questions:

  1. Shouldn’t the output lib.ir_mod contain nodes/functions representing the fused operators in graph.json instead of the unfused operators?
  2. Is there a way to get a mapping between the fused operators (in graph.json) and the Relay IR (output of relay.build)? It would be helpful if there were some debug information to help correlate the Relay IR with the fused operators.

The reason for wanting this correlation: graph.json can be viewed by end users, and the node_row_ptr in graph.json can be used to specify the start and end nodes of a subgraph. If a relation can be established between the fused ops and the Relay IR, we can use the compiler_start and compiler_end attributes to partition the graph with the GraphPartition() function and compile it for the respective hardware.
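
For context, in mainline TVM the annotations are called compiler_begin/compiler_end and the partitioning pass is relay.transform.PartitionGraph(); below is a minimal sketch of that flow, with "my_codegen" as a hypothetical registered codegen name:

    import tvm
    from tvm import relay

    # "my_codegen" is a hypothetical codegen/annotation target that would have to be
    # registered separately; the passes below are standard Relay passes.
    seq = tvm.transform.Sequential(
        [
            relay.transform.AnnotateTarget("my_codegen"),  # inserts compiler_begin/compiler_end
            relay.transform.MergeCompilerRegions(),        # merges adjacent supported regions
            relay.transform.PartitionGraph(),              # splits each region into its own function
        ]
    )
    partitioned_mod = seq(mod)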

Requesting help, and thank you in advance!

CC: @comaniac, @csullivan

  1. You can still see all operators after fusion; otherwise the codegen wouldn’t be able to generate the kernel for the fused function. It usually looks like:

    %6 = fn(%1, %2, name="tvmgen_default_fused_nn_conv2d_add_nn_relu", primitive=1) {
       %3 = nn.conv2d(%1, %2, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3], out_dtype="float16");
       %4 = cast(%block1_conv1/BiasAdd/ReadVariableOp:0, dtype="float16");
       %5 = nn.bias_add(%3, %4);
       nn.relu(%5)
    }
    %7 = %6(%in1, %in2)
    

    The fused function will be sent to the codegen and results in a single kernel instead of four. The built graph would look like the following (see the sketch after this list for a way to surface this fused view directly from the Relay module):

    return invoke("tvmgen_default_fused_nn_conv2d_add_nn_relu", %in1, %in2)
    
  2. Currently we don’t have a better solution for this; we still need to “guess” the corresponding part of the Relay IR from the fused function name.
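
Regarding point 1, here is a minimal sketch (assuming mainline Relay passes; the fuse_opt_level value is illustrative) of how to surface the fused, Primitive-wrapped view directly from the original module:

    import tvm
    from tvm import relay

    # Run type inference and operator fusion on the original module so the
    # Primitive-wrapped functions behind the tvmgen_default_fused_* kernels
    # become visible before lowering.
    seq = tvm.transform.Sequential(
        [relay.transform.InferType(), relay.transform.FuseOps(fuse_opt_level=2)]
    )
    with tvm.transform.PassContext(opt_level=3):
        fused_view = seq(mod)

    # Each fn(..., Primitive=1) in this dump corresponds to one fused kernel
    # (one node) in graph.json.
    print(fused_view)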


Thank you @comaniac for your inputs.

I did some experiments and was able to generate a subgraph from a fused function, as I wanted. The code is below:

import tvm
from tvm import relay

# The output of relay.build on the original graph (mod) is stored in lib
lib = relay.build(mod, target, target_host=target, params=params, mod_name="default")

# Create a fresh IRModule for the fused function
fused_mod = tvm.IRModule()
fused_func_name = "tvmgen_default_fused_nn_max_pool2d"

# Extract the Relay function for the fused node from the build's function metadata
# (relay_primfuncs maps each target to the corresponding Relay function)
relay_primfuncs = dict(lib.function_metadata[fused_func_name].relay_primfuncs)
fused_func = list(relay_primfuncs.values())[0]
fused_mod["main"] = relay.Function(fused_func.params, fused_func.body,
                                   fused_func.ret_type, fused_func.type_params)

# Compile the fused function on its own to generate the subgraph
fused_lib = relay.build(fused_mod, target=target, target_host=target,
                        params=params, mod_name="default")

The code above works fine for me.
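
As a quick sanity check, the generated subgraph can be loaded into a graph executor and run with random inputs; this is only a sketch, assuming an LLVM/CPU target and the "default" module name used above:

    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    dev = tvm.cpu(0)  # assumes the target above is an LLVM/CPU target
    m = graph_executor.GraphModule(fused_lib["default"](dev))

    # Re-run type inference on the extracted module to read input shapes/dtypes
    typed_mod = relay.transform.InferType()(fused_mod)
    for i, param in enumerate(typed_mod["main"].params):
        ty = param.checked_type
        data = np.random.uniform(size=[int(d) for d in ty.shape]).astype(ty.dtype)
        m.set_input(i, data)

    m.run()
    print(m.get_output(0).shape)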

Could you please share your feedback on whether this looks okay, or whether there’s a better approach?

It seems fine to me if this is just for debugging purposes.
