[Relay][AOT][uTVM] Partitioning a graph to obtain 2 different functions in the generated C code

fPecc · September 8, 2022, 3:04pm

Hello all!

I would like to understand how to partition a graph imported with the TensorFlowLite frontend.

The use case is the following:

I have an object detection network, and I am compiling the module using the C target and the AOT executor. In the generated C source file, I am able to obtain the main function, which in turn calls all the graph operators. What I would like to do is to separate the entire network into 2 different functions: one that has all the network operators, and one that has all the PostDetectionProcess operators.

Basically, I want to group all the operators created by the convert_detection_postprocess function in the tflite frontend file into one function in the resulting C source code.

Thanks!

fPecc · September 8, 2022, 3:44pm

Perhaps some more explanation. This is what the generated code looks like now:

main(...)
{
   op1(...)
   op2(...)
   ...
   opN(...)
}

And this is what I would like to obtain (assuming the post detection process starts at the N-5 operator):

net(...)
{
   op1(...)
   op2(...)
   ...
   opN-6(...)
}

detection(...)
{
   opN-5(...)
   opN-4(...)
   ...
   opN(...)
}

main(...)
{
   net(...)
   detection(...)
}

My main goal is to be able to call these 2 functions (net and detection) from 2 different threads in my embedded OS.

areusch · September 8, 2022, 9:20pm

Hi @fPecc,

Thanks for the detailed writeup. I agree it would be useful to propose a way for people to do this kind of thing.

I think your best bet right now would be to partition the Relay program into two parts, either manually (by printing the Relay, editing, and re-parsing) or by writing a Relay pass. I’d suggest the former, which may be easier unless you need to automate this process.

You will need to handle some amount of double-buffering to ensure there aren’t any race conditions in concurrent execution. I’m not sure TVM provides anything, certainly not in the C runtime anyway, to help with this.
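The double-buffering hand-off mentioned above can be illustrated with a minimal sketch. It is written in Python only to show the pattern; on an embedded OS the same idea would presumably be built from the RTOS's own queues or semaphores. Every name here (`run_pipeline`, `net_thread`, `detection_thread`, the two-element buffers) is hypothetical and not part of TVM, and the arithmetic is a placeholder for the real net(...) and detection(...) calls.

```python
import queue
import threading

NUM_BUFFERS = 2  # double buffering: one slot being filled, one being consumed


def run_pipeline(num_frames):
    """Pass intermediate results from a 'net' thread to a 'detection' thread."""
    free_slots = queue.Queue()   # buffer indices the producer may overwrite
    ready_slots = queue.Queue()  # buffer indices holding a finished result
    # Stand-ins for the two tensors handed from net(...) to detection(...).
    buffers = [[0, 0] for _ in range(NUM_BUFFERS)]
    for index in range(NUM_BUFFERS):
        free_slots.put(index)
    results = []

    def net_thread():
        for frame in range(num_frames):
            slot = free_slots.get()       # blocks until a buffer is released
            buffers[slot][0] = frame      # placeholder for net(...) outputs
            buffers[slot][1] = frame * 10
            ready_slots.put(slot)
        ready_slots.put(None)             # sentinel: no more frames

    def detection_thread():
        while True:
            slot = ready_slots.get()
            if slot is None:
                break
            # Placeholder for detection(...) reading the two inputs.
            results.append(buffers[slot][0] + buffers[slot][1])
            free_slots.put(slot)          # hand the buffer back to net_thread

    threads = [
        threading.Thread(target=net_thread),
        threading.Thread(target=detection_thread),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because a buffer index is only ever owned by one thread at a time, the producer cannot overwrite a result that the consumer is still reading, which is the race the double-buffering is meant to prevent.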

Is there a good reason to keep them inside a single IRModule (constant reuse, maybe)?

Thanks, Andrew

fPecc · September 9, 2022, 7:12am

Hi Andrew!

Actually… no, I don’t need to keep them in the same IRModule. Having 2 different IRModules and then compiling them separately would work perfectly for me.

Do you think it’s easier to partition the IRModule and get 2 different IRModules?

EDIT: I took a look at the partition_conversions function, because I think I could do something similar to obtain the 2 different IRModules. I thought I could use it out-of-the-box, but then noticed my use case is a little different:

As far as I understand, this is what the partition_conversions function expects as a graph:

quantize_op
|
op_int_1
|
op_int_2
|
...
|
dequantize_op

And it creates 3 IRModules, which are then combined into one with 4 functions in it:

A quantize function, which calls the quantize_op
A quantized_main function, which calls all the quantized op_int_*
A dequantize function, which calls only the dequantize_op
A main function, which calls the 3 functions in the correct order, so that the result is the same as if the partition_conversions transformation had not been applied.
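The four-function structure described above can be pictured with a toy sketch. This is plain Python on lists, not TVM code: the scale, zero point, and the two integer ops are made-up placeholders, and only the function names follow the description in this post.

```python
SCALE = 0.5      # hypothetical quantization scale
ZERO_POINT = 0   # hypothetical zero point


def quantize(values):
    """Stand-in for quantize_op: float -> int8 range."""
    return [max(-128, min(127, round(v / SCALE) + ZERO_POINT)) for v in values]


def quantized_main(q_values):
    """Stand-in for op_int_1 (ReLU) followed by op_int_2 (add 1)."""
    after_relu = [max(v, 0) for v in q_values]
    return [v + 1 for v in after_relu]


def dequantize(q_values):
    """Stand-in for dequantize_op: int8 -> float."""
    return [(v - ZERO_POINT) * SCALE for v in q_values]


def main(values):
    """Calls the three pieces in order, like the combined module's main."""
    return dequantize(quantized_main(quantize(values)))
```

The point of the sketch is only that main is a thin wrapper: each of the three pieces can also be called on its own, which is what makes a similar split attractive for the net/detection case.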

My use case is different because my graph looks something like this (of course, this is a simplified version):

quantize_op
|
op_int_1
|
op_int_2
|       \
|        \
op_int_3   op_int_4
|               \
...              ...
|                   \
dequantize_op        dequantize_op
|                    |
op_float_1           op_float_2
         \          /
      non_max_suppression
     /     /      \     \
out_1   out_2    out_3   out_4

Basically, what I would like to achieve is to partition this into 2 (or 3, if we also move the first quantize_op into its own IRModule) different modules:

One containing all ops until both dequantize_ops (so this would be an IRModule with 2 outputs)
One containing all the ops afterwards (so 2 inputs and 4 outputs)

I believe the partition_conversions function is breaking because I have ops after the dequantize_ops, and also because I have 2 of them.
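The split described above can be sketched on a toy model of the graph. This is plain Python on a dictionary, not the Relay API: the node names `dequantize_a` and `dequantize_b` are hypothetical labels for the two dequantize_ops, the elided ops ("...") are simply left out, and `split_at` is an illustrative helper rather than an existing TVM function.

```python
# Edges point from producer to consumer.
GRAPH = {
    "quantize_op": ["op_int_1"],
    "op_int_1": ["op_int_2"],
    "op_int_2": ["op_int_3", "op_int_4"],
    "op_int_3": ["dequantize_a"],
    "op_int_4": ["dequantize_b"],
    "dequantize_a": ["op_float_1"],
    "dequantize_b": ["op_float_2"],
    "op_float_1": ["non_max_suppression"],
    "op_float_2": ["non_max_suppression"],
    "non_max_suppression": ["out_1", "out_2", "out_3", "out_4"],
    "out_1": [],
    "out_2": [],
    "out_3": [],
    "out_4": [],
}


def split_at(graph, cut_nodes):
    """Split a DAG into the part feeding the cut nodes and everything after."""
    producers = {node: [] for node in graph}
    for src, consumers in graph.items():
        for dst in consumers:
            producers[dst].append(src)

    # First part: the cut nodes plus every node they depend on.
    front = set()
    stack = list(cut_nodes)
    while stack:
        node = stack.pop()
        if node in front:
            continue
        front.add(node)
        stack.extend(producers[node])

    back = set(graph) - front
    # Outputs of the first part are front nodes consumed by the second part;
    # they become the inputs of the second part.
    boundary = sorted(n for n in front if any(c in back for c in graph[n]))
    back_outputs = sorted(n for n in back if not graph[n])
    return front, back, boundary, back_outputs
```

Called as `split_at(GRAPH, ["dequantize_a", "dequantize_b"])`, the boundary is the two dequantize nodes (2 outputs of the first part, 2 inputs of the second) and the second part ends in the 4 out_* nodes. A real implementation would still have to rebuild each part as its own Relay function with fresh input variables.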

fPecc · September 15, 2022, 9:07am

Hi @areusch ,

Thanks for the idea! I was able to implement what I needed by partitioning the IRModule into separate modules, taking inspiration from the partition_conversions function I mentioned above.

I have a deadline coming soon, but I will prepare the code and perhaps create a pull request in the near future, if I can make it a little more generic.