I would like to understand how to partition a graph imported with the TensorFlowLite frontend.
The use case is the following:
I have an object detection network, and I am compiling the module using the C target and the AOT executor. In the generated C source file, I am able to obtain the main function, which in turn calls all the graph operators. What I would like to do is to separate the entire network into 2 different functions: one that has all the network operators, and one that has all the PostDetectionProcess operators.
Basically, I want to group all the operators created by the convert_detection_postprocess function in the tflite frontend file into one function in the resulting C source code.
Thanks for the detailed writeup. I agree it would be useful to propose a way for people to do this kind of thing.
I think your best bet right now would be to partition the Relay program into two parts, either manually (by printing the Relay, editing it, and re-parsing it) or by writing a Relay pass. I’d suggest the former, which is probably easier unless you need to automate the process.
You will need to handle some amount of double-buffering to ensure there aren’t any race conditions in concurrent execution. I’m not sure TVM provides anything, certainly not in the C runtime anyway, to help with this.
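To make the double-buffering idea concrete, here is a minimal sketch in plain Python rather than in the C runtime. It is not a TVM API: `run_pipelined`, `network` and `postprocess` are hypothetical names standing in for the two compiled halves. The loop is sequential, but it is laid out so that, in each step, the stage that writes and the stage that reads always touch different buffer slots, which is the property a concurrent version would rely on.

```python
from typing import Any, Callable, List, Optional


def run_pipelined(
    frames: List[Any],
    network: Callable[[Any], Any],       # hypothetical: first half (the network ops)
    postprocess: Callable[[Any], Any],   # hypothetical: second half (post-processing)
) -> List[Any]:
    """Simulate a two-stage pipeline that uses two ping-pong buffers."""
    buffers: List[Optional[Any]] = [None, None]
    results: List[Any] = []

    # One extra step at the end drains the last buffer that was filled.
    for step in range(len(frames) + 1):
        write_slot = step % 2          # slot the network fills in this step
        read_slot = (step - 1) % 2     # slot filled in the previous step

        # In a concurrent implementation these two branches could run in
        # parallel, because write_slot and read_slot are never equal.
        if step >= 1:
            results.append(postprocess(buffers[read_slot]))
        if step < len(frames):
            buffers[write_slot] = network(frames[step])

    return results
```

For example, `run_pipelined([1, 2, 3], lambda x: x * 2, lambda y: y + 1)` gives the same results as calling the two stages back to back on each frame. A real implementation would also need synchronisation between the two stages, which this sketch leaves out.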
Is there a good reason to keep them inside a single IRModule (constant reuse, maybe)?
Actually… no, I don’t need to keep them in the same IRModule. Having 2 different IRModules and then compiling them separately would work perfectly for me.
Do you think it’s easier to partition the IRModule and get 2 different IRModules?
EDIT: I took a look at the partition_conversions function, because I think I could do something similar to obtain the 2 different IRModules. I thought I could use it out-of-the-box, but then noticed my use case is a little different:
As far as I understand, this is what the partition_conversions function expects as a graph:
Basically, what I would like to achieve is to partition this into 2 (or 3, if we also partition the first quantize_op into a separate IRModule) different modules:
One containing all ops until both dequantize_ops (so this would be an IRModule with 2 outputs)
One containing all the ops afterwards (so 2 inputs and 4 outputs)
I believe the partition_conversions function is breaking because I have ops after the dequantize_ops, and also because I have 2 of them.
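The split described above can be sketched on a toy graph in plain Python. This is only an illustration of the idea, not TVM code and not what partition_conversions does: the op names in `GRAPH` and the helpers `ancestors` and `split_at` are hypothetical. Everything that feeds the two dequantize ops (including the ops themselves) goes into the first part, and the rest goes into the second part.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: each op maps to the list of ops it consumes.
# The names are illustrative and are not taken from a real model.
GRAPH: Dict[str, List[str]] = {
    "input": [],
    "quantize": ["input"],
    "backbone": ["quantize"],
    "box_head": ["backbone"],
    "class_head": ["backbone"],
    "dequantize_boxes": ["box_head"],
    "dequantize_scores": ["class_head"],
    "decode_boxes": ["dequantize_boxes"],
    "nms": ["decode_boxes", "dequantize_scores"],
    "out_boxes": ["nms"],
    "out_classes": ["nms"],
    "out_scores": ["nms"],
    "out_count": ["nms"],
}


def ancestors(graph: Dict[str, List[str]], roots: List[str]) -> Set[str]:
    """Return the roots plus every op they depend on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return seen


def split_at(
    graph: Dict[str, List[str]], cut_ops: List[str]
) -> Tuple[Set[str], Set[str], List[str], List[str]]:
    """Split the graph into (head, tail, boundary values, tail outputs)."""
    head = ancestors(graph, cut_ops)
    tail = set(graph) - head
    # Boundary values: head ops whose results are consumed by the tail.
    # These are the outputs of the first part and the inputs of the second.
    boundary = sorted({src for op in tail for src in graph[op] if src in head})
    # Tail outputs: tail ops whose results nobody consumes.
    consumed = {src for inputs in graph.values() for src in inputs}
    tail_outputs = sorted(op for op in tail if op not in consumed)
    return head, tail, boundary, tail_outputs


head, tail, boundary, tail_outputs = split_at(
    GRAPH, ["dequantize_boxes", "dequantize_scores"]
)
```

On this toy graph, `boundary` contains the two dequantize ops (2 outputs of the first part, 2 inputs of the second) and `tail_outputs` contains the 4 final outputs. In a real Relay program the same bookkeeping would have to be done on Relay expressions, with the boundary values turned into outputs of one function and parameters of the other.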
Thanks for the idea! I was able to implement what I needed by partitioning the IRModule into separate modules, taking inspiration from the function I mentioned above.
I have a deadline coming soon, but I will prepare the code and perhaps create a pull request in the near future, if I can make it a little more generic.