Hello,
I made the mistake of reviewing the PR first, and only then found this discussion.
So what I understood from the PR (but then I got a bit confused by the terminology in this post) is that you are trying to split your graph so that it can use the hardware resources in a less greedy way than pure BYOC.
So with BYOC you split your graph to use the accelerator (e.g., the FPGA) as much as you can. In theory, if the graph can be entirely offloaded to the accelerator, this is what will happen.
Instead, you are trying to horizontally split your graph into a sort of pipeline, so that you can always use free resources. For instance, if you have a graph A->B->C and two images I1 and I2, you could split it into A->B and C, so that A->B (for I2) can be offloaded to the CPU if the accelerator is still busy processing C for I1.
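To check that I am describing the same thing, here is a minimal Python sketch of the pipelining I have in mind. It is not TVM code and not taken from the PR: `stage_ab`, `stage_c` and `run_pipeline` are made-up names, the stages are placeholder functions standing in for the two subgraphs, and the threads only stand in for the CPU and the accelerator.

```python
import queue
import threading


def stage_ab(x):
    # Hypothetical stand-in for subgraph A->B (e.g. executed on the CPU).
    return ("B", ("A", x))


def stage_c(x):
    # Hypothetical stand-in for subgraph C (e.g. executed on the accelerator).
    return ("C", x)


def run_pipeline(inputs, stages):
    """Run every stage in its own thread, connected by FIFO queues.

    While a later stage is still busy with input k, an earlier stage
    can already start working on input k+1.
    """
    sentinel = object()  # marks the end of the input stream
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is sentinel:
                q_out.put(sentinel)  # propagate shutdown downstream
                return
            q_out.put(fn(item))

    threads = [
        threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed all inputs, then signal the end of the stream.
    for x in inputs:
        queues[0].put(x)
    queues[0].put(sentinel)

    # Collect results; order is preserved because each stage is a
    # single thread reading from a FIFO queue.
    results = []
    while True:
        item = queues[-1].get()
        if item is sentinel:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results


outputs = run_pipeline(["I1", "I2"], [stage_ab, stage_c])
```

In this sketch `stage_ab` can start on I2 as soon as it has handed I1 over to `stage_c`, which is the overlap I understood the PR to be after.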
Is this what you are trying to achieve?
If so, I would suggest adding your second PR to the one already published, so that we can understand how everything works together.
Also, how do you achieve the optimal split? Basically, how do you pick your indices when you try to split your graph?