Graph partitioning can be divided into two parts:
Partitioning: this step splits the graph into subgraphs that are suitable for different devices. Once we have the partitioned subgraphs, there are two options for graph-level optimizations.
a) Apply graph-level optimizations, such as fusion and precompute, to each subgraph, and then replace the original subgraph with its optimized counterpart.
b) Group the subgraphs together and apply graph-level optimizations to a "single" graph, then split it back into subgraphs and replace the original ones.
It seems there is not much difference between these two methods. The second one might be cleaner.
Nodes can be annotated with context info during partitioning. Data copy operators can be inserted when the subgraphs are being replaced.
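A minimal sketch of the annotate-then-insert-copies idea, on a toy graph rather than the real graph IR. The `Node` class, `annotate`, `insert_copies`, the `device_copy` op name, and the op-to-device mapping are all hypothetical names made up for illustration; they are not existing APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    op: str
    inputs: List[str] = field(default_factory=list)
    device: str = ""  # context annotation, filled in during partitioning


def annotate(nodes: List[Node], op_to_device: Dict[str, str], default: str = "cpu") -> None:
    """Attach a device to every node based on its operator type (toy policy)."""
    for node in nodes:
        node.device = op_to_device.get(node.op, default)


def insert_copies(nodes: List[Node]) -> List[Node]:
    """Return a new node list with a copy node on every cross-device edge.

    Assumes `nodes` is already in topological order.
    """
    by_name = {n.name: n for n in nodes}
    result: List[Node] = []
    for node in nodes:
        new_inputs = []
        for src_name in node.inputs:
            src = by_name[src_name]
            if src.device != node.device:
                # The producer lives on another device: route the edge
                # through a hypothetical "device_copy" node on the consumer side.
                copy = Node(
                    name=f"copy_{src.name}_to_{node.name}",
                    op="device_copy",
                    inputs=[src.name],
                    device=node.device,
                )
                result.append(copy)
                new_inputs.append(copy.name)
            else:
                new_inputs.append(src_name)
        result.append(Node(node.name, node.op, new_inputs, node.device))
    return result


graph = [
    Node("data", "input"),
    Node("conv", "conv2d", ["data"]),
    Node("relu", "relu", ["conv"]),
    Node("out", "softmax", ["relu"]),
]
annotate(graph, {"conv2d": "gpu", "relu": "gpu"})
lowered = insert_copies(graph)
for n in lowered:
    print(n.name, n.op, n.device, n.inputs)
```

In this toy example a copy node appears only where an edge crosses a device boundary (cpu to gpu before `conv`, gpu to cpu before `out`); the edge between `conv` and `relu` is left untouched because both nodes sit on the same device.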
Compilation and runtime: we have agreed that compiling the subgraphs as part of a single graph would be more convenient, because it eases the work of reconstructing the subgraphs and keeps the runtime cleaner.
There are some concerns about how much modification of the runtime is required:
- The current build API only takes one target. We need to modify this API, or add a new one, to accommodate multiple contexts.
- We will need to load modules built with different libs. Some work on the runtime is needed to support this.
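To make the two concerns above concrete, here is a rough sketch of what a multi-target build entry point might look like. Everything in it is an assumption for discussion: `build_heterogeneous`, `fake_compile`, and the device-to-target mapping are hypothetical and do not correspond to the current build API. The compile step is stubbed out so that the sketch is runnable.

```python
from typing import Callable, Dict, List, Tuple


def build_heterogeneous(
    nodes: List[Tuple[str, str]],
    targets: Dict[str, str],
    compile_fn: Callable[[List[str], str], str],
) -> Dict[str, str]:
    """Compile one module per device.

    nodes:      (node name, annotated device) pairs
    targets:    device -> target string (hypothetical mapping)
    compile_fn: stand-in for the real single-target build call
    """
    groups: Dict[str, List[str]] = {}
    for name, device in nodes:
        if device not in targets:
            raise ValueError(f"no target given for device '{device}'")
        groups.setdefault(device, []).append(name)
    # One compiled module per device; the runtime would then have to
    # load each of them and dispatch nodes to the right one.
    return {dev: compile_fn(names, targets[dev]) for dev, names in groups.items()}


def fake_compile(names: List[str], target: str) -> str:
    """Stub that only records what would be compiled for which target."""
    return f"{target}:[{','.join(names)}]"


modules = build_heterogeneous(
    [("data", "cpu"), ("conv", "gpu"), ("relu", "gpu"), ("out", "cpu")],
    {"cpu": "llvm", "gpu": "cuda"},
    fake_compile,
)
print(modules)
```

The point of the sketch is only the shape of the interface: the caller passes a target per device instead of a single target, and gets back one module per device that the runtime then has to load.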
Any comments and suggestions are greatly appreciated.