[RFC][BYOC] An extended graph partitioning flow

comaniac · March 19, 2020, 6:22pm

The flow concept makes sense. On the other hand, I personally think it might not be good to explicitly insert supported_begin and supported_end into the graph in step 1 and pass it to step 2. According to the RFC, supported_begin and supported_end have basically no restrictions, so the regions they mark can overlap in any way. I can imagine that processing a graph with lots of supported_begin and supported_end annotations could be challenging if you have to remove all of them and insert another set of compiler_begin and compiler_end.

A better way to implement this flow is to keep the supported_begin/end information in a separate data structure instead of in the graph. For example, you can implement step 1 as an analysis pass instead of a transform pass. The output of the analysis pass is a list of sets of nodes. Each set represents a region, so a node can appear in more than one set to represent overlapping regions. Then step 2 accepts that list of sets and transforms the graph by inserting compiler_begin/end. Another way is to keep the region information in each op (or composite function).
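To make the idea concrete, here is a rough sketch of what I mean. It is not TVM code: the graph is a plain dict from node name to input names, and `find_supported_regions` and `region_boundaries` are hypothetical names I made up for illustration. The analysis here just groups connected supported nodes, which is only one possible policy; the point is that the regions live in a list of sets and the graph is left untouched until step 2.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-in for a Relay graph (assumption): node name -> names of its inputs.
Graph = Dict[str, List[str]]
Edge = Tuple[str, str]  # (producer, consumer)


def find_supported_regions(graph: Graph, supported: Set[str]) -> List[Set[str]]:
    """Step 1 as an analysis: return regions as sets of nodes, graph unchanged."""
    # Undirected adjacency between supported nodes that are directly connected.
    neighbors: Dict[str, Set[str]] = {n: set() for n in supported}
    for node, inputs in graph.items():
        for src in inputs:
            if node in supported and src in supported:
                neighbors[node].add(src)
                neighbors[src].add(node)

    regions: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk that collects one connected group of supported nodes.
        region: Set[str] = set()
        stack = [start]
        while stack:
            n = stack.pop()
            if n in region:
                continue
            region.add(n)
            stack.extend(neighbors[n] - region)
        seen |= region
        regions.append(region)
    return regions


def region_boundaries(graph: Graph, region: Set[str]) -> Tuple[List[Edge], List[Edge]]:
    """Step 2 helper: edges where compiler_begin / compiler_end would be inserted."""
    # Edges entering the region from outside: candidates for compiler_begin.
    begins = sorted(
        (src, dst) for dst in region for src in graph.get(dst, []) if src not in region
    )
    # Edges leaving the region to outside consumers: candidates for compiler_end.
    ends = sorted(
        (src, dst)
        for dst, inputs in graph.items()
        for src in inputs
        if src in region and dst not in region
    )
    return begins, ends


graph: Graph = {
    "x": [],
    "conv": ["x"],
    "relu": ["conv"],
    "custom": ["relu"],  # not supported by the external compiler
    "add": ["custom", "x"],
}
regions = find_supported_regions(graph, {"conv", "relu", "add"})
for region in regions:
    print(sorted(region), region_boundaries(graph, region))
```

Since the output is just a list of sets, nothing stops a different analysis from returning overlapping regions such as `[{"a", "b"}, {"b", "c"}]`, and step 2 can then decide how to resolve them before it inserts any annotation.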

Another concern is that this flow requires at least 4 traversals, but if we implement each step as a separate Relay pass, each pass will be too restricted. For example, you will never run the step 2 pass alone, and you will never run step 1 then step 3 while skipping step 2. It seems to me that at least steps 2-3 should be put in the same pass.