The flow concept makes sense. On the other hand, I personally think it might not be a good idea to explicitly insert supported_begin and supported_end into the graph in step 1 and pass them to step 2. According to the RFC, supported_begin and supported_end have basically no restrictions, so they can overlap in any way. I can imagine that processing a graph with lots of supported_begin and supported_end annotations could be challenging if you have to remove all of them and insert another set of compiler_begin and compiler_end annotations.
A better way to implement this flow is to keep supported_begin/end in a separate data structure instead of in the graph. For example, you can implement step 1 as an analysis pass instead of a transform pass. The output of the analysis pass is a list of sets of nodes. Each set represents a region, so a node may appear in more than one set to represent overlapping regions. Step 2 then accepts that list of sets and transforms the graph by inserting compiler_begin/end. Another way is to keep the region information in each op (or composite function).
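
To make the "list of sets" idea concrete, here is a minimal sketch in plain Python. It does not use the Relay API: the toy graph (a dict mapping each node to its inputs), the node names, and the helpers overlapping_nodes and region_boundaries are all hypothetical and only illustrate the data structure. The analysis result is a list of node sets, and the step 2 side derives from it the edges where compiler_begin/end would be inserted.

```python
from typing import Dict, List, Set, Tuple

# Toy dataflow graph (assumption, not Relay): node -> nodes it consumes.
Graph = Dict[str, List[str]]
Edge = Tuple[str, str]  # (producer, consumer)


def overlapping_nodes(regions: List[Set[str]]) -> Set[str]:
    """Return the nodes that belong to more than one region."""
    seen: Set[str] = set()
    shared: Set[str] = set()
    for region in regions:
        shared |= seen & region
        seen |= region
    return shared


def region_boundaries(graph: Graph, region: Set[str]) -> Tuple[List[Edge], List[Edge]]:
    """Return the edges entering the region (candidate begin positions)
    and the edges leaving it (candidate end positions)."""
    begins: List[Edge] = []
    ends: List[Edge] = []
    for consumer, producers in graph.items():
        for producer in producers:
            if consumer in region and producer not in region:
                begins.append((producer, consumer))
            elif producer in region and consumer not in region:
                ends.append((producer, consumer))
    return sorted(begins), sorted(ends)


# Hypothetical output of the step 1 analysis: two regions that overlap on "relu".
graph: Graph = {
    "x": [],
    "conv": ["x"],
    "relu": ["conv"],
    "add": ["relu", "x"],
    "out": ["add"],
}
regions: List[Set[str]] = [{"conv", "relu"}, {"relu", "add"}]

shared = overlapping_nodes(regions)
first_begins, first_ends = region_boundaries(graph, regions[0])
```

Because the graph itself is never annotated in step 1, step 2 can resolve the overlap (here the shared node "relu") however it likes before inserting any annotation, and there is nothing to remove afterwards.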
Another concern is that this flow requires at least four graph traversals, and if we implement each step as a separate Relay pass, each pass will be too restricted to be useful on its own. For example, you would never run the step 2 pass alone, and you would never run step 1 followed by step 3 while skipping step 2. It seems to me that at least steps 2 and 3 should be put in the same pass.