We want to add a submodule extraction pass. In the future, this pass could support post-fusion autotuning, enabling more accurate tuning runs. It also gives us better task de-duplication across model zoos and cleaner separation of layers for easier parallel autotuning. Finally, it would let us tune a single layer and an entire model in exactly the same way, and this pass helps achieve that.
Here is the currently proposed API. We invoke SimplifyInference and FuseOps on the given module before attempting extraction:
- extract_submodules(mod: tvm.IRModule) -> List[tvm.IRModule]
- extract_hashed_submodules(mod: tvm.IRModule) -> Dict[int, tvm.IRModule], where the int key is a structural hash
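To make the intended contract concrete, here is a minimal sketch of the two functions' semantics. This is a stand-in model, not the real implementation: strings stand in for fused primitive functions, a list of them stands in for an IRModule, and Python's built-in hash() stands in for TVM's structural hash (in TVM this key would come from tvm.ir.structural_hash).

```python
from typing import Dict, List

# Stand-in for an IRModule: a list of "fused primitive functions" (here, strings).
Module = List[str]

def extract_submodules(mod: Module) -> List[Module]:
    # Wrap each fused primitive function in its own single-function module.
    return [[func] for func in mod]

def extract_hashed_submodules(mod: Module) -> Dict[int, Module]:
    # Key each submodule by a structural hash; structurally identical
    # functions collapse into one entry, giving task de-duplication.
    return {hash(func): [func] for func in mod}

# Usage: two structurally identical conv layers dedupe into one task.
mod = ["conv2d_relu", "conv2d_relu", "dense_softmax"]
print(len(extract_submodules(mod)))         # 3 submodules
print(len(extract_hashed_submodules(mod)))  # 2 unique tasks
```

The Dict variant is what enables de-duplication across a model zoo: any two models containing a structurally identical fused function map it to the same key.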
Should this be an analysis pass or a transform pass?
- A0 analysis - we can treat this as a "read" of existing structures that constructs a collection of IRModules.
- A1 transform - most transform passes preserve semantic equivalence, which this pass does not. Under this option, instead of constructing a collection of IRModules, we could construct a single IRModule containing all the extracted functions.
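For the A1 (transform) option, a sketch of what the output might look like, again using plain Python stand-ins rather than real TVM objects; the naming scheme is hypothetical and not part of the proposal:

```python
from typing import Dict, List

# Stand-in for an IRModule's function table (strings in place of Relay functions).
Module = List[str]

def extract_into_single_module(mod: Module) -> Dict[str, str]:
    # A1 variant: rather than a collection of modules, build one module
    # whose global functions are the extracted, deduplicated submodules.
    merged: Dict[str, str] = {}
    for func in mod:
        name = f"fused_{hash(func) & 0xFFFF:04x}"  # hypothetical naming scheme
        merged[name] = func
    return merged

mod = ["conv2d_relu", "conv2d_relu", "dense_softmax"]
merged = extract_into_single_module(mod)
print(len(merged))  # 2 unique functions in one merged module
```

One upside of this shape is that the result is still a single module, so downstream tooling that consumes an IRModule needs no changes; the downside is that it is not semantically equivalent to the input.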
How should we name this for clarity?
- B0 extract_primitive_tasks
- B1 extract_submodules
- B2 extract_subgraphs
I plan to rewrite the pass in C++ following this discussion. Thanks for your comments!