Motivation
Currently, TVM's compiled artifacts cannot express the intermediary memory they require to perform inference. This RFC proposes an initial step toward fixing that: enhancing compilation artifacts so that they can be queried for the intermediary workspace/scratch size required to execute (fused) operators / sub-regions of the ML model.
This aligns with the goals of the uTVM roadmap ([µTVM] microTVM M2 Roadmap) to query the memory required by the intermediates.
It's important that the flow also works for BYOC codegens, so that they too can specify the memory required by their intermediates.
Proposal
The plan is to add new functions and data structure changes so that we can query the workspace size required by each fused operator subgraph (a.k.a. fused operators; these live under a single runtime.Module) and by each BYOC subgraph (there would be a runtime.Module per subgraph). It is important to note that we are currently looking to implement this functionality for the “c” and “llvm” types, which are supported by uTVM. The plan is as follows:
- Introduce calculate_workspace_size() function to tir.analysis
We are planning to introduce functionality that traverses a tir.PrimFunc’s body, calculates the maximum workspace in bytes that it requires, and returns it.
- Modify LLVM and C codegen to provide workspace requirements.
The goal here is to query every PrimFunc for the intermediary workspace that each (fused) operator requires and to make it available in the compiled artifacts.
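The traversal described above can be sketched without TVM. This is only a toy model, not the proposed implementation: the node classes (Evaluate, Allocate, SeqStmt) are hypothetical stand-ins for TIR statements, and the rule used here (nested allocations add up, sibling statements in a sequence can reuse the same space) is an assumption about how the peak would be computed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Evaluate:
    """Hypothetical leaf statement that allocates nothing."""


@dataclass
class Allocate:
    """Hypothetical allocation of `nbytes` that is live for the whole `body`."""
    nbytes: int
    body: "Stmt"


@dataclass
class SeqStmt:
    """Hypothetical sequence of statements executed one after another."""
    stmts: List["Stmt"]


Stmt = Union[Evaluate, Allocate, SeqStmt]


def calculate_workspace_size(stmt: Stmt) -> int:
    """Return the peak number of bytes live at any point in `stmt`."""
    if isinstance(stmt, Allocate):
        # The buffer stays live while its body runs, so the sizes add up.
        return stmt.nbytes + calculate_workspace_size(stmt.body)
    if isinstance(stmt, SeqStmt):
        # Assumption: siblings run one after another and can share space,
        # so only the largest one counts.
        return max((calculate_workspace_size(s) for s in stmt.stmts), default=0)
    return 0


# A 1024-byte buffer wrapping two inner allocations that run back to back.
body = Allocate(1024, SeqStmt([Allocate(256, Evaluate()), Allocate(512, Evaluate())]))
peak = calculate_workspace_size(body)
```

Here the outer buffer is live while either inner buffer is in use, so the peak is the outer size plus the larger of the two inner sizes.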
How would this feature be used ?
- We can use this to generate an entry in the memory map of Model Library Format to indicate the intermediary activations pool size. This could be calculated as the maximum of the workspace sizes required by the AoT TIR PrimFunc, fused_operator_1, … , fused_operator_n, external_func_1, …, external_func_n if they are run sequentially.
- We will be adding an option to use this functionality to generate a static memory buffer (goes to .data) to be used by the stack allocator present in the AoT executor per model, once the AoT PR ([AOT] Introducing AOT in TVM, by giuseros, apache/tvm Pull Request #7785) lands.
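As a rough illustration of the first point, the pool size under sequential execution is just the maximum over the per-function workspace sizes. The helper name activations_pool_size and the byte counts below are made up for illustration; they are not part of TVM or of any real model.

```python
from typing import Dict


def activations_pool_size(workspace_sizes: Dict[str, int]) -> int:
    """Pool size in bytes when the functions run one after another.

    Only one function's workspace is live at a time, so the pool only
    needs to be as large as the biggest single requirement.
    """
    return max(workspace_sizes.values(), default=0)


# Illustrative, made-up sizes in bytes.
sizes = {
    "aot_main": 128,
    "fused_operator_1": 4096,
    "fused_operator_2": 2048,
    "external_func_1": 1024,
}
pool_bytes = activations_pool_size(sizes)
```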
Discussion :
Where should the calculated workspace live ?
B1) runtime.Module : we could have a member containing the workspace it requires for the execution of fused operators inside of it.
This allows BYOC codegens to specify the size as well, as part of the codegen, because the workspace size is a compilation artifact. Therefore this option requires only a change in the runtime.Module to carry a member indicating the workspace requirement.
However, this option assumes the functions inside the runtime.Module will run sequentially, which does not align well with the fact that a runtime.Module is more of a “library” of (subgraphs of) operators.
B2) runtime.Module : we could maintain a map of (function→workspace_size) that could be accessed via module.get_workspace_size( func_name ).
This allows BYOC codegens to specify the size for each function as well, as part of the codegen, because the workspace sizes for each function are compilation artifacts. Therefore this option requires only a change in the runtime.Module to carry a map indicating the workspace requirement for each function.
This option opens the door for each Executor (Graph/AoT) to consume the per-function workspace information and calculate the overall workspace requirement itself, allowing parallel execution of operators if required. Our preference is for this option, but we would like to hear what the community thinks.
B3) Maintain a map in LoweredOutput :: relay Primitive Function expr → workspace_size (to be passed to BuildOutput → (Graph/AoT)ExecutorFactoryModule → export_model_library)
The current BYOC compilation is relay IRModule → runtime.Module, and we might need to change it to give two outputs : relay IRModule → runtime.Module, workspace_size (optional). This option will then require plumbing this map through to the export of Model Library Format. We think that route will look like LoweredOutput → BuildOutput → (Graph/AoT)ExecutorFactoryModule → export_model_library.
Therefore, this option will not change runtime.Module (to carry an integer indicating the workspace size or a func → workspace map), but we would need to change all the interfaces above.
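To make option B2 more concrete, here is a minimal sketch of how an executor might consume a per-function map. ModuleSketch is a hypothetical stand-in for runtime.Module, required_workspace and the "schedule" format are invented for this example, and the sizes are made up; only the get_workspace_size( func_name ) accessor comes from the proposal above.

```python
from typing import Dict, List


class ModuleSketch:
    """Hypothetical stand-in for a runtime.Module carrying a func -> size map."""

    def __init__(self, workspace_sizes: Dict[str, int]) -> None:
        self._workspace_sizes = dict(workspace_sizes)

    def get_workspace_size(self, func_name: str) -> int:
        """Return the workspace in bytes recorded for `func_name`."""
        return self._workspace_sizes[func_name]


def required_workspace(module: ModuleSketch, schedule: List[List[str]]) -> int:
    """Peak workspace for a schedule of steps.

    Each step lists the functions assumed to run concurrently: sizes within
    a step are summed, and the peak is the largest step.
    """
    return max(
        (sum(module.get_workspace_size(f) for f in step) for step in schedule),
        default=0,
    )


# Illustrative, made-up sizes in bytes.
mod = ModuleSketch(
    {"fused_operator_1": 1024, "fused_operator_2": 2048, "external_func_1": 512}
)

# Sequential execution: one function per step.
sequential = required_workspace(
    mod, [["fused_operator_1"], ["fused_operator_2"], ["external_func_1"]]
)

# The first two operators run in parallel, then the external function.
parallel = required_workspace(
    mod, [["fused_operator_1", "fused_operator_2"], ["external_func_1"]]
)
```

With a single integer per module (option B1) the executor could not tell these two schedules apart, which is the main argument for keeping the sizes per function.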