The background: we have an RNN language model. At most time steps we only need to feed in the input and update the hidden states; the output layer only has to be executed for the last input. The output layer is usually very large, so a full forward pass is time-consuming.
Since GraphRuntime does not provide control-flow logic yet, we have to split the model into two parts and implement the control flow in C/C++ integration code, while sharing parameters between the two parts to save memory.
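For illustration, here is a rough C++ sketch of that split-model integration. The module and tensor names (`recurrent_mod`, `output_mod`, `"data"`, `"hidden"`) and the exact graph split are assumptions made for this example, not the actual code:

```cpp
#include <vector>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// Hypothetical sketch: "recurrent_mod" updates the hidden state per step,
// "output_mod" runs the large output layer once at the end.
void RunLanguageModel(tvm::runtime::Module recurrent_mod,
                      tvm::runtime::Module output_mod,
                      const std::vector<tvm::runtime::NDArray>& tokens,
                      tvm::runtime::NDArray hidden) {
  auto set_input  = recurrent_mod.GetFunction("set_input");
  auto run_step   = recurrent_mod.GetFunction("run");
  auto get_output = recurrent_mod.GetFunction("get_output");

  // The control flow lives in the integration code, not in the graph.
  for (const auto& tok : tokens) {
    set_input("data", tok);
    set_input("hidden", hidden);
    run_step();
    hidden = get_output(0);  // updated hidden state for the next step
  }

  // Only the last hidden state goes through the expensive output layer.
  output_mod.GetFunction("set_input")("hidden", hidden);
  output_mod.GetFunction("run")();
  tvm::runtime::NDArray logits = output_mod.GetFunction("get_output")(0);
  (void)logits;
}
```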
Proposed Solution:
- add a “lazy_init_input” entry to the graph’s attributes, e.g.:
"attrs": {
... ...
"lazy_init_input": [
"list_str",
[
"p0"
]
]
}
- allow empty storage entries in SetupStorage, and defer SetupOpExecs if there are uninitialized data entries
- add a “set_shared_input” function which takes an NDArray instance, so the actual storage can be shared
- add a “setup_operators” function to set up the operators once all entries are bound (a usage sketch follows below)
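With these changes, the integration code could bind an already-initialized tensor from one runtime into the lazily initialized entry of the other before finishing setup. A minimal sketch of how the proposed functions might be used, assuming "p0" is the lazily initialized entry from the attrs example above and that the owning runtime exposes it through get_input by name:

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// Hypothetical usage of the proposed API. "p0" is the entry listed under
// "lazy_init_input"; the NDArray it is bound to is owned by another runtime
// instance, so no extra copy of the storage is allocated.
void ShareParameter(tvm::runtime::Module owner_mod,
                    tvm::runtime::Module lazy_mod) {
  // Fetch the already-initialized tensor from the owning runtime.
  tvm::runtime::NDArray p0 = owner_mod.GetFunction("get_input")("p0");

  // Proposed: bind the shared storage to the lazily initialized entry ...
  lazy_mod.GetFunction("set_shared_input")("p0", p0);

  // ... then finish operator setup once every data entry has storage.
  lazy_mod.GetFunction("setup_operators")();

  lazy_mod.GetFunction("run")();
}
```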
Here is the pull request: https://github.com/dmlc/tvm/pull/3489
Please help to review, thanks!