We want to make deep learning (DL) models represented in Relay IR portable. To do so, we want to serialize Relay IR to disk so that, once serialized, the models can be imported by third-party frameworks and compilers. The serialization format should be compact, portable, widely adopted, and backed by a well-documented specification.
We may also want to export a Relay module after running optimization passes on it. This raises a few challenges, which we discuss later in this RFC.
The serialization format should meet the following criteria:
- Widely adopted
- Well-documented specification
- Import and export support in the source and target systems
ONNX is the best fit for these criteria, so we chose it.
What is ONNX?
ONNX provides an open-source format for DL models. It defines an extensible computation graph model, along with definitions of built-in operators and standard data types, and is aimed primarily at inference. ONNX is widely supported across frameworks and hardware.
Design/Implementation Approach
The implementation approach is as follows:
- An in-memory Relay module (optimized or unoptimized) is the input
- Get the topologically sorted list of the Relay nodes
- Convert the Relay nodes to ONNX nodes
- Build ONNX graph and serialize it
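The four steps above can be sketched end to end. This is a minimal, standard-library-only illustration, not the implementation in the PR: `Node`, `RELAY_TO_ONNX`, the `"input"` pseudo-op, and `export` are hypothetical names, and JSON stands in for ONNX protobuf serialization. A real exporter would build ONNX protos (for example with `onnx.helper.make_node`, `make_graph`, and `make_model`), would handle weights as initializers, and would write the model with `onnx.save`.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a Relay call node: an op name plus its input nodes."""
    name: str
    op: str
    inputs: List["Node"] = field(default_factory=list)


def topo_sort(output: Node) -> List[Node]:
    """Post-order DFS from the output, so every node appears after all of its inputs."""
    order: List[Node] = []
    visited = set()

    def visit(node: Node) -> None:
        if id(node) in visited:
            return
        visited.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(output)
    return order


# Hypothetical mapping from Relay op names to ONNX op types (a tiny subset).
RELAY_TO_ONNX = {"nn.conv2d": "Conv", "nn.relu": "Relu", "add": "Add"}


def export(output: Node) -> str:
    """Convert the sorted nodes to ONNX-like node records and serialize the graph."""
    inputs, nodes = [], []
    for node in topo_sort(output):
        if node.op == "input":  # graph inputs become ONNX graph inputs
            inputs.append(node.name)
            continue
        if node.op not in RELAY_TO_ONNX:
            raise NotImplementedError(f"no ONNX mapping for {node.op}")
        nodes.append({
            "op_type": RELAY_TO_ONNX[node.op],
            "inputs": [i.name for i in node.inputs],
            "outputs": [node.name],
        })
    graph = {"inputs": inputs, "nodes": nodes, "outputs": [output.name]}
    return json.dumps(graph)


x = Node("x", "input")
w = Node("w", "input")
conv = Node("conv", "nn.conv2d", [x, w])
out = Node("out", "nn.relu", [Node("sum", "add", [conv, x])])
print(export(out))
```

Sorting first guarantees that every ONNX node's inputs are produced by earlier nodes, which the ONNX graph model requires, and unsupported operators surface as an explicit error rather than a silently incomplete graph.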
Strategy for supporting ONNX opsets
- The ONNX operator set evolves over time, so we will have to support different opset versions.
- Initially, we will support opset version 11.
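One way to support several opsets is to register converters per operator together with the opset version they apply from, then pick the newest converter not exceeding the target opset. The sketch below is a hypothetical design (`register`, `lookup`, and the converter names are illustrative, not the PR's API). It uses ONNX `Clip` as the example because its `min`/`max` moved from attributes to optional inputs in opset 11; inputs are shown as plain floats for brevity, whereas real ONNX inputs are tensors.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[dict], dict]

# Hypothetical registry: ONNX op_type -> list of (since_version, converter), sorted by version.
_REGISTRY: Dict[str, List[Tuple[int, Converter]]] = {}


def register(op_type: str, since_version: int):
    """Decorator that records a converter valid from `since_version` onward."""
    def wrap(fn: Converter) -> Converter:
        _REGISTRY.setdefault(op_type, []).append((since_version, fn))
        _REGISTRY[op_type].sort(key=lambda entry: entry[0])
        return fn
    return wrap


def lookup(op_type: str, opset: int = 11) -> Converter:
    """Return the converter with the highest since_version that is <= opset."""
    candidates = [fn for version, fn in _REGISTRY.get(op_type, []) if version <= opset]
    if not candidates:
        raise NotImplementedError(f"{op_type} is not supported for opset {opset}")
    return candidates[-1]


@register("Clip", 6)
def clip_v6(attrs: dict) -> dict:
    # Opsets 6-10: min/max are node attributes.
    return {"op_type": "Clip", "attributes": {"min": attrs["a_min"], "max": attrs["a_max"]}}


@register("Clip", 11)
def clip_v11(attrs: dict) -> dict:
    # Opset 11+: min/max are extra (optional) inputs; floats stand in for tensors here.
    return {"op_type": "Clip", "extra_inputs": [attrs["a_min"], attrs["a_max"]]}


print(lookup("Clip")({"a_min": 0.0, "a_max": 6.0}))
print(lookup("Clip", opset=10)({"a_min": 0.0, "a_max": 6.0}))
```

With this layout, adding a newer opset means registering only the converters whose semantics changed; every other operator keeps resolving to its existing converter.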
A PR has been created with support for a subset of operators. A few models from the ONNX, MXNet, and TFSlim model zoos have been tested. For details, limitations, and TODOs, refer to the PR: https://github.com/apache/incubator-tvm/pull/5052
Support for higher-order features
- ONNX does not support user-defined functions, although it does provide some predefined functions. Therefore, we will not be able to map higher-order functions from Relay to ONNX. A proposal to add support for functions was not accepted: https://github.com/onnx/onnx/issues/48
Support for Operator Fusion pass
- We may want to optimize the model with optimization passes before exporting it to ONNX. When we run the FuseOps pass on Relay, each subgraph of nodes that can be fused together is wrapped into an inline function. Supporting such inline functions will be difficult for the reasons listed in the previous point. In addition, the target runtime would need support for running the fused ops.
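Because ONNX cannot represent calls to user-defined or fused functions, an exporter can detect them up front. The sketch below is hypothetical (`Function`, `Call`, and `find_function_calls` are illustrative names, not TVM classes): it walks a toy call graph and reports every call whose callee is a function rather than an operator. The `primitive` flag is an assumption that models the marker FuseOps attaches to the functions it creates, so the report can distinguish fused functions from other functions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)
class Function:
    """Hypothetical stand-in for a Relay function, e.g. one produced by FuseOps."""
    params: List[str]
    body_ops: List[str]
    primitive: bool = False  # assumed marker for functions created by operator fusion


@dataclass(eq=False)
class Call:
    """Hypothetical call node: the callee is either an operator name or a Function."""
    callee: Union[str, Function]
    args: List["Call"] = field(default_factory=list)


def find_function_calls(root: Call) -> List[str]:
    """Iteratively visit every call and describe each one that targets a Function."""
    problems: List[str] = []
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if isinstance(node.callee, Function):
            kind = "fused" if node.callee.primitive else "user-defined"
            problems.append(f"{kind} function {node.callee.body_ops}")
        stack.extend(node.args)
    return problems


plain = Call("nn.relu", [Call("add")])
fused = Call(Function(["x", "y"], ["add", "nn.relu"], primitive=True))
print(find_function_calls(plain))
print(find_function_calls(fused))
```

Running such a check before conversion lets the exporter fail early with a clear message instead of producing an incomplete ONNX graph.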