Co-authored-by: Ming-Long Huang <mlhuang@pllab.cs.nthu.edu.tw>
Co-authored-by: HMZ <mzhuang@pllab.cs.nthu.edu.tw>
Hi, we are from the Programming Language Lab at National Tsing Hua University, Taiwan. We have been working on an NNAPI backend for TVM Relax based on the BYOC flow and would like to upstream our current implementation of the codegen and the runtime into TVM.
Motivation
Android Neural Networks API (NNAPI) is a graph-level neural network inference API provided by the Android runtime. Before this RFC, TVM on Android mobile devices relied mainly on OpenCL for GPU acceleration. This RFC adds a new NNAPI codegen and runtime via the BYOC framework, enabling execution on the custom accelerators that SoC vendors ship on mobile devices.
Rationale and alternatives
Instead of using the JSON codegen, the integration could also be implemented with a C source codegen, as in our earlier prototype.
Prior art
This RFC is a successor to an RFC we posted in 2021. The codegen and the runtime have since been rewritten from scratch to generate and load standardized JSONRuntimeBase-based modules instead of C source code.
Current implementation status
We have an implementation with the following components added to the TVM codebase, available in the draft PR we opened. The major additions are listed below, followed by a sketch of how they fit together:
- An NNAPI partition function implemented with pattern matching.
- An NNAPI codegen that serializes Relax IR subgraphs to JSON runtime modules.
- An NNAPI runtime that loads JSON runtime modules and calls NNAPI functions to build, compile, and run the model.
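
The three components follow the standard Relax BYOC flow: the partition pass groups supported operators into composite functions, the codegen serializes them, and the runtime replays them through NNAPI. The sketch below is a minimal, hypothetical composition of the partition step from the generic Relax passes; it assumes the NNAPI operator patterns are registered under an "nnapi" prefix and is not the exact code in the draft PR.

```python
# Hypothetical sketch of the partition step, assuming NNAPI patterns are
# registered in the Relax pattern registry under the "nnapi" prefix.
import tvm
from tvm import relax
from tvm.relax.backend import get_patterns_with_prefix


def partition_for_nnapi(mod: tvm.IRModule) -> tvm.IRModule:
    """Group NNAPI-supported ops into composite functions annotated with
    Codegen="nnapi" so that the NNAPI JSON codegen can pick them up."""
    patterns = get_patterns_with_prefix("nnapi")
    seq = tvm.transform.Sequential(
        [
            # Match supported op patterns and outline them as composite functions.
            relax.transform.FuseOpsByPattern(patterns, annotate_codegen=True),
            # Merge neighboring composite functions into larger offloaded regions.
            relax.transform.MergeCompositeFunctions(),
        ]
    )
    return seq(mod)


# RunCodegen then serializes each annotated subgraph into a JSONRuntimeBase-based
# module that the NNAPI runtime loads and executes on-device:
# mod = relax.transform.RunCodegen()(partition_for_nnapi(mod))
```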
Currently, the implementation focuses mainly on the CV model use case and supports offloading the following ops in both float32 and float16 data types; a small end-to-end example follows the list.
- Element-wise unary operations (relu, exp, …)
- Element-wise binary operations (add, multiply, …)
- nn.dense
- nn.conv2d
- nn.max_pool2d
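
As an illustration of the intended usage, here is a hedged end-to-end sketch: a small float32 model built only from ops in the list above, partitioned for NNAPI, and cross-compiled for Android. The import path of partition_for_nnapi and the exact build invocation are assumptions based on other BYOC backends; refer to the draft PR for the actual API.

```python
import tvm
from tvm import relax
from tvm.script import relax as R

# Assumed import path for the partition helper, following other BYOC backends;
# see the draft PR for the actual location.
from tvm.relax.backend.contrib.nnapi import partition_for_nnapi


@tvm.script.ir_module
class ConvNet:
    @R.function
    def main(
        x: R.Tensor((1, 3, 224, 224), "float32"),
        w: R.Tensor((16, 3, 3, 3), "float32"),
    ) -> R.Tensor((1, 16, 111, 111), "float32"):
        with R.dataflow():
            conv = R.nn.conv2d(x, w)  # offloadable: nn.conv2d
            act = R.nn.relu(conv)  # offloadable: element-wise unary (relu)
            out = R.nn.max_pool2d(act, pool_size=(2, 2), strides=(2, 2))  # offloadable
            R.output(out)
        return out


mod = partition_for_nnapi(ConvNet)
# Cross-compile the host code for Android; the offloaded subgraphs are embedded
# as JSON runtime modules and handed to NNAPI at load time on the device.
ex = relax.build(mod, target="llvm -mtriple=aarch64-linux-android")
```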
Future possibilities
- Add support for quantized data types to cover the Relax QNN dialect or the Relax quantize/dequantize operators.
- Add support for dynamic shape operands.