Co-authored-by: Ming-Long Huang <mlhuang@pllab.cs.nthu.edu.tw>
Co-authored-by: HMZ <mzhuang@pllab.cs.nthu.edu.tw>
Hi, we are from the Programming Language Lab at National Tsing Hua University, Taiwan. We have been working on an NNAPI backend for TVM Relax based on the BYOC flow and would like to upstream our current implementation of the codegen and the runtime into TVM.
Motivation
Android Neural Networks API (NNAPI) is a graph-level neural network inference API provided by the Android runtime. Before this RFC, TVM on Android mobile devices relied mainly on OpenCL for GPU acceleration. This RFC adds a new NNAPI codegen and runtime via the BYOC framework, enabling execution on the custom accelerators that SoC vendors ship on mobile devices.
Rationale and alternatives
Instead of using the JSON codegen, the integration could also be implemented with a C source codegen, as in our earlier prototype.
Prior art
This RFC is a successor to an RFC we posted in 2021. The codegen and the runtime have since been rewritten from scratch to generate and load standardized JSONRuntimeBase-based modules instead of C source code.
Current implementation status
We have an implementation with the following components added to the TVM codebase, available in the draft PR we opened. The major additions are listed below, followed by a sketch of how they fit together:
- An NNAPI partition function implemented with pattern matching.
- An NNAPI codegen that serializes Relax IR subgraphs to JSON runtime modules.
- An NNAPI runtime that loads JSON runtime modules and calls NNAPI functions to build, compile, and run the model.
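
The three components follow the standard Relax BYOC flow: the partition pass groups supported operators into composite functions, the codegen serializes them, and the runtime replays them through NNAPI. The sketch below is a minimal, hypothetical composition of the partition step from the generic Relax passes; it assumes the NNAPI operator patterns are registered under an "nnapi" prefix and is not the exact code in the draft PR.

```python
# Hypothetical sketch of the partition step, assuming NNAPI patterns are
# registered in the Relax pattern registry under the "nnapi" prefix.
import tvm
from tvm import relax
from tvm.relax.backend import get_patterns_with_prefix


def partition_for_nnapi(mod: tvm.IRModule) -> tvm.IRModule:
    """Group NNAPI-supported ops into composite functions annotated with
    Codegen="nnapi" so that the NNAPI JSON codegen can pick them up."""
    patterns = get_patterns_with_prefix("nnapi")
    seq = tvm.transform.Sequential(
        [
            # Match supported op patterns and outline them as composite functions.
            relax.transform.FuseOpsByPattern(patterns, annotate_codegen=True),
            # Merge neighboring composite functions into larger offloaded regions.
            relax.transform.MergeCompositeFunctions(),
        ]
    )
    return seq(mod)


# RunCodegen then serializes each annotated subgraph into a JSONRuntimeBase-based
# module that the NNAPI runtime loads and executes on-device:
# mod = relax.transform.RunCodegen()(partition_for_nnapi(mod))
```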
Currently, the implementation focuses mainly on the CV model use case and supports offloading the following ops in both float32 and float16 data types; a small end-to-end example follows the list.
- Element-wise unary operations (relu, exp, …)
- Element-wise binary operations (add, multiply, …)
- nn.dense
- nn.conv2d
- nn.max_pool2d
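
As an illustration of the intended usage, here is a hedged end-to-end sketch: a small float32 model built only from ops in the list above, partitioned for NNAPI, and cross-compiled for Android. The import path of partition_for_nnapi and the exact build invocation are assumptions based on other BYOC backends; refer to the draft PR for the actual API.

```python
import tvm
from tvm import relax
from tvm.script import relax as R

# Assumed import path for the partition helper, following other BYOC backends;
# see the draft PR for the actual location.
from tvm.relax.backend.contrib.nnapi import partition_for_nnapi


@tvm.script.ir_module
class ConvNet:
    @R.function
    def main(
        x: R.Tensor((1, 3, 224, 224), "float32"),
        w: R.Tensor((16, 3, 3, 3), "float32"),
    ) -> R.Tensor((1, 16, 111, 111), "float32"):
        with R.dataflow():
            conv = R.nn.conv2d(x, w)  # offloadable: nn.conv2d
            act = R.nn.relu(conv)  # offloadable: element-wise unary (relu)
            out = R.nn.max_pool2d(act, pool_size=(2, 2), strides=(2, 2))  # offloadable
            R.output(out)
        return out


mod = partition_for_nnapi(ConvNet)
# Cross-compile the host code for Android; the offloaded subgraphs are embedded
# as JSON runtime modules and handed to NNAPI at load time on the device.
ex = relax.build(mod, target="llvm -mtriple=aarch64-linux-android")
```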
Future possibilities
- Add support for quantized data types to cover the Relax QNN dialect or the Relax quantize/dequantize operators.
- Add support for dynamic shape operands.