Summary
Support VeriSilicon’s NPU through the BYOC framework, bringing the TVM ecosystem to our customers.
Motivation
VeriSilicon’s NPU is widely deployed on edge devices by many SoC vendors. Meanwhile, more and more customers want to build their applications with TVM, so we want to provide this capability to them.
Guide-level explanation
NA
Reference-level explanation
Our device implementation has two major components. The first is code generation, which is modeled on the Arm Ethos-N support. In this part, we visit all the IR nodes and gather tensor information. Once all the information has been extracted from the TVM framework, we create our own model with the TIM-VX APIs, then apply layout inference if the original model comes from TFLite. Layout inference converts the layout from NHWC to NCHW, since our low-level software requires the NCHW format. Once the graph is constructed and converted to the correct layout, we compile it into a binary format in memory, which we call NBG (network binary graph). The TVM framework can then serialize this in-memory NBG into a dynamic library (.so file).
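The NHWC-to-NCHW conversion performed by layout inference amounts to a fixed axis permutation. The following is a minimal pure-Python sketch of that permutation on a shape and on a flat row-major buffer. It is an illustration only, not the TIM-VX implementation, and the names `NHWC_TO_NCHW`, `permute_shape`, and `nhwc_to_nchw` are hypothetical.

```python
from itertools import product

# Axis permutation from NHWC to NCHW: output axis i takes input axis perm[i].
NHWC_TO_NCHW = (0, 3, 1, 2)


def permute_shape(shape, perm):
    """Return the shape after reordering axes according to perm."""
    return tuple(shape[axis] for axis in perm)


def nhwc_to_nchw(data, shape):
    """Reorder a flat, row-major NHWC buffer into a flat, row-major NCHW buffer.

    `shape` is the NHWC shape of `data`.
    """
    n, h, w, c = shape
    out = []
    # Iterate in NCHW order so that the output is written sequentially.
    for ni, ci, hi, wi in product(range(n), range(c), range(h), range(w)):
        # Row-major offset of element (ni, hi, wi, ci) in the NHWC buffer.
        out.append(data[((ni * h + hi) * w + wi) * c + ci])
    return out


nchw_shape = permute_shape((1, 224, 224, 3), NHWC_TO_NCHW)
nchw_data = nhwc_to_nchw([0, 1, 2, 3, 4, 5], (1, 1, 2, 3))
```

In a real graph the same permutation also has to be propagated to weights and to layout-sensitive operator attributes, which is the part that layout inference automates.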
The class TensorMakerImpl is responsible for gathering tensor information (shape and data type) for later tensor creation. The class GraphMakerImpl creates the graph, tensors, and nodes with the TIM-VX APIs.
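The split between a tensor-information pass and a graph-construction pass can be sketched as below. This is a simplified illustration under stated assumptions: `Node`, `TensorInfoCollector`, and `GraphBuilder` are hypothetical stand-ins, not the TVM Relay IR, the TIM-VX API, or the actual TensorMakerImpl and GraphMakerImpl classes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an IR node that produces one tensor."""
    name: str
    op: str
    shape: tuple
    dtype: str
    inputs: list = field(default_factory=list)


class TensorInfoCollector:
    """First pass: record shape and dtype for every node reachable from the output."""

    def __init__(self):
        self.info = {}

    def visit(self, node):
        if node.name in self.info:
            return
        for inp in node.inputs:
            self.visit(inp)
        self.info[node.name] = (node.shape, node.dtype)


class GraphBuilder:
    """Second pass: emit operations in dependency order using the collected info."""

    def __init__(self, info):
        self.info = info
        self.ops = []
        self._seen = set()

    def build(self, node):
        if node.name in self._seen:
            return
        self._seen.add(node.name)
        for inp in node.inputs:
            self.build(inp)
        if node.op != "input":
            out_shape, _ = self.info[node.name]
            input_names = [inp.name for inp in node.inputs]
            self.ops.append((node.op, input_names, node.name, out_shape))


x = Node("x", "input", (1, 3, 8, 8), "uint8")
conv = Node("conv", "conv2d", (1, 4, 8, 8), "uint8", [x])
relu = Node("relu", "relu", (1, 4, 8, 8), "uint8", [conv])

collector = TensorInfoCollector()
collector.visit(relu)
builder = GraphBuilder(collector.info)
builder.build(relu)
```

Collecting all tensor metadata before any graph object is created means the builder never has to query the front-end IR again, which keeps the two passes independent.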
The second part runs this NBG at runtime. This part is quite simple: we only need to take care of the order of the inputs and outputs.
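Handling input and output order can be sketched as follows. The sketch assumes that the compiled graph expects its tensors in a fixed positional order recorded at compile time; the helpers `order_inputs` and `label_outputs` and their parameters are hypothetical and not part of TVM or TIM-VX.

```python
def order_inputs(named_inputs, input_names):
    """Arrange caller-supplied tensors in the positional order the compiled graph expects."""
    missing = [name for name in input_names if name not in named_inputs]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    return [named_inputs[name] for name in input_names]


def label_outputs(raw_outputs, output_names):
    """Attach names to the positional outputs returned by the device."""
    if len(raw_outputs) != len(output_names):
        raise ValueError("output count does not match the compiled graph")
    return dict(zip(output_names, raw_outputs))


# The caller may supply inputs in any order; the recorded order decides the layout.
ordered = order_inputs({"mask": [1, 0], "image": [7, 8, 9]}, ["image", "mask"])
labeled = label_outputs([[0.1, 0.9], [4]], ["scores", "count"])
```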
Drawbacks
Rationale and alternatives
Precompiling the model into NBG format makes it easy to deploy in a production environment. To add support for a new operation, we only need to update the code-generation part; no update to the runtime libraries is required.
Prior art
NA
Unresolved questions
- Need to add more operation/pattern support in the future.
Future possibilities
Auto-search techniques might be applicable, for example when we have a multi-core device and a multi-batch application.