Hi,
with this post, I want to share our tool and gather some feedback. It is about deploying TinyML models on tiny microcontrollers.
Motivation
For tiny microcontrollers, we did not find a TVM backend that provides a suitable deployment method. The closest candidate we found is the standalone C runtime, but it still spends a considerable amount of computation time and code size on JSON parsing and requires dynamic memory allocation.
As far as I understand, the long-term plan to solve this issue is an “Ahead-of-Time compiler” that, at its core, does essentially the same as our tool but is more tightly integrated into the overall TVM flow. Until it is ready, we believe our solution can serve as a substitute.
Description
The tool takes the outputs of the relay.build command (graph.json, params.bin, kernels.c) and generates a C source file that executes the model statically, without any additional TVM runtime. The final deployable code therefore consists of just the optimized kernels.c from TVM, the generated calling code that invokes these kernels in the correct order, and some top-level code to use the model. This makes the deployment very efficient in terms of computation time and memory usage.
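For reference, here is roughly how the three input artifacts can be produced with TVM's Python API. This is a minimal sketch, assuming the older tuple-returning relay.build API (newer TVM versions return a factory module instead) and assuming mod and params are a Relay module and its weights that you already have:

```python
import tvm
from tvm import relay

# Assumption: `mod` and `params` are an existing Relay module and its
# weights, e.g. obtained from a frontend such as relay.frontend.from_tflite.

# Build for the "c" target so that TVM emits plain C kernels instead of
# LLVM machine code. (Exact API details vary across TVM versions.)
with tvm.transform.PassContext(opt_level=3):
    graph_json, lib, lowered_params = relay.build(mod, target="c", params=params)

# Write out the three artifacts consumed by the code generator:
with open("graph.json", "w") as f:
    f.write(graph_json)  # execution graph as JSON
with open("params.bin", "wb") as f:
    f.write(relay.save_param_dict(lowered_params))  # serialized weights
lib.save("kernels.c")  # optimized C kernels
```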
Results
The complete flow was deployed and simulated with a few models on RISC-V (“rv32gc”) using our simulator ETISS.
Variants:
- TFLMCompiler: Our “TensorFlow Lite for Microcontrollers” flow, which takes a similar approach of generating static inference code that avoids the TFLM interpreter (link limit reached, it is on GitHub: cpetig/tflite_micro_compiler).
- µTVM: The default µTVM flow, with some added automation for the deployment; see examples/codegen.py.
- TVMCodeGen: This tool.
We can see that TVM produces much better kernels than TFLite Micro, but for very small models, the runtime overhead hurts quite a bit. With our code generator, the overhead is eliminated, and we obtain the best numbers we have been able to produce so far.