This RFC outlines a high-level roadmap towards what we might consider a standalone version of µTVM (Micro TVM). By “standalone,” we mean a cohesive set of features that enables a few end-user goals, one of which is standalone execution of optimized TVM models on-device.
In the coming weeks, I’ll be posting RFCs (as they’re written) for work that enables these goals. This RFC is meant to serve as context for those RFCs, as well as an overall place to discuss high-level goals of µTVM. I’m definitely interested in everyone’s thoughts on this overall direction for µTVM.
This roadmap aims to enable these potential end-user goals:
1. Test simple models on supported hardware without writing any microcontroller code. “Simple models” means models without conditional execution that fit wholly in device flash and can be evaluated without reusing RAM. A user should be able to view execution time (timed accurately on device), code size, memory consumption, and model output. It should be feasible to test performance under different SoC configurations, though this may involve writing microcontroller firmware or, e.g., tweaking RTOS settings.
2. Easily package a tested model (from #1) into a C library with the following properties:
   - BYO memory allocator, plus a provided standard allocator with BYO buffer
   - no malloc() calls (outside of calls to the internal memory allocator)
   - graph-based runtime, but the graph can be fixed/compiled AOT
   - can be configured to use the same supporting library functions as are used during autotuning/eval. Specifically, this means that the same TVMBackend functions invoked during autotuning are also invoked in production.
3. Easily autotune supported operators without having to write too much TVM code beyond the model definition. You shouldn’t have to understand how TVM works to try autotuning.
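To make the "BYO memory allocator, plus a provided standard allocator with BYO buffer" property of goal #2 more concrete, here is a minimal sketch of what such an interface might look like. All names (`ExampleAllocator`, `ExampleArena`, and so on) are hypothetical and are not part of TVM; the sketch also assumes the caller's buffer is itself 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator interface: the application supplies the callbacks,
 * so the library itself never calls malloc(). */
typedef struct {
  void* (*alloc)(void* ctx, size_t nbytes);
  void (*free_all)(void* ctx);
  void* ctx;
} ExampleAllocator;

/* Hypothetical "standard" allocator: a bump allocator over a caller-owned buffer. */
typedef struct {
  uint8_t* base;
  size_t capacity;
  size_t used;
} ExampleArena;

#define EXAMPLE_ALIGN 8u

void ExampleArenaInit(ExampleArena* arena, uint8_t* buffer, size_t capacity) {
  assert(arena != NULL && buffer != NULL);
  arena->base = buffer;
  arena->capacity = capacity;
  arena->used = 0;
}

/* Returns an offset-aligned block from the buffer, or NULL when it is exhausted.
 * Assumption: the buffer passed to ExampleArenaInit is 8-byte aligned. */
void* ExampleArenaAlloc(void* ctx, size_t nbytes) {
  ExampleArena* arena = (ExampleArena*)ctx;
  size_t start = (arena->used + (EXAMPLE_ALIGN - 1)) & ~(size_t)(EXAMPLE_ALIGN - 1);
  if (start > arena->capacity || nbytes > arena->capacity - start) {
    return NULL;
  }
  arena->used = start + nbytes;
  return arena->base + start;
}

/* Releases every allocation at once, e.g. between inference runs. */
void ExampleArenaFreeAll(void* ctx) {
  ((ExampleArena*)ctx)->used = 0;
}

/* Wraps the arena in the generic allocator interface. */
ExampleAllocator ExampleArenaAsAllocator(ExampleArena* arena) {
  ExampleAllocator a = {ExampleArenaAlloc, ExampleArenaFreeAll, arena};
  return a;
}
```

A firmware engineer would pass either their own `ExampleAllocator` or the arena-backed one over a statically allocated buffer; either way the memory budget is fixed at link time.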
We think these projects are the right ones to pursue in order to enable these goals. More detail for each of these projects will be given in RFCs to follow this one.
µTVM On-Device RPC Server (PoC)
Description: Following the RPC modularization PR, we propose to port the TVM C Runtime to bare metal targets, and use the MinRPC server to implement a (limited) TVM RPC server on-device using any pipe-like transport (e.g. UART, Ethernet, USB, semihosting). This implements only the C++ RPCEndpoint on device, not other features implemented behind PackedFuncs, such as LoadModule, GraphRuntime, etc.
Rationale: TVM currently encodes device-specific memory layouts in the repository. In addition, TVM needs some way to specify the SoC configuration (e.g. oscillator, caches, power modes) to reliably reproduce results. In order to scale past a few devices, effectively use flash, and take advantage of platform efforts such as Zephyr, mBED, Mynewt, and others, TVM should adopt a more portable µTVM compilation/linking strategy.
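As an illustration of what "any pipe-like transport" could mean in practice, the sketch below defines a byte-oriented read/write interface and a simple length-prefixed packet on top of it. Everything here is an assumption made for illustration: the names are hypothetical, and the 4-byte little-endian length header is not the MinRPC wire format. The in-memory loopback stands in for a UART or USB driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pipe-like transport. Each callback returns the number of
 * bytes transferred, or a negative value on error. */
typedef struct {
  int (*write)(void* ctx, const uint8_t* data, size_t len);
  int (*read)(void* ctx, uint8_t* data, size_t len);
  void* ctx;
} ExampleTransport;

/* Sends one packet: 4-byte little-endian length, then the payload. */
int ExampleSendPacket(const ExampleTransport* t, const uint8_t* payload, uint32_t len) {
  uint8_t header[4];
  assert(t != NULL && payload != NULL);
  header[0] = (uint8_t)(len & 0xffu);
  header[1] = (uint8_t)((len >> 8) & 0xffu);
  header[2] = (uint8_t)((len >> 16) & 0xffu);
  header[3] = (uint8_t)((len >> 24) & 0xffu);
  if (t->write(t->ctx, header, sizeof(header)) != (int)sizeof(header)) return -1;
  if (t->write(t->ctx, payload, len) != (int)len) return -1;
  return 0;
}

/* Receives one packet into `out`; fails if it does not fit in `capacity`. */
int ExampleRecvPacket(const ExampleTransport* t, uint8_t* out, uint32_t capacity,
                      uint32_t* out_len) {
  uint8_t header[4];
  uint32_t len;
  if (t->read(t->ctx, header, sizeof(header)) != (int)sizeof(header)) return -1;
  len = (uint32_t)header[0] | ((uint32_t)header[1] << 8) |
        ((uint32_t)header[2] << 16) | ((uint32_t)header[3] << 24);
  if (len > capacity) return -1;
  if (t->read(t->ctx, out, len) != (int)len) return -1;
  *out_len = len;
  return 0;
}

/* In-memory loopback used here in place of real hardware. */
typedef struct {
  uint8_t data[64];
  size_t head; /* next byte to read */
  size_t tail; /* next free byte to write */
} ExampleLoopback;

int ExampleLoopbackWrite(void* ctx, const uint8_t* data, size_t len) {
  ExampleLoopback* lb = (ExampleLoopback*)ctx;
  if (len > sizeof(lb->data) - lb->tail) return -1;
  memcpy(lb->data + lb->tail, data, len);
  lb->tail += len;
  return (int)len;
}

int ExampleLoopbackRead(void* ctx, uint8_t* data, size_t len) {
  ExampleLoopback* lb = (ExampleLoopback*)ctx;
  if (len > lb->tail - lb->head) return -1;
  memcpy(data, lb->data + lb->head, len);
  lb->head += len;
  return (int)len;
}
```

Keeping the transport behind two function pointers is one way a single RPC server implementation could be shared across UART, USB, and semihosting without device-specific code in the server itself.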
µTVM CI in TVM.
Description: Write a CI test for the On-Device Runtime against x86 and potentially simulated bare-metal implementations (e.g. QEMU or other device emulators). Run the CI as part of the TVM pre-submit. We don’t intend to include real hardware in the TVM pre-submit. Outside of the pre-submit, we’d like to encourage use of the CI test to validate implementations of the on-device runtime on real hardware.
Rationale: Some CI test is needed to protect against breakages in the CRT on bare metal. The CI should be executable by all TVM contributors, since it will be in the pre-submit. The same test as is used in the pre-submit should be sufficient to validate real hardware.
Enable AutoTVM using the on-device runtime.
Description: Modify the AutoTVM build process to create µTVM On-Device Runtime binaries and flash them as appropriate for the target platform.
Rationale: AutoTVM needs to evaluate performance in scenarios that exactly mimic real-world device configuration.
Place Model Weights in Flash.
Description: Modify C codegen to output supplied model weights as const arrays, possibly with a user-specified section.
Rationale: Allows for more realistic use of device memory and allows larger models to fit.
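For a sense of what the generated C might look like, here is a hand-written sketch of weights emitted as a `const` array with an optional section attribute. The section name, the macro, and the tiny dense layer are all hypothetical; actual codegen output may differ. The attribute is only applied for GCC-compatible compilers targeting ELF, since section naming rules vary by toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-specified section. A linker script could map this
 * section to flash; without one, const data normally lands in .rodata. */
#if defined(__GNUC__) && defined(__ELF__)
#define EXAMPLE_WEIGHT_SECTION __attribute__((section(".rodata.example_weights")))
#else
#define EXAMPLE_WEIGHT_SECTION
#endif

/* Weights are const, so they need no RAM copy at startup. */
EXAMPLE_WEIGHT_SECTION
const int8_t example_dense_weights[2][3] = {
    {1, -2, 3},
    {4, 5, -6},
};

/* Computes y = W * x directly from the const array. */
void ExampleDense(const int8_t x[3], int32_t y[2]) {
  size_t row, col;
  assert(x != NULL && y != NULL);
  for (row = 0; row < 2; ++row) {
    int32_t acc = 0;
    for (col = 0; col < 3; ++col) {
      acc += (int32_t)example_dense_weights[row][col] * (int32_t)x[col];
    }
    y[row] = acc;
  }
}
```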
Graph Runtime on bare metal.
Description: Make the graph runtime or full-model execution work on bare metal with a firmware-friendly interface. Without this project, models still need to be driven end-to-end by a connected TVM “supervisor” instance containing the GraphRuntime. This change enables firmware engineers to integrate TVM models into production applications.
Rationale: Supports goal #2, and allows us a chance to ensure that the on-device runtime executes graphs in the same way both during AutoTVM and during production.
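One way to picture a graph that is "fixed/compiled AOT" is a `const` table of operator calls that a small loop walks in order, with no JSON parsing or dynamic allocation on device. The sketch below is purely illustrative: the node layout, the two toy operators, and all names are hypothetical and do not reflect the GraphRuntime's actual data structures. Each node writes to its own tensor, matching the "without reusing RAM" restriction on simple models.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_TENSOR_LEN 4

/* Hypothetical operator signature: elementwise op over n floats. */
typedef void (*ExampleOpFunc)(const float* in, float* out, size_t n);

/* One graph node: which operator to run and which tensors it touches. */
typedef struct {
  ExampleOpFunc func;
  uint8_t input;  /* index into the tensor table */
  uint8_t output; /* index into the tensor table */
} ExampleNode;

void ExampleAddOne(const float* in, float* out, size_t n) {
  size_t i;
  for (i = 0; i < n; ++i) out[i] = in[i] + 1.0f;
}

void ExampleDouble(const float* in, float* out, size_t n) {
  size_t i;
  for (i = 0; i < n; ++i) out[i] = in[i] * 2.0f;
}

/* The graph is fixed at compile time, so the table can live in flash. */
const ExampleNode example_graph[] = {
    {ExampleAddOne, 0, 1},
    {ExampleDouble, 1, 2},
};

/* Runs every node in order; tensors[0] is the input, tensors[2] the output. */
void ExampleRunGraph(float tensors[][EXAMPLE_TENSOR_LEN]) {
  size_t i;
  assert(tensors != NULL);
  for (i = 0; i < sizeof(example_graph) / sizeof(example_graph[0]); ++i) {
    const ExampleNode* node = &example_graph[i];
    node->func(tensors[node->input], tensors[node->output], EXAMPLE_TENSOR_LEN);
  }
}
```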
Export stats from the on-device runtime.
Description: Provide RPC calls for stats such as execution time and memory usage.
Rationale: Supports goal #1. Allows firmware engineers to better evaluate TVM model output and collaborate with other engineers/data scientists involved with model development.
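A rough sketch of the kind of bookkeeping such stats calls might report is shown below. The struct fields, the tick-source callback, and the function names are hypothetical; on real hardware the tick source would be a platform timer, and here a fake counter stands in for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform hook: returns a free-running tick count. */
typedef uint32_t (*ExampleTickFunc)(void);

/* Hypothetical stats record that an RPC call could serialize and return. */
typedef struct {
  uint32_t last_exec_ticks;
  uint32_t total_exec_ticks;
  uint32_t num_calls;
  size_t peak_memory_bytes;
} ExampleStats;

/* Times one call on-device. Unsigned subtraction keeps the elapsed value
 * correct across a single wraparound of the tick counter. */
void ExampleTimedCall(ExampleStats* stats, ExampleTickFunc now, void (*func)(void)) {
  uint32_t start, elapsed;
  assert(stats != NULL && now != NULL && func != NULL);
  start = now();
  func();
  elapsed = (uint32_t)(now() - start);
  stats->last_exec_ticks = elapsed;
  stats->total_exec_ticks += elapsed;
  stats->num_calls += 1;
}

/* Tracks the high-water mark of memory in use. */
void ExampleRecordMemory(ExampleStats* stats, size_t bytes_in_use) {
  assert(stats != NULL);
  if (bytes_in_use > stats->peak_memory_bytes) {
    stats->peak_memory_bytes = bytes_in_use;
  }
}

/* Stand-ins for a hardware timer and an operator, for demonstration only. */
uint32_t ExampleFakeTicks(void) {
  static uint32_t ticks = 0;
  ticks += 10;
  return ticks;
}

void ExampleNoOp(void) {}
```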
Proof of Concept
Parts of project #1 work to some degree here. The short-term plan is to split this PoC into a couple of pieces, each with its own RFC, and discuss/merge piece by piece.
This roadmap is just an initial concept and we’d definitely like to work with the community to make sure this direction is useful for others. We intend to drive some of this work from OctoML, but there are a lot of tasks and there are plenty of ways to get involved.
More immediately we’d love feedback on the overall direction. The On-Device RPC Server (project #1) underpins most of the rest of the work, so we’d welcome review on our initial implementation (RFCs and PRs to come soon). Once that lands in the CI, it should be much easier to collaborate on the rest of this effort.
We’ll also have a µTVM-focused meetup on Thursday 6/18 9am PDT if you’d like to discuss in a higher-bandwidth setting. We’ll post any followup points for discussion on the forum.