[µTVM] Mini-Map: the µTVM Developer Workflow

areusch · February 3, 2021, 6:20pm

This Mini-(road)Map is a high-level design proposal that describes how µTVM M2 projects 1 (Library Generator), 6 (Project-level API), and 7 (tvmc integration) come together to form a new µTVM development workflow.

This doc isn’t a formal RFC for any one of these projects; before implementing each, a separate RFC should be submitted. However, small sketches of RFCs are included for each project at the end of the doc. Enough questions and interest in these topics have come up recently that I thought this was worth discussing in some detail, in case others want to start working on these projects before I have cycles.

Background

Right now, all µTVM workflows exist entirely in Python scripts, and it’s challenging to integrate with new RTOS and microcontrollers. TVM drives the entire build process, so there is a fairly tight integration between TVM and the build system for any embedded RTOS in the picture. As µTVM matures and tvmc develops into a proper TVM command-line driver, changes are needed to make a µTVM workflow that is accessible to developers without intimate knowledge of TVM’s APIs.

µTVM Workflows

This section describes the µTVM workflow today and proposes a more developer-friendly workflow that M2 projects 1, 6, and 7 should work towards.

The µTVM Workflow Today

µTVM today supports essentially one workflow: compiling and running models on-device. This workflow is demonstrated in the micro_tflite tutorial, which breaks the process up into 4 pieces (the relevant TVM APIs and their return values are listed):

Model Library Generation — tvm.relay.build() -> CSourceModule
Firmware compilation — tvm.micro.build_static_runtime() -> MicroBinary
Device programming — tvm.micro.Compiler.flash() -> Transport
Model execution — tvm.micro.Session() / tvm.runtime.GraphRuntime() -> NDArray

Though micro_tflite purports to be a user-facing inference tutorial, its workflow was actually designed around AutoTVM, in which hundreds to thousands of firmware images are compiled and tested on a fleet of devices. That workflow doesn’t necessarily map to a semi-automated developer workflow as would be expected from a command-line tool such as tvmc.

In particular, though there are ways to execute any one of these pieces standalone, there are some significant limitations to the return values of each process step that make it difficult to pause and resume the process. Certainly, at each step of the way, there is additional state in micro_tflite.py that isn’t captured in these return values, and which is necessary for later steps. To name a few examples:

Since PR 7002 and after PR 7398, CSourceModule is actually a collection of C files, and the de-facto way to save to disk produces a tar archive. However, this tar archive doesn’t include any downstream compiler configuration such as CFLAGS or libraries which may be needed by the operator implementation chosen by TVM, nor does it include the C runtime common libraries that those operators depend on.
The MicroBinary artifact can be saved to and loaded from disk, but many RTOS expect that the build artifacts be left in place on disk for further operations such as flashing and debugging.
The Transport object returned from Device Programming is live and can’t be tied back to a particular development board. A MicroBinary includes no information that describes the targeted board.

The `tvmc` Workflow

I consider the pieces of the workflow described above to roughly correspond to steps in the average developer workflow (though I’m happy to debate this). In moving to tvmc, then, we mainly need to address these pause/resume challenges and design a tool that developers can use without extensive knowledge of TVM internals. At the center of this challenge are two things:

The way we save state to disk between each step in the workflow
The line between TVM and the firmware project that is compiled and flashed onto the device.

A hallmark of bare-metal programming is that extreme complexity can be hidden amongst several innocuous lines of code, and even the order in which those operations are compiled can make the difference between functional and broken firmware. Therefore, this RFC proposes that tvmc should stay out of the firmware project, aside perhaps from an initial Project Generation step (described more below). Any other automation on a project should be performed by that project’s build tool, and TVM should not expect to move the firmware project or any build artifacts once they are created on disk.

A revised workflow based on this concept is below:

Model Library or Project Generation. TVM translates a Relay model into an artifact suitable to be included in a firmware project. See Model Library Format below for a strawman proposal of this format.

The user then has two choices, and their choice defines the output of this step:
1. Manually integrate these pieces into a downstream firmware project. The output is described in Model Library Format below.
2. Run a script to generate a demo project. Internally, a Model Library Format .tar or dirtree is generated, and the ultimate output is the generated project’s directory tree.
Firmware compilation. The project’s build tool handles this. TVM can invoke the build tool in automated scenarios such as AutoTVM (see below).
Device programming. The project’s build tool handles this. TVM can invoke the build tool in automated scenarios such as AutoTVM (see below).
Model execution (assuming host-driven execution). The device is reset and TVM connects to the RPC server over a specified or autodetected UART (or Ethernet or USB peripheral, etc.). The Graph Runtime is instantiated according to the runtime configuration stored in the generated project or library .tar. Simplified Parameters are loaded from the generated project or library .tar.
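As a rough illustration of step 4 reading its state back from disk rather than from live Python objects, here is a stdlib-only sketch that pulls graph.json and the Simplified Parameters out of a library .tar. It assumes the strawman Model Library Format layout proposed below (runtime-config/graph.json and parameters.json at the archive root) and JSON-encoded parameters; the function names are hypothetical and not part of any TVM API.

```python
import json
import tarfile


def _read_json_member(tar, name):
    """Parse the JSON member `name`, tolerating a leading "./" in archive paths."""
    for member in tar.getmembers():
        member_name = member.name[2:] if member.name.startswith("./") else member.name
        if member_name == name and member.isfile():
            return json.load(tar.extractfile(member))
    raise KeyError(f"{name} not found in Model Library Format archive")


def load_runtime_artifacts(library_tar_path):
    """Return (graph, params) as stored by step 1 (hypothetical layout)."""
    with tarfile.open(library_tar_path, "r") as tar:
        graph = _read_json_member(tar, "runtime-config/graph.json")
        params = _read_json_member(tar, "parameters.json")
    return graph, params
```

Because everything step 4 needs comes from the archive, the step can run in a fresh process long after step 1 finished, which is the pause/resume property the workflow is after.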

Sketch of Project RFCs

Here I briefly sketch some of the important parts of the M2 projects that enable this workflow change. These are not RFCs in their own right, but they describe some important points that each project should contribute towards this workflow vision.

Model Library Format

Whatever type of artifact we are ultimately generating in tvmc workflow step 1 (Model Library or Project), a necessary step is storing the tvm.relay.build() output on disk in a standardized format. The point here is to provide familiarity to users above the Python API level and make it possible to automate project generation.

This format should at least keep the same on-disk organization across all configuration options (e.g. c vs llvm backend, -link-params, use of BYOC, graph vs AOT runtime, memory planner, etc). The produced output could be a .tar or a directory tree.

One possible organization is shown below:

metadata.json - Overall metadata describing this build
- TVM target
- Description/hash of original model
- Original parameters?
- Other state needed from model compilation later in the pipeline
crt/ - The content of standalone_crt from TVM build/
lib/ - Stores generated binary libraries
parameters.json - JSON-serialized Simplified Parameters (or this could be binary format)
README.md - Perhaps a short standardized README for new users
runtime-config/ - Stores runtime configuration.
- For GraphRuntime, graph.json should be created in this directory.
src/ - Stores generated C source
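To make the strawman layout concrete, here is a stdlib-only sketch that writes it out and archives it. The function and argument names are hypothetical, the metadata keys (target, model_hash) are placeholders for the fields listed above, and crt/ and lib/ are created but left empty rather than populated from standalone_crt or compiled binaries.

```python
import json
import os
import tarfile


def write_model_library_format(out_dir, *, target, model_hash, graph_json,
                               params, c_sources):
    """Lay out one build in the strawman Model Library Format directory tree."""
    for sub in ("crt", "lib", "runtime-config", "src"):
        os.makedirs(os.path.join(out_dir, sub), exist_ok=True)

    # Overall metadata describing this build (placeholder keys).
    with open(os.path.join(out_dir, "metadata.json"), "w") as f:
        json.dump({"target": target, "model_hash": model_hash}, f, indent=2)

    # JSON-serialized Simplified Parameters.
    with open(os.path.join(out_dir, "parameters.json"), "w") as f:
        json.dump(params, f)

    # For the graph runtime, graph.json goes under runtime-config/.
    with open(os.path.join(out_dir, "runtime-config", "graph.json"), "w") as f:
        f.write(graph_json)

    # Generated C sources, keyed by file name.
    for file_name, contents in c_sources.items():
        with open(os.path.join(out_dir, "src", file_name), "w") as f:
            f.write(contents)


def archive_model_library_format(out_dir, tar_path):
    """Pack the tree into a .tar whose paths are relative to its root."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(out_dir, arcname=".")
```

Keeping the same tree regardless of backend or runtime options means downstream tooling only ever has to look in one place for each artifact.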

The Project API

The project workflow above can be divided into TVM-standard pieces and project-specific pieces. At present, Zephyr-specific logic is checked in to the TVM codebase. However, TVM is a complex compiler with many targets, and the CI length can make it a daunting project to contribute to. To facilitate faster collaboration, this RFC proposes that the project-specific pieces be moved into a separate git repository and invoked through a Project-level API. Specifically, in the case of the Zephyr integration, this would be:

python/tvm/micro/contrib/zephyr.py - Zephyr Compiler and Flasher implementations
tests/micro/qemu/zephyr-runtime - Embryonic template project
Additional logic to implement the Project API

The full details of this process are left to a future Project-level API RFC, but some are covered in this Embryonic RFC, and a sketch follows:

To start with, a user obtains these pieces:
1. Model and inputs to compile
2. TVM repo
3. A µTVM Platform Provider — a platform-specific git repo that contains the Project-level API implementation plus any templates needed to generate projects.
A python (or other language) script lives in the root of the µTVM Platform Provider as e.g. microtvm.py. TVM executes this script to begin interacting with the API.
Commands are written to stdin as one-line JSON requests. The script is expected to parse them and write a reply as one-line JSON.
TVM can issue these API commands:
- GenerateProject(path/to/library.tar, project_config, project_dir) - Generate a new project in project_dir using this particular RTOS and project config. The script should copy itself into project_dir so it can be re-invoked there to issue further commands.
In the generated project_dir, TVM then re-invokes the script to do further operations:
- Build() - build firmware binary for this project
- Flash(serial_number) - flash built firmware binary for this project to device
- Transport() - open a transport channel that connects to the on-device RPC server
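A minimal sketch of what the provider-side microtvm.py loop could look like. It assumes each request is a JSON object of the form {"method": ..., "params": {...}} and each reply is {"result": ...} or {"error": ...}; that envelope and the handler bodies are assumptions for illustration only. Transport() is left out here, since it would presumably need a long-lived byte stream rather than a single request/reply.

```python
import json
import sys


def generate_project(library_tar, project_config, project_dir):
    # Hypothetical: a real provider would unpack library_tar, render its
    # RTOS-specific template into project_dir, and copy this script there.
    return {"project_dir": project_dir}


def build():
    # Hypothetical: a real provider would invoke the project's build tool here.
    return {"built": True}


def flash(serial_number):
    # Hypothetical: a real provider would flash the board with this serial number.
    return {"flashed": serial_number}


HANDLERS = {"GenerateProject": generate_project, "Build": build, "Flash": flash}


def handle_line(line):
    """Decode one JSON request line and return one JSON reply line."""
    try:
        request = json.loads(line)
        handler = HANDLERS[request["method"]]
        reply = {"result": handler(**request.get("params", {}))}
    except Exception as err:  # report failures to TVM instead of crashing
        reply = {"error": f"{type(err).__name__}: {err}"}
    return json.dumps(reply)


def serve(stdin=None, stdout=None):
    """Answer one-line JSON requests until stdin is closed."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for line in stdin:
        if line.strip():
            stdout.write(handle_line(line) + "\n")
            stdout.flush()  # TVM blocks on the reply, so never leave it buffered


if __name__ == "__main__":
    serve()
```

Keeping the wire format to one JSON object per line means any language with a JSON parser and line-buffered stdio can implement a provider.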

`tvmc` Integration

This one is largely specified by the workflow given above. Each workflow step is expected to roughly correspond to one tvmc subcommand. I’ll note a couple of things here:

In moving the RTOS-specific code out of the TVM repo, the Model Library or Project Generation step is expected to take an additional config option: the path to the µTVM Platform Provider repo.
Each step will probably need platform-specific configuration, and it’s not clear whether there should be a Provider API that tvmc could use to query additional config options from the Provider repo, or whether some generic key-value mechanism is sufficient.
This proposal leaves a debug command out of TVM. We don’t have to do that, but the hope is that this workflow allows the user to launch their own debug tools rather than relying on microtvm_debug_shell as we do today. It’s expected that whatever tvmc command implements workflow step 4 would be used to drive on-device execution for debug purposes.
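One hypothetical way the "one subcommand per workflow step" idea could look, sketched with Python's argparse. The program name, subcommand names, and flags (--platform-provider, --option KEY=VALUE, --serial-number, --port) are illustrative assumptions, not existing tvmc options; --option stands in for the generic key-value configuration mentioned above.

```python
import argparse


def make_parser():
    """Hypothetical tvmc-style CLI with one subcommand per workflow step."""
    parser = argparse.ArgumentParser(prog="tvmc-micro-sketch")
    steps = parser.add_subparsers(dest="step", required=True)

    # Step 1: Model Library or Project Generation.
    gen = steps.add_parser("generate", help="emit a Model Library Format archive or a project")
    gen.add_argument("model", help="path to the model to compile")
    gen.add_argument("--output", required=True, help="library .tar path or project directory")
    gen.add_argument("--platform-provider", help="Platform Provider repo; omit for library-only output")
    gen.add_argument("--option", action="append", default=[], metavar="KEY=VALUE",
                     help="platform-specific setting passed through to the provider")

    # Steps 2 and 3 defer to the project's own build tool.
    build = steps.add_parser("build", help="build firmware via the project's build tool")
    build.add_argument("project_dir")
    flash = steps.add_parser("flash", help="flash firmware via the project's build tool")
    flash.add_argument("project_dir")
    flash.add_argument("--serial-number", help="board to flash")

    # Step 4: host-driven model execution.
    run = steps.add_parser("run", help="reset the device and run the model over RPC")
    run.add_argument("project_dir")
    run.add_argument("--port", help="serial port; autodetected when omitted")
    return parser


def parse_options(pairs):
    """Turn repeated --option KEY=VALUE flags into a dict."""
    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        options[key] = value
    return options
```

For example, `generate model.tflite --platform-provider ../provider --output proj --option board=qemu_x86` would pass {"board": "qemu_x86"} through to the provider without tvmc needing to understand it.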

Discussion Topics

Some starter topics:

T1. Does this workflow make sense to you? Are there additional use cases or alternate workflow proposals we should support?

T2. Are there concerns with moving platform-specific logic into other git repos?

T3. Does this seem like it will be easier to use or overly complex?

Remember, specifics such as “this seems like it should not be JSON” are more appropriate for each project’s RFC. If they seem particularly important, we could discuss those here.

Finally, I won’t have bandwidth to work on this for a month or so. Community contributions (please send an RFC/PoC if working on a project) would be super-welcome here.

@tqchen @thierry @mdw-octoml @ramana-arm @manupa-arm @leandron @tgall_foo @gromero @liangfu

leandron · February 3, 2021, 6:40pm

Thanks for writing this. Will have a look.

Cc @mjs

manupa-arm · February 3, 2021, 6:47pm

Hi @areusch ,

I think this is great, and overall it makes sense. One thing that immediately comes to mind: would we want to additionally include a static archive (a .a of all sources compiled and archived together) in the Model Library, similar to what’s inside MicroLibrary? (Since/if we are mentioning the -mcpu in the target.)

I’ll do a thorough pass when I get some time.

max1996 · February 5, 2021, 7:54am

Hi @areusch,

one initial suggestion from my side would be another format for the simplified parameters: Google’s FlatBuffers format might be a good alternative here. Google uses it in TFLite and provides interfaces for a lot of different programming languages. However, I am not sure if they offer a C API.

Google Flatbuffer Doc

areusch · February 5, 2021, 6:49pm

Makes sense–a binary format would be more compact. I think we’d need to survey how we use it–on the Python side, there would be concern on adding another dependency, and on the C side there’d be a concern whether the library size is worth requiring. One thing to note is that we aren’t actually parsing the parameter JSON on the C side–there is a separate mechanism (--link-params) to link parameters into the binary with minimal overhead.
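To put a rough number on "more compact", here is a stdlib-only comparison of the JSON text of a parameter tensor against its raw float32 bytes. This is not FlatBuffers (which adds its own framing around the data); it only illustrates the text-versus-binary trade-off, and the tensor is random data made up for illustration. Note that packing to float32 also drops precision relative to the float64 values JSON would carry.

```python
import array
import json
import random


def json_bytes(values):
    """Size of the values serialized as JSON text, as parameters.json would hold them."""
    return len(json.dumps(values).encode("utf-8"))


def float32_bytes(values):
    """Size of the same values packed as raw C floats (typically 4 bytes each)."""
    return len(array.array("f", values).tobytes())


if __name__ == "__main__":
    random.seed(0)
    weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]  # made-up tensor
    print("json:", json_bytes(weights), "bytes")
    print("float32:", float32_bytes(weights), "bytes")
```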

mdw-octoml · February 6, 2021, 12:14am

Thanks @areusch , this looks like a great proposal.

My main question has to do with how the API is defined between TVM and the µTVM Platform Provider. You are proposing invoking shell commands with a JSON interface, as I understand it. I wonder whether this should instead be done as an RPC interface, which would potentially allow the platform provider to be hosted on a different machine / cloud instance than the TVM compiler itself. (I am not proposing TVM RPC in particular; I think REST or gRPC would be better.) The point being that there is a daemon process responsible for the Platform Provider, and just as TVM today understands the concept of a remote runner for the sake of measuring models on devices, there can be a remote platform provider (which might be hosted on the same machine).

I tend to prefer RPC interfaces as they make the data sharing clear: all data exchange happens via the RPC interface, not via scribbling on the filesystem (or sending process signals, etc.) WDYT?

areusch · February 6, 2021, 1:18am

thanks for looking this over!

I think you bring up a great point and while the specifics might be better left to the Project-level API RFC, it is worth considering one thing in light of the Model Library Format RFC:

→ if Model Library Format can be archived into a tar/zip, it could be more easily distributed (vs being represented as a dirtree). this would then enable the RPC mechanism, should we decide to go that route.

I don’t have a strong preference between the API and the daemon/JSON methods. I’m not sure this would be the right sort of thing to serve as an internal service, though it’s very close to something that could be served. I think there’s a good chance the daemon and TVM will live on the same machine, but I can see how a remote daemon could work. I think the biggest consideration here is what Python packages each solution depends on.

mdw-octoml · February 10, 2021, 4:55pm

Well, I think there are two basic approaches.

(1) The interface between TVM and the Platform Provider is just a Python API. Whether or not there happen to be separate processes/commands/whatever is hidden behind that API. So a Platform Provider simply implements the Platform Provider Python API invoked by TVM. This seems to be the most flexible. Maybe this is what you had in mind originally?

(2) If we want to standardize the “wire protocol” between the Platform Provider and TVM, I’d recommend just using gRPC or something similar, since it is well supported in Python, easy to integrate, and does not limit you in terms of where the client/server are running (they can be on the same machine or even in the same process). The main reason to do this would be to make Platform Providers pluggable at the wire protocol level rather than the Python API level. I am not sure whether this is important or not.
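A minimal sketch of approach (1), using Python's abc module for the interface. The class and method names are hypothetical and simply mirror the GenerateProject/Build/Flash commands from the original proposal (Transport is left out); a provider that talks to a remote daemon over gRPC could implement the same class without TVM noticing.

```python
import abc


class PlatformProvider(abc.ABC):
    """Hypothetical option-(1) interface: TVM only ever calls these methods.

    Whether an implementation shells out, talks gRPC, or runs in-process
    is hidden behind this API.
    """

    @abc.abstractmethod
    def generate_project(self, library_tar, project_config, project_dir):
        """Create a firmware project in project_dir from a library archive."""

    @abc.abstractmethod
    def build(self, project_dir):
        """Build the firmware binary for the project in project_dir."""

    @abc.abstractmethod
    def flash(self, project_dir, serial_number):
        """Flash the built firmware onto the board with this serial number."""


class DryRunProvider(PlatformProvider):
    """Toy in-process provider that only records what TVM asked it to do."""

    def __init__(self):
        self.calls = []

    def generate_project(self, library_tar, project_config, project_dir):
        self.calls.append(("generate_project", project_dir))

    def build(self, project_dir):
        self.calls.append(("build", project_dir))

    def flash(self, project_dir, serial_number):
        self.calls.append(("flash", serial_number))
```

The trade-off versus (2) is that pluggability lives at the Python level: every provider must ship a Python class, even if that class is only a thin client for some other process.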

areusch · February 10, 2021, 6:29pm

At a high level I was thinking of implementing a mix of (1) and (2):

Check in a Python library to the TVM repo that makes this look like a Python API to both TVM and the API implementation, and that handles all the serialization/deserialization using JSON
Project API impls import this library and use it to communicate over stdio locally
Handle network communication using the existing RPC server

I kind of like this because there are no dependencies, and it’s not clear we need to expose this directly over the network–see below.
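A stdlib-only sketch of the TVM-side half of such a library, assuming requests look like {"method": ..., "params": {...}} and replies look like {"result": ...} or {"error": ...}. It launches the provider script with the current Python interpreter, writes one JSON line per call, and reads one JSON line back. The class and method names are hypothetical, and launching via sys.executable assumes a Python provider; a non-Python provider would need its own launch command.

```python
import json
import subprocess
import sys


class ProjectAPIClient:
    """Hypothetical wrapper that makes the stdio JSON protocol look like a Python API."""

    def __init__(self, script_path, cwd=None):
        self._proc = subprocess.Popen(
            [sys.executable, script_path],
            cwd=cwd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, method, **params):
        """Send one request line and block until one reply line arrives."""
        self._proc.stdin.write(json.dumps({"method": method, "params": params}) + "\n")
        self._proc.stdin.flush()
        line = self._proc.stdout.readline()
        if not line:
            raise RuntimeError("Project API script exited unexpectedly")
        reply = json.loads(line)
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]

    def build(self):
        return self.call("Build")

    def flash(self, serial_number):
        return self.call("Flash", serial_number=serial_number)

    def close(self):
        """Closing stdin tells the script to exit; then reap the process."""
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()
```

Only the standard library is involved on either side, which is what keeps the "no dependencies" property.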

A use case I’d like to enable is compiling on the AutoTVM runner node. That requires some type of file upload, and the TVM RPC protocol already handles that. So my initial thought was just to use the Platform API as a sort of IPC (slightly more separation than a Python API, so you don’t have to use Python in your platform API implementation, though it’s strongly suggested). I could see arguments for not-Python if the project’s build system is, e.g., written heavily in Ruby for some reason.

These are all just high-level initial thoughts. I need to think about them a bit more before I formally propose them.