- See PoC. Commit message explains how to run PoC.
- See previous embryonic RFC.
- This RFC implements project #6 from µTVM M2 roadmap.
Background
In order to support autotuning, microTVM needs a way to drive the typical embedded development workflow:
- Generate a firmware project containing TVM operator code
- Build the firmware project
- Program the firmware project onto a device
- Drive operator execution on-device
The present flow is abstracted behind several different Python interface classes that live in `python/tvm/micro`:
- `tvm.micro.Compiler` defines an interface for building libraries and firmware binaries. Implicit in this interface is project generation:
  - `library()` requires implementations to generate an anonymous project with a firmware build tool and then build a C library from given source files
  - `binary()` requires implementations to generate a firmware binary project with a firmware build tool and then build a final firmware image
- `tvm.micro.Flasher` wraps a single `flash()` function which programs the device
- `tvm.micro.Transport` abstracts the process of reading and writing data to the firmware device
The present abstraction suffers from several drawbacks that make it hard to work with:
- It's not very easy to break apart, so debugging is difficult:
  - The present flow always generates code in a set of temporary directories
  - TVM issues the library-level build commands, as opposed to a build system
- Libraries and binaries are expected to be relocatable, but many firmware platforms don't support this
  - This was originally done to support AutoTVM
- Implementations of these interfaces are restricted to TVM's Python dependencies. Contributors adding new implementations may be forced to add new Python dependencies to TVM. This is both difficult to do and difficult to scale.
Proposal
Goals
This RFC proposes to introduce a new Project API with these goals:
G1. Allow TVM to drive builds of a variety of firmware platforms for the purpose of AutoTVM
G2. Allow API implementations to live in a Python virtualenv other than the one containing TVM
G3. Move implementations to their own git repositories
G4. Reorganize the abstraction around the four embedded workflow steps described above (generate, build, flash, execute)
G5. Allow for better debuggability/hackability, so that the flow is usable even when things haven’t been 100% automated in TVM
API Summary
The `Compiler`, `Flasher`, and (to some degree) `Transport` APIs will be replaced with this single project-level API. Each implementation of this API lives in a file `microtvm_api_server.py`, which lives at the top level of a firmware project.
```python
import abc
import collections
import typing

# TransportTimeouts, TransportClosedError, and IoTimeoutError are provided by
# the TVM micro transport layer.

# ProjectOption - describes a key-value option that can be present in `options`.
ProjectOption = collections.namedtuple(
    'ProjectOption',
    ('name',  # `options` dict key used with this option.
     'help',  # Human-readable description of this option.
     ))

# ServerInfo - describes metadata about this server.
ServerInfo = collections.namedtuple(
    'ServerInfo',
    ('protocol_version',  # An integer identifying the supported API revision.
     'platform_name',  # Short name used to identify this server to TVM.
     'is_template',  # True when the attached firmware project is used only to
                     # generate projects given a Model Library Format export.
     'model_library_format_path',  # Path to a Model Library Format artifact,
                                   # present when is_template = False.
     'project_options',  # List of ProjectOption. Acceptable values for the
                         # `options` parameter below.
     ))


class ProjectAPIHandler(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def server_info_query(self) -> ServerInfo:
        raise NotImplementedError()

    @abc.abstractmethod
    def generate_project(self, model_library_format_path: str, standalone_crt_dir: str,
                         project_dir: str, options: dict):
        """Generate a project from the given artifacts, copying ourselves to that project.

        Parameters
        ----------
        model_library_format_path : str
            Path to the Model Library Format tar archive.
        standalone_crt_dir : str
            Path to the root directory of the "standalone_crt" TVM build artifact. This
            contains the TVM C runtime.
        project_dir : str
            Path to a nonexistent directory which should be created and filled with the
            generated project.
        options : dict
            Dict mapping option name to ProjectOption.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def build(self, options: dict):
        """Build the project, enabling the flash() call to be made.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the build, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def flash(self, options: dict):
        """Program the project onto the device.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the programming process, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def connect_transport(self, options: dict) -> TransportTimeouts:
        """Connect the transport layer, enabling write_transport and read_transport calls.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the connection process, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def disconnect_transport(self):
        """Disconnect the transport layer.

        If the transport is not connected, this method is a no-op.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def read_transport(self, n: int, timeout_sec: typing.Union[float, type(None)]) -> bytes:
        """Read data from the transport.

        Parameters
        ----------
        n : int
            Maximum number of bytes to read from the transport.
        timeout_sec : Union[float, None]
            Number of seconds to wait for at least one byte to be read before timing out. The
            transport can wait additional time to account for transport latency or bandwidth
            limitations based on the selected configuration and number of bytes being
            received. If timeout_sec is 0, read should attempt to service the request in a
            non-blocking fashion. If timeout_sec is None, read should block until at least
            1 byte of data can be returned.

        Returns
        -------
        bytes :
            Data read from the channel. Fewer than `n` bytes may be returned, but 0 bytes
            should never be returned. If returning fewer than `n` bytes, the full
            timeout_sec, plus any internally-added timeout, should be waited. If a timeout
            or transport error occurs, an exception should be raised rather than simply
            returning empty bytes.

        Raises
        ------
        TransportClosedError :
            When the transport layer determines that the transport can no longer send or
            receive data due to an underlying I/O problem (i.e. file descriptor closed,
            cable removed, etc).
        IoTimeoutError :
            When `timeout_sec` elapses without receiving any data.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def write_transport(self, data: bytes, timeout_sec: typing.Union[float, type(None)]) -> int:
        """Write data to the transport.

        Parameters
        ----------
        data : bytes
            The data to write over the channel.
        timeout_sec : Union[float, None]
            Number of seconds to wait for at least one byte to be written before timing out.
            The transport can wait additional time to account for transport latency or
            bandwidth limitations based on the selected configuration and number of bytes
            being sent. If timeout_sec is 0, write should attempt to service the request in
            a non-blocking fashion. If timeout_sec is None, write should block until at
            least 1 byte of data can be written.

        Returns
        -------
        int :
            The number of bytes written to the underlying channel. This can be less than the
            length of `data`, but cannot be 0 (raise an exception instead).

        Raises
        ------
        TransportClosedError :
            When the transport layer determines that the transport can no longer send or
            receive data due to an underlying I/O problem (i.e. file descriptor closed,
            cable removed, etc).
        IoTimeoutError :
            When `timeout_sec` elapses without writing any data.
        """
        raise NotImplementedError()
```
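To make the metadata shapes concrete, here is a sketch of what a template project's `server_info_query()` might return. The namedtuple definitions are repeated so the sketch is self-contained; the platform name and project option shown are illustrative, not values any real server is required to use.

```python
import collections

# Mirrors the namedtuples defined by the API above (repeated here so this
# sketch runs standalone).
ProjectOption = collections.namedtuple("ProjectOption", ("name", "help"))
ServerInfo = collections.namedtuple(
    "ServerInfo",
    ("protocol_version", "platform_name", "is_template",
     "model_library_format_path", "project_options"))


def server_info_for_template():
    """What a template project's server_info_query() might return (hypothetical values)."""
    return ServerInfo(
        protocol_version=1,
        platform_name="example_platform",  # illustrative name
        is_template=True,
        model_library_format_path=None,    # a template has no model attached yet
        project_options=[
            ProjectOption("verbose", help="Run the platform build with verbose output."),
        ],
    )
```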
Implementing the API
An implementation of this API initially lives in a template firmware project. An example directory tree is shown below:
```
template-project/
- microtvm_api_server.py - A runnable Python script containing the API impl.
- launch_microtvm_api_server.sh - If present, run instead of
                                  microtvm_api_server.py to launch the API
                                  server.
- src/ - Example of a platform-specific directory that may contain firmware
         main() glue used to run the µTVM RPC server and generated operators.
- prj.conf - Example of platform-specific project configuration
- CMakeLists.txt - Example of platform-specific build rules for this project
```
In this example, `src`, `prj.conf`, and `CMakeLists.txt` are all specific to the platform used to build the firmware image. TVM doesn't care whether these specific files exist; they are just referenced from `microtvm_api_server.py`. Implementing with e.g. a `Makefile` is entirely possible (and is done in the PoC), as is using any different directory structure.
After `generate_project` is called, the implementation should produce a new directory tree specific to the model in use. Any templates in `template-project` should be expanded. Also, the server should copy itself to the generated project, so it may be used standalone from the template. An example directory tree is shown below:
```
generated-project/
- microtvm_api_server.py - A runnable Python script containing the API impl.
- launch_microtvm_api_server.sh - If present, run instead of
                                  microtvm_api_server.py to launch the API
                                  server.
- model.tar - Model Library Format containing the model used with this project.
- src/ - Example of a platform-specific directory that may contain firmware
         main() glue used to run the µTVM RPC server and generated operators.
- prj.conf - Example of platform-specific project configuration
- CMakeLists.txt - Example of platform-specific build rules for this project
```
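As a concrete illustration, a minimal `generate_project` body for a simple template might look like the sketch below. This is not the required implementation: the explicit `template_dir` parameter is added here for clarity (a real `microtvm_api_server.py` would use its own directory), and `standalone_crt_dir` handling is omitted.

```python
import os
import shutil
import tarfile


def generate_project(template_dir, model_library_format_path, standalone_crt_dir,
                     project_dir, options):
    """Hypothetical generate_project body for a simple template.

    template_dir is passed explicitly for illustration; standalone_crt_dir
    handling is omitted from this sketch.
    """
    os.makedirs(project_dir)
    # Copy the API server itself so the generated project works standalone.
    shutil.copy2(os.path.join(template_dir, "microtvm_api_server.py"), project_dir)
    # Keep the Model Library Format archive alongside the project as model.tar.
    shutil.copy2(model_library_format_path, os.path.join(project_dir, "model.tar"))
    # Unpack generated operator code and metadata for the platform build to consume.
    with tarfile.open(model_library_format_path) as tar:
        tar.extractall(os.path.join(project_dir, "model"))
    # Copy platform-specific files verbatim (a real server might expand templates here).
    for name in ("src", "prj.conf", "CMakeLists.txt"):
        src = os.path.join(template_dir, name)
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(project_dir, name))
        elif os.path.exists(src):
            shutil.copy2(src, project_dir)
```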
The RPC System
In order to meet goal G2, an RPC system is needed to allow the server to live in a subprocess of TVM. I considered two RPC systems for this use, and am happy to consider others if there are suggestions:
R1. gRPC
R2. JSON-RPC
When evaluating RPC systems, I used these criteria:
C1. Availability of implementations. The RPC system should have a published spec and several implementations in both Python and other languages.
C2. Complexity and additional dependencies. The RPC system should not impose lots of extra Python dependencies on TVM since it is an optional component. Additionally, it will process 1 API call at a time, and doesn’t need to be web-scale.
C3. User-friendliness. The RPC layer should be easy to debug and understand.
gRPC has these advantages:
- Many high-quality implementations across a wide set of languages
- Broad development community
- Excellent documentation
and these disadvantages:
- Difficult to debug, since it is a binary protocol
- More complexity than is needed for this application
- Requires the use of TCP sockets in Python, adding to the complexity
- Uses protobufs, which are polarizing
- Adds several Python dependencies to TVM
JSON-RPC has these advantages:
- Many implementations available across a wide set of languages
- Simple specification
- Readable wire traffic
- Simple to use over file descriptors, removing the need for TCP sockets
- Easy enough to implement that 0 additional dependencies are required of TVM
and these disadvantages:
- Since it is a JSON protocol, binary data must be wrapped in e.g. base85
- Verbose, which could pose performance problems
- Less popular than gRPC (I think)
Given that the API wraps a low-traffic build system, and the binary data to be transported will be small for systems implementing the API, the lack of additional dependencies, clear wire format, and usability with file descriptors make JSON-RPC a win in my book. Happy to debate.
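For concreteness, here is what one call could look like on the wire if framed as JSON-RPC 2.0. The method name follows the API above; the exact framing and result shape shown are illustrative, not a finalized wire spec.

```python
import json

# A request TVM might write to the server's --read-fd, and the reply the
# server would write back. Field values here are illustrative.
request = {"jsonrpc": "2.0", "id": 1, "method": "server_info_query", "params": {}}
reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocol_version": 1,
        "platform_name": "example",
        "is_template": True,
        "model_library_format_path": None,
        "project_options": [],
    },
}
wire = json.dumps(request)  # one JSON object per message over the file descriptor
```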
Launching the RPC Server
When `launch_microtvm_api_server.sh` is present, TVM invokes it as a subprocess to launch the RPC server. When only `microtvm_api_server.py` is present, TVM invokes it using the same Python interpreter it is running under.
TVM expects these scripts to take a standard set of command-line arguments:
```
usage: microtvm_api_server.py [-h] --read-fd READ_FD --write-fd WRITE_FD
                              [--debug]

Generic TVM Project API server entry point

optional arguments:
  -h, --help           show this help message and exit
  --read-fd READ_FD    Numeric file descriptor where RPC requests should be
                       read.
  --write-fd WRITE_FD  Numeric file descriptor where RPC replies should be
                       written.
  --debug              When given, configure logging at DEBUG level
```
After launching the subprocess, TVM issues RPC commands via `--read-fd` and reads replies via `--write-fd`.
Flows
This section explores some flows that could use the microTVM API server.
Host-Driven Inference
- A Relay IRModule is created containing the model.
- The Relay IRModule and parameters are compiled.
- A Model Library Format export is created.
- The user specifies a standalone template project
template-project
- TVM launches the API server for the template project
- TVM invokes
generate_project
passing the Model Library Format export into the demo projecttemplate-project
- TVM shuts down the API server
- TVM launches the API server in
generated-project
- TVM invokes
build
andflash
to build and program the device - TVM creates a new RPC session to the device using
connect_transport
, thus making the Project API server the transport. - TVM instantiates the
GraphRuntime
and uses the session.
AutoTVM-with-microTVM
Here is a sketch of how AutoTVM will leverage this API:
1. In the AutoTVM builder: build a Task and produce a Model Library Format export of this task.
2. In the AutoTVM runner:
   1. Launch the API server in the template project `template-project`.
   2. Invoke `generate_project`, passing the CRT, Model Library Format, and a new temporary generated project directory `generated-project`.
   3. Shut down the API server.
   4. Launch the API server in the generated project.
   5. Invoke `build` and `flash` to build firmware and program the device.
   6. Invoke `connect_transport` and create a new TVM RPC session using the API server as a transport.
   7. Use the µTVM RPC server to time the implemented Task.
   8. Invoke `disconnect_transport` to close the session.
   9. Delete `generated-project`.
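The runner steps can be sketched as a context manager. The `client_factory` object and its method names (`shutdown` in particular) are hypothetical stand-ins for a TVM-side client of the API server, not an existing interface:

```python
import contextlib


@contextlib.contextmanager
def measured_session(client_factory, template_dir, mlf_path, crt_dir, project_dir, options):
    """Sketch of the AutoTVM runner steps (client_factory and its methods are hypothetical)."""
    # Generate the project from the template, then shut the template server down.
    template = client_factory(template_dir)
    template.generate_project(mlf_path, crt_dir, project_dir, options)
    template.shutdown()
    # Re-launch the server inside the generated project.
    project = client_factory(project_dir)
    try:
        project.build(options)
        project.flash(options)
        project.connect_transport(options)
        yield project  # the caller times the Task over the transport here
    finally:
        project.disconnect_transport()
        project.shutdown()
```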
Standalone Demo Project Generator
Though not an explicit goal of this work, standalone demo projects showing inference on-device could also be generated. The flow would be as follows:
- A Relay IRModule is created containing the model.
- The Relay IRModule and parameters are compiled.
- A Model Library Format export is created.
- The user specifies a standalone template project `template-project`.
- TVM launches the API server for the template project.
- TVM invokes `generate_project`, passing the Model Library Format export, to create the demo project from `template-project`.
- TVM shuts down the API server
- The user builds and flashes the project using the platform build system
Debugging bad operator implementations
Sometimes, AutoTVM produces a bad implementation of an operator or tries to use too much on-device memory. Other projects are actively working to improve error reporting, but for cases when interactive debugging is necessary, this API server supports doing so:
- Run AutoTVM-with-microTVM flow through step 2.5. Ensure the project is generated to a non-temporary directory.
- Use the platform to launch the debugger and attach to the device.
- Resume AutoTVM execution and observe execution with an attached debugger.
Testing API server implementations
The standard test of an API server would be the Host-Driven Inference flow above. A standard test suite can be defined in TVM and then each implementation can use that as validation.
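A reusable validation helper might start with checks like the sketch below. The helper name is illustrative, not an existing TVM module; a real suite would also exercise `generate_project`, `build`, `flash`, and the transport calls against hardware or a simulator.

```python
def validate_server(handler):
    """Run basic metadata checks an API server implementation should pass."""
    info = handler.server_info_query()
    assert info.protocol_version >= 1
    assert isinstance(info.platform_name, str) and info.platform_name
    if info.is_template:
        # Templates carry no model; only generated projects point at one.
        assert info.model_library_format_path is None
    for opt in info.project_options:
        assert opt.name and opt.help
```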
Future Directions
There are a couple of future directions this approach could be taken:
- Runtime selection - currently we only support GraphRuntime with the C runtime, but as new runtimes (e.g. AOT) become available, it may be necessary to choose one.
- `tvmc` integration - the APIs here are intended to eventually be exposed as `tvmc` commands. Future RFCs will address this.
For Discussion
Some topics for discussion:
T1. Does this approach seem scalable and better than what we have now?
T2. Are there other RPC systems we should consider?
T3. Are there other concerns with this approach e.g. debuggability?
@manupa-arm @leo-arm @ramana-arm @tgall_foo @gromero @aca88 @mdw-octoml @mehrdadh @tqchen @jroesch