- See PoC. Commit message explains how to run PoC.
- See previous embryonic RFC.
- This RFC implements project #6 from µTVM M2 roadmap.
Background
In order to support autotuning, microTVM needs a way to drive the typical embedded development workflow:
- Generate a firmware project containing TVM operator code
- Build the firmware project
- Program the firmware project onto a device
- Drive operator execution on-device
The present flow is abstracted behind several different Python interface classes that live in `python/tvm/micro`:
- `tvm.micro.Compiler` defines an interface for building libraries and firmware binaries. Implicit in this interface is project generation:
  - `library()` requires implementations to generate an anonymous project with a firmware build tool and then build a C library from given source files
  - `binary()` requires implementations to generate a firmware binary project with a firmware build tool and then build a final firmware image
- `tvm.micro.Flasher` wraps a single `flash()` function which programs the device
- `tvm.micro.Transport` abstracts the process of reading and writing data to the firmware device
The present abstraction suffers from several drawbacks that make it hard to work with:
- It's not very easy to break apart, so debugging is difficult:
  - The present flow always generates code in a set of temporary directories
  - TVM issues the library-level build commands, as opposed to a build system
- Libraries and binaries are expected to be relocatable, but many firmware platforms don't support this
  - This was originally done to support AutoTVM
- Implementations of these interfaces are restricted to TVM's Python dependencies. Contributors adding new implementations may be forced to add new Python dependencies to TVM. This is both difficult to do and difficult to scale.
Proposal
Goals
This RFC proposes to introduce a new Project API with these goals:
G1. Allow TVM to drive builds of a variety of firmware platforms for the purpose of AutoTVM
G2. Allow API implementations to live in a Python virtualenv other than the one containing TVM
G3. Move implementations to their own git repositories
G4. Reorganize the abstraction around the four embedded workflow steps described above (generate, build, flash, execute)
G5. Allow for better debuggability/hackability, so that the flow is usable even when things haven’t been 100% automated in TVM
API Summary
The `Compiler`, `Flasher`, and (to some degree) `Transport` APIs will be replaced with this single project-level API. Each implementation of this API lives in a file `microtvm_api_server.py`, which lives at the top level of a firmware project.
```python
import abc
import collections
import typing

# TransportTimeouts, TransportClosedError, and IoTimeoutError are provided by
# the TVM micro transport layer.

# ProjectOption - describes a key-value option that can be present in `options`.
ProjectOption = collections.namedtuple(
    'ProjectOption',
    ('name',  # `options` dict key used with this option.
     'help',  # Human-readable description of this option.
     ))

# ServerInfo - describes metadata about this server.
ServerInfo = collections.namedtuple(
    'ServerInfo',
    ('protocol_version',  # An integer identifying the supported API revision.
     'platform_name',  # Short name used to identify this server to TVM.
     'is_template',  # True when the attached firmware project is used only to
                     # generate projects given a Model Library Format export.
     'model_library_format_path',  # Path to a Model Library Format artifact,
                                   # present when is_template = False.
     'project_options',  # List of ProjectOption. Acceptable values for the
                         # `options` parameter below.
     ))


class ProjectAPIHandler(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def server_info_query(self) -> ServerInfo:
        raise NotImplementedError()

    @abc.abstractmethod
    def generate_project(self, model_library_format_path: str, standalone_crt_dir: str,
                         project_dir: str, options: dict):
        """Generate a project from the given artifacts, copying ourselves to that project.

        Parameters
        ----------
        model_library_format_path : str
            Path to the Model Library Format tar archive.
        standalone_crt_dir : str
            Path to the root directory of the "standalone_crt" TVM build artifact. This
            contains the TVM C runtime.
        project_dir : str
            Path to a nonexistent directory which should be created and filled with the
            generated project.
        options : dict
            Dict mapping option name to ProjectOption.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def build(self, options: dict):
        """Build the project, enabling the flash() call to be made.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the build, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def flash(self, options: dict):
        """Program the project onto the device.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the programming process, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def connect_transport(self, options: dict) -> TransportTimeouts:
        """Connect the transport layer, enabling write_transport and read_transport calls.

        Parameters
        ----------
        options : Dict[str, ProjectOption]
            ProjectOption which may influence the connection process, keyed by option name.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def disconnect_transport(self):
        """Disconnect the transport layer.

        If the transport is not connected, this method is a no-op.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def read_transport(self, n: int, timeout_sec: typing.Union[float, type(None)]) -> bytes:
        """Read data from the transport.

        Parameters
        ----------
        n : int
            Maximum number of bytes to read from the transport.
        timeout_sec : Union[float, None]
            Number of seconds to wait for at least one byte to be read before timing out. The
            transport can wait additional time to account for transport latency or bandwidth
            limitations based on the selected configuration and number of bytes being
            received. If timeout_sec is 0, read should attempt to service the request in a
            non-blocking fashion. If timeout_sec is None, read should block until at least
            1 byte of data can be returned.

        Returns
        -------
        bytes :
            Data read from the channel. Fewer than `n` bytes may be returned, but 0 bytes
            should never be returned. If returning fewer than `n` bytes, the full
            timeout_sec, plus any internally-added timeout, should be waited. If a timeout
            or transport error occurs, an exception should be raised rather than simply
            returning empty bytes.

        Raises
        ------
        TransportClosedError :
            When the transport layer determines that the transport can no longer send or
            receive data due to an underlying I/O problem (i.e. file descriptor closed,
            cable removed, etc).
        IoTimeoutError :
            When `timeout_sec` elapses without receiving any data.
        """
        raise NotImplementedError()

    @abc.abstractmethod
    def write_transport(self, data: bytes, timeout_sec: typing.Union[float, type(None)]) -> int:
        """Write data to the transport.

        Parameters
        ----------
        data : bytes
            The data to write over the channel.
        timeout_sec : Union[float, None]
            Number of seconds to wait for at least one byte to be written before timing out.
            The transport can wait additional time to account for transport latency or
            bandwidth limitations based on the selected configuration and number of bytes
            being sent. If timeout_sec is 0, write should attempt to service the request in
            a non-blocking fashion. If timeout_sec is None, write should block until at
            least 1 byte of data can be written.

        Returns
        -------
        int :
            The number of bytes written to the underlying channel. This can be less than the
            length of `data`, but cannot be 0 (raise an exception instead).

        Raises
        ------
        TransportClosedError :
            When the transport layer determines that the transport can no longer send or
            receive data due to an underlying I/O problem (i.e. file descriptor closed,
            cable removed, etc).
        IoTimeoutError :
            When `timeout_sec` elapses without writing any data.
        """
        raise NotImplementedError()
```
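To make the metadata shapes concrete, here is a sketch of what a template project's `server_info_query()` might return. The namedtuple definitions are repeated so the sketch is self-contained; the platform name and project option shown are illustrative, not values any real server is required to use.

```python
import collections

# Mirrors the namedtuples defined by the API above (repeated here so this
# sketch runs standalone).
ProjectOption = collections.namedtuple("ProjectOption", ("name", "help"))
ServerInfo = collections.namedtuple(
    "ServerInfo",
    ("protocol_version", "platform_name", "is_template",
     "model_library_format_path", "project_options"))


def server_info_for_template():
    """What a template project's server_info_query() might return (hypothetical values)."""
    return ServerInfo(
        protocol_version=1,
        platform_name="example_platform",  # illustrative name
        is_template=True,
        model_library_format_path=None,    # a template has no model attached yet
        project_options=[
            ProjectOption("verbose", help="Run the platform build with verbose output."),
        ],
    )
```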
Implementing the API
An implementation of this API initially lives in a template firmware project. An example directory tree is shown below:
```
template-project/
- microtvm_api_server.py - A runnable Python script containing the API impl.
- launch_microtvm_api_server.sh - If present, run instead of
                                  microtvm_api_server.py to launch the API
                                  server.
- src/ - Example of a platform-specific directory that may contain firmware
         main() glue used to run the µTVM RPC server and generated operators.
- prj.conf - Example of platform-specific project configuration
- CMakeLists.txt - Example of platform-specific build rules for this project
```
In this example, `src`, `prj.conf`, and `CMakeLists.txt` are all specific to the platform used to build the firmware image. TVM doesn't care whether these specific files exist; they are just referenced from `microtvm_api_server.py`. Implementing with e.g. a `Makefile` is entirely possible (and is done in the PoC), as is using any different directory structure.
After `generate_project` is called, the implementation should produce a new directory tree specific to the model in use. Any templates in `template-project` should be expanded. Also, the server should copy itself to the generated project, so it may be used standalone from the template. An example directory tree is shown below:
```
generated-project/
- microtvm_api_server.py - A runnable Python script containing the API impl.
- launch_microtvm_api_server.sh - If present, run instead of
                                  microtvm_api_server.py to launch the API
                                  server.
- model.tar - Model Library Format containing the model used with this project.
- src/ - Example of a platform-specific directory that may contain firmware
         main() glue used to run the µTVM RPC server and generated operators.
- prj.conf - Example of platform-specific project configuration
- CMakeLists.txt - Example of platform-specific build rules for this project
```
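As a concrete illustration, a minimal `generate_project` body for a simple template might look like the sketch below. This is not the required implementation: the explicit `template_dir` parameter is added here for clarity (a real `microtvm_api_server.py` would use its own directory), and `standalone_crt_dir` handling is omitted.

```python
import os
import shutil
import tarfile


def generate_project(template_dir, model_library_format_path, standalone_crt_dir,
                     project_dir, options):
    """Hypothetical generate_project body for a simple template.

    template_dir is passed explicitly for illustration; standalone_crt_dir
    handling is omitted from this sketch.
    """
    os.makedirs(project_dir)
    # Copy the API server itself so the generated project works standalone.
    shutil.copy2(os.path.join(template_dir, "microtvm_api_server.py"), project_dir)
    # Keep the Model Library Format archive alongside the project as model.tar.
    shutil.copy2(model_library_format_path, os.path.join(project_dir, "model.tar"))
    # Unpack generated operator code and metadata for the platform build to consume.
    with tarfile.open(model_library_format_path) as tar:
        tar.extractall(os.path.join(project_dir, "model"))
    # Copy platform-specific files verbatim (a real server might expand templates here).
    for name in ("src", "prj.conf", "CMakeLists.txt"):
        src = os.path.join(template_dir, name)
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(project_dir, name))
        elif os.path.exists(src):
            shutil.copy2(src, project_dir)
```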
The RPC System
In order to meet goal G2, an RPC system is needed to allow the server to live in a subprocess of TVM. I considered two RPC systems for this use, and am happy to consider others if there are suggestions:
R1. gRPC
R2. JSON-RPC
When evaluating RPC systems, I used these criteria:
C1. Availability of implementations. The RPC system should have a published spec and several implementations in both Python and other languages.
C2. Complexity and additional dependencies. The RPC system should not impose lots of extra Python dependencies on TVM since it is an optional component. Additionally, it will process 1 API call at a time, and doesn’t need to be web-scale.
C3. User-friendliness. The RPC layer should be easy to debug and understand.
gRPC has these advantages:
- Many high-quality implementations across a wide set of languages
- Broad development community
- Excellent documentation
and these disadvantages:
- Difficult to debug, since it is a binary protocol
- More complexity than is needed for this application
- Requires the use of TCP sockets in Python, adding to the complexity
- Uses protobufs, which are polarizing
- Adds several Python dependencies to TVM
JSON-RPC has these advantages:
- Many implementations available across a wide set of languages
- Simple specification
- Readable wire traffic
- Simple to use over file descriptors, removing the need for TCP sockets
- Easy enough to implement that 0 additional dependencies are required of TVM
and these disadvantages:
- Since it is a JSON protocol, binary data must be wrapped in e.g. base85
- Verbose, which could pose performance problems
- Less popular than gRPC (I think)
Given that the API wraps a low-traffic build system, and the binary data to be transported will be small for systems implementing the API, the lack of additional dependencies, clear wire format, and usability with file descriptors make JSON-RPC a win in my book. Happy to debate.
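For concreteness, here is what one call could look like on the wire if framed as JSON-RPC 2.0. The method name follows the API above; the exact framing and result shape shown are illustrative, not a finalized wire spec.

```python
import json

# A request TVM might write to the server's --read-fd, and the reply the
# server would write back. Field values here are illustrative.
request = {"jsonrpc": "2.0", "id": 1, "method": "server_info_query", "params": {}}
reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocol_version": 1,
        "platform_name": "example",
        "is_template": True,
        "model_library_format_path": None,
        "project_options": [],
    },
}
wire = json.dumps(request)  # one JSON object per message over the file descriptor
```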
Launching the RPC Server
When `launch_microtvm_api_server.sh` is present, TVM invokes it as a subprocess to launch the RPC server. When only `microtvm_api_server.py` is present, TVM invokes it using the same Python interpreter it is running under.
TVM expects these scripts to take a standard set of command-line arguments:
```
usage: microtvm_api_server.py [-h] --read-fd READ_FD --write-fd WRITE_FD
                              [--debug]

Generic TVM Project API server entry point

optional arguments:
  -h, --help           show this help message and exit
  --read-fd READ_FD    Numeric file descriptor where RPC requests should be
                       read.
  --write-fd WRITE_FD  Numeric file descriptor where RPC replies should be
                       written.
  --debug              When given, configure logging at DEBUG level
```
After launching the subprocess, TVM issues RPC commands via `--read-fd` and reads replies via `--write-fd`.
Flows
This section explores some flows that could use the microTVM API server.
Host-Driven Inference
- A Relay IRModule is created containing the model.
- The Relay IRModule and parameters are compiled.
- A Model Library Format export is created.
- The user specifies a standalone template project
template-project
- TVM launches the API server for the template project
- TVM invokes
generate_project
passing the Model Library Format export into the demo projecttemplate-project
- TVM shuts down the API server
- TVM launches the API server in
generated-project
- TVM invokes
build
andflash
to build and program the device - TVM creates a new RPC session to the device using
connect_transport
, thus making the Project API server the transport. - TVM instantiates the
GraphRuntime
and uses the session.
AutoTVM-with-microTVM
Here is a sketch of how AutoTVM will leverage this API:
1. In the AutoTVM builder: build a Task and produce a Model Library Format export of this task.
2. In the AutoTVM runner:
   1. Launch the API server in the template project `template-project`.
   2. Invoke `generate_project`, passing the CRT, Model Library Format, and a new temporary generated project directory `generated-project`.
   3. Shut down the API server.
   4. Launch the API server in the generated project.
   5. Invoke `build` and `flash` to build firmware and program the device.
   6. Invoke `connect_transport` and create a new TVM RPC session using the API server as a transport.
   7. Use the µTVM RPC server to time the implemented Task.
   8. Invoke `disconnect_transport` to close the session.
   9. Delete `generated-project`.
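The runner steps can be sketched as a context manager. The `client_factory` object and its method names (`shutdown` in particular) are hypothetical stand-ins for a TVM-side client of the API server, not an existing interface:

```python
import contextlib


@contextlib.contextmanager
def measured_session(client_factory, template_dir, mlf_path, crt_dir, project_dir, options):
    """Sketch of the AutoTVM runner steps (client_factory and its methods are hypothetical)."""
    # Generate the project from the template, then shut the template server down.
    template = client_factory(template_dir)
    template.generate_project(mlf_path, crt_dir, project_dir, options)
    template.shutdown()
    # Re-launch the server inside the generated project.
    project = client_factory(project_dir)
    try:
        project.build(options)
        project.flash(options)
        project.connect_transport(options)
        yield project  # the caller times the Task over the transport here
    finally:
        project.disconnect_transport()
        project.shutdown()
```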
Standalone Demo Project Generator
Though not an explicit goal of this work, standalone demo projects showing inference on-device could also be generated. The flow would be as follows:
- A Relay IRModule is created containing the model.
- The Relay IRModule and parameters are compiled.
- A Model Library Format export is created.
- The user specifies a standalone template project `template-project`.
- TVM launches the API server for the template project.
- TVM invokes `generate_project`, passing the Model Library Format export, to create the demo project from `template-project`.
- TVM shuts down the API server
- The user builds and flashes the project using the platform build system
Debugging bad operator implementations
Sometimes, AutoTVM produces a bad implementation of an operator or tries to use too much on-device memory. Other projects are actively working to improve error reporting, but for cases when interactive debugging is necessary, this API server supports doing so:
- Run AutoTVM-with-microTVM flow through step 2.5. Ensure the project is generated to a non-temporary directory.
- Use the platform to launch the debugger and attach to the device.
- Resume AutoTVM execution and observe execution with an attached debugger.
Testing API server implementations
The standard test of an API server would be the Host-Driven Inference flow above. A standard test suite can be defined in TVM and then each implementation can use that as validation.
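A reusable validation helper might start with checks like the sketch below. The helper name is illustrative, not an existing TVM module; a real suite would also exercise `generate_project`, `build`, `flash`, and the transport calls against hardware or a simulator.

```python
def validate_server(handler):
    """Run basic metadata checks an API server implementation should pass."""
    info = handler.server_info_query()
    assert info.protocol_version >= 1
    assert isinstance(info.platform_name, str) and info.platform_name
    if info.is_template:
        # Templates carry no model; only generated projects point at one.
        assert info.model_library_format_path is None
    for opt in info.project_options:
        assert opt.name and opt.help
```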
Future Directions
There are a couple of future directions this approach could be taken:
- Runtime selection - currently we only support GraphRuntime with the C runtime, but as new runtimes (e.g. AOT) become available, it may be necessary to choose one.
- `tvmc` integration - the APIs here are intended to eventually be exposed as `tvmc` commands. Future RFCs will address this.
For Discussion
Some topics for discussion:
T1. Does this approach seem scalable and better than what we have now?
T2. Are there other RPC systems we should consider?
T3. Are there other concerns with this approach e.g. debuggability?
@manupa-arm @leo-arm @ramana-arm @tgall_foo @gromero @aca88 @mdw-octoml @mehrdadh @tqchen @jroesch