Hi all,
PoC (rough): Comparing apache:main...areusch:move-backend-runtime · apache/tvm · GitHub
As we work to merge AOT, define the µTVM firmware-facing API, and merge support for existing embedded frameworks such as STM32, some discrepancies in the organization of TVM’s low-level API are becoming clear. This RFC addresses those by proposing the following things:
- creating a new header file, include/tvm/runtime/c_packed_func.h, to document the C-facing PackedFunc calling convention.
- moving the typedefs related to calling C PackedFunc from c_runtime_api.h into c_packed_func.h.
- redefining the split between c_backend_api and c_runtime_api as follows: c_backend_api contains all of the functions and types depended on by generated TVM code (the runtime is allowed to use these too), while c_runtime_api contains functions and types used only by the TVM runtime.
Motivation
The code present in a typical TVM model deployment can be logically split into pieces as follows:
    graph executor ----------- c_runtime_api
          |                          ^
    compiled operators [c,so]        |
          |                          |
    c_backend_api.[h,c] <------------+
          |
    platform-specific [platform.h, platform-specific implementation location]
In this split, the TVM codebase directly contributes these pieces:
- graph_executor, responsible for driving model inference end-to-end
- c_runtime_api, contains infrastructure to support graph_executor plus user-facing functions
- c_backend_api, contains functions called by the generated operators
We are currently undertaking implementation of two features which, taken together, allow users to run model inference with nearly no runtime requirements under certain use cases (CPU-only workloads, static models only):
- an Ahead-of-Time compilation flow which removes the need for an Executor at inference time (or replaces it with a generated AOT executor reliant only on c_backend_api)
- an “unpacked” calling convention, which removes type metadata from all model calls where it is not needed.
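To make the difference between the two conventions concrete, here is a minimal sketch in C. It is not TVM-generated code: DemoValue is a simplified stand-in for TVMValue, the type codes are made up for illustration, and the function names are hypothetical. The packed entry point receives values together with their type codes and checks them at run time, while the unpacked one takes typed arguments directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TVMValue (assumption: the real union has more members). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Hypothetical type codes, for illustration only. */
enum { kDemoInt = 0, kDemoFloat = 1 };

/* Unpacked convention: typed arguments, no type metadata. */
int32_t demo_add_unpacked(int64_t a, int64_t b, int64_t* out) {
  assert(out != NULL);
  *out = a + b;
  return 0;
}

/* Packed convention: values and type codes travel together and are
 * validated at run time before the typed work is done. */
int32_t demo_add_packed(DemoValue* args, int* type_codes, int num_args,
                        DemoValue* out_ret_value, int* out_ret_tcode,
                        void* resource_handle) {
  (void)resource_handle; /* unused in this sketch */
  if (num_args != 2 || type_codes[0] != kDemoInt || type_codes[1] != kDemoInt) {
    return -1;
  }
  *out_ret_tcode = kDemoInt;
  return demo_add_unpacked(args[0].v_int64, args[1].v_int64,
                           &out_ret_value->v_int64);
}
```

When every caller is itself generated code that already knows the argument types, the run-time check in the packed version does no useful work, which is the case the unpacked convention targets.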
These features are creating a parallel path to model execution:
    AOT [c]      graph executor ----- c_runtime_api
       |               |                    ^
    compiled operators [c,so]               |
             |                              |
    c_backend_api.[h,c] <-------------------+
             |
    platform-specific [platform.h, platform-specific implementation location]
Given these new features, it can be confusing for implementers to determine which functions from the TVM codebase are required for model inference. Previously, requiring graph_executor meant that all of the above pieces were required. With the introduction of AOT, users may no longer want to include the entire c_runtime_api in their deployed code. However, attempts to remove c_runtime_api.h have exposed these problems with the internal organization:
- The calling convention for TVMBackendPackedCFunc (the typedef describing the signature of generated model functions) states:

      /*!
       * \brief Signature for backend functions exported as DLL.
       *
       * \param args The arguments
       * \param type_codes The type codes of the arguments
       * \param num_args Number of arguments.
       * \param out_ret_value The output value of the the return value.
       * \param out_ret_tcode The output type code of the return value.
       * \param resource_handle Pointer to associated resource.
       *
       * \return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.

  However, TVMAPISetLastError resides in c_runtime_api.h. In practice, this is only used when schedules offload implementation to third-party libraries by calling PackedFunc at inference time.
- The docs for the PackedFunc calling convention are not very discoverable: they’re buried in c_runtime_api even though they are used by generated model functions, and there are actually two definitions of PackedFunc typedefs in c_runtime_api (see below). Some interactions between the runtime and PackedFunc are not documented at all (e.g. memory management of complex types returned from PackedFunc).
- PackedFunc implementations can be categorized into two distinct usage patterns:
  1. generated model functions, which mainly take DLTensorHandle as arguments and return nothing
  2. usage in the TVM runtime (e.g. GraphExecutor), which may return complex objects whose memory management the caller must take ownership of

  To address the challenges of calling PackedFunc in category (2), an additional type, TVMPackedCFunc, was defined in c_runtime_api.h:

      /*!
       * \brief C type of packed function.
       *
       * \param args The arguments
       * \param type_codes The type codes of the arguments
       * \param num_args Number of arguments.
       * \param ret The return value handle.
       * \param resource_handle The handle additional resouce handle from fron-end.
       * \return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.
       * \sa TVMCFuncSetReturn
       */
      typedef int (*TVMPackedCFunc)(TVMValue* args, int* type_codes, int num_args, TVMRetValueHandle ret, void* resource_handle);

  You’d be forgiven for confusing this with TVMBackendPackedCFunc, defined in the same file (and pasted above), which is the actual typedef of the PackedFunc generated for model inference. The difference is the TVMRetValueHandle arg, which allows the runtime to take ownership of returned complex types, e.g. string, bytes, and ObjectHandle.
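The ownership point can be illustrated with a small sketch. This is not TVM's implementation: DemoRetSlot is an assumed stand-in for whatever sits behind the opaque TVMRetValueHandle, DemoCFuncSetReturn is a hypothetical analogue of the TVMCFuncSetReturn function referenced in the doc comment above, and the type codes are made up. The idea shown is that the callee hands its result to a setter, which copies a returned string into storage the runtime owns, so the callee's buffer can go away afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for TVMValue. */
typedef union {
  long long v_int64;
  const char* v_str;
  void* v_handle;
} DemoValue;

/* Hypothetical type codes, for illustration only. */
enum { kDemoInt = 0, kDemoStr = 1 };

/* Assumed layout of the runtime-owned slot behind the opaque handle. */
typedef struct {
  int type_code;
  DemoValue value;
  char* owned_str; /* copy owned by the runtime, or NULL */
} DemoRetSlot;
typedef void* DemoRetValueHandle;

/* Hypothetical setter: strings are copied so the runtime owns the result. */
int DemoCFuncSetReturn(DemoRetValueHandle ret, const DemoValue* value, int type_code) {
  DemoRetSlot* slot = (DemoRetSlot*)ret;
  assert(slot != NULL && value != NULL);
  free(slot->owned_str);
  slot->owned_str = NULL;
  slot->type_code = type_code;
  slot->value = *value;
  if (type_code == kDemoStr) {
    size_t n = strlen(value->v_str) + 1;
    slot->owned_str = (char*)malloc(n);
    if (slot->owned_str == NULL) return -1;
    memcpy(slot->owned_str, value->v_str, n);
    slot->value.v_str = slot->owned_str;
  }
  return 0;
}

/* Runtime-style function: returns a string that lives in a local buffer. */
int demo_get_name(DemoValue* args, int* type_codes, int num_args,
                  DemoRetValueHandle ret, void* resource_handle) {
  char local[16];
  DemoValue v;
  (void)args; (void)type_codes; (void)num_args; (void)resource_handle;
  strcpy(local, "graph_executor"); /* 14 characters plus terminator */
  v.v_str = local;                 /* local is gone after return */
  return DemoCFuncSetReturn(ret, &v, kDemoStr);
}
```

With the backend-style signature, the callee could only write a raw pointer into out_ret_value, leaving the lifetime of the pointed-to data unspecified; the handle-plus-setter indirection is what lets the runtime take a copy.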
Additional motivation: splitting src/runtime/crt/common library
At present, the C runtime places the implementations of both c_runtime_api and c_backend_api into the same logical C library (.a). As we move to slim down the runtime required for standalone deployment on embedded platforms, it makes sense to split the common library into two pieces:
- c_backend_api implementations, required at deploy time with AOT
- c_runtime_api implementations, required at deploy time with Graph Executor and for host-driven inference
Making the split between these two usages explicit in the header files will help this effort.
Proposals
This RFC proposes to clean up these discrepancies as follows:
Create include/tvm/runtime/c_packed_func.h
Create a new header file to document the PackedFunc used in model inference. This is the one that people care about anyway; they shouldn’t have to tease apart BackendPackedCFunc from PackedCFunc.
In this file, do the following:
- Place the TVMBackendPackedCFunc typedef here, plus all dependent typedefs (e.g. TVMArgTypeCode, TVMByteArray, TVMDeviceExtType, TVMValue). Anything involved in the type signature or documentation of TVMBackendPackedCFunc belongs here.
- Rename TVMBackendPackedCFunc. PackedCFunc merges two names together into a confusing amalgamation. Options:
  - R1. TVMCPackedFunc (conflicts with tvmc the command-line tool…)
  - R2. CTVMPackedFunc
  - R3. TVMBackendCPackedFunc (readable but not Backend-only)
- Move TVMAPISetLastError to this file and rename it to TVMPackedFuncSetLastError. This function is mentioned in the TVMBackendPackedCFunc \returns doc.
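As a rough illustration of the last item, the sketch below shows a generated-operator-style function reporting a failure through the renamed hook. Only the name TVMPackedFuncSetLastError comes from this proposal; its body, the const char* parameter (assumed to mirror TVMAPISetLastError), the DemoGetLastError accessor, and DemoValue are stand-ins invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for TVMValue. */
typedef union {
  long long v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

static char g_last_error[128];

/* Name proposed in this RFC; the body is an illustrative stand-in that
 * keeps the most recent message in a fixed-size static buffer. */
void TVMPackedFuncSetLastError(const char* msg) {
  assert(msg != NULL);
  snprintf(g_last_error, sizeof(g_last_error), "%s", msg);
}

/* Hypothetical accessor so a caller can read the message back. */
const char* DemoGetLastError(void) { return g_last_error; }

/* Operator-style function following the documented contract:
 * return 0 on success, or set the error and return -1 on failure. */
int demo_operator(DemoValue* args, int* type_codes, int num_args,
                  DemoValue* out_ret_value, int* out_ret_tcode,
                  void* resource_handle) {
  (void)args; (void)type_codes; (void)out_ret_value;
  (void)out_ret_tcode; (void)resource_handle;
  if (num_args != 3) {
    TVMPackedFuncSetLastError("demo_operator: expected 3 arguments");
    return -1;
  }
  return 0;
}
```

With the error hook declared next to the calling convention, a function like demo_operator would need only the one header to satisfy everything its doc comment refers to.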
Rename TVMPackedCFunc
This typedef is confined to frontend use and exists to help with memory management. It effectively wraps TVMBackendPackedCFunc. From a frontend perspective, it is the PackedFunc you’d like users to interact with, but it doesn’t document the calling convention, so it shouldn’t be named as though it were the definition of C PackedFunc.
Options:
- F1. TVMFrontendCPackedFunc – to match usage with the frontend only
- F2. TVMRuntimeCPackedFunc – formalizes the notion that the runtime is a client of the backend
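One way to picture the "wraps" relationship is an adapter that exposes a backend-signature function through the frontend-style signature. The sketch below is an assumption about how such a wrapper could look, not a description of TVM's internals: DemoValue, DemoRetSlot, DemoClosure, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TVMValue. */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Hypothetical type code, for illustration only. */
enum { kDemoInt = 0 };

/* Backend-style signature: the result is written through out pointers. */
typedef int (*DemoBackendFunc)(DemoValue* args, int* type_codes, int num_args,
                               DemoValue* out_ret_value, int* out_ret_tcode,
                               void* resource_handle);

/* Assumed storage behind the opaque return handle. */
typedef struct {
  int type_code;
  DemoValue value;
} DemoRetSlot;

/* Carries the wrapped function and its own resource handle. */
typedef struct {
  DemoBackendFunc func;
  void* resource_handle;
} DemoClosure;

/* A backend-style function: doubles one integer argument. */
int demo_double(DemoValue* args, int* type_codes, int num_args,
                DemoValue* out_ret_value, int* out_ret_tcode,
                void* resource_handle) {
  (void)resource_handle;
  if (num_args != 1 || type_codes[0] != kDemoInt) return -1;
  out_ret_value->v_int64 = args[0].v_int64 * 2;
  *out_ret_tcode = kDemoInt;
  return 0;
}

/* Frontend-style signature: forwards the call and stores the result in
 * the slot behind the return handle. */
int demo_frontend_adapter(DemoValue* args, int* type_codes, int num_args,
                          void* ret, void* resource_handle) {
  DemoClosure* closure = (DemoClosure*)resource_handle;
  DemoRetSlot* slot = (DemoRetSlot*)ret;
  assert(closure != NULL && slot != NULL);
  return closure->func(args, type_codes, num_args, &slot->value,
                       &slot->type_code, closure->resource_handle);
}
```

In this picture the backend signature is the calling convention proper, and the frontend-style type is a layer over it, which is the reasoning behind both naming options.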
Drawbacks of these changes
This change is mostly organizational. The main drawbacks are breakage to downstream users due to the renames and changes to the include paths. We will mitigate that by publicizing this change plus a migration guide in the forums.
For discussion
- Do you support or oppose this change?
- Which naming options (R1–R3 and F1–F2) do you prefer?
- Are there things in particular missing from the PackedFunc docs?
cc @stoa @manupa-arm @giuseros @mousius @tqchen @jroesch @tkonolige @mehrdadh @junrushao