[pre-RFC] [API Change] Formalizing c_backend_api

areusch · July 2, 2021, 9:46pm

Hi all,

PoC (rough): Comparing apache:main...areusch:move-backend-runtime · apache/tvm · GitHub

As we work to merge AOT, define the µTVM firmware-facing API, and merge support for existing embedded frameworks such as STM32, some discrepancies in the organization of TVM’s low-level API are becoming clear. This RFC addresses those by proposing the following things:

- creating a new header file, include/tvm/runtime/c_packed_func.h, to document the C-facing PackedFunc calling convention
- moving the typedefs related to calling C PackedFunc from c_runtime_api.h into c_packed_func.h
- redefining the split between c_backend_api and c_runtime_api as follows: c_backend_api contains all of the functions and types that generated TVM code depends on (the runtime is allowed to use these too), while c_runtime_api contains functions and types used only by the TVM runtime

Motivation

The code present in a typical TVM model deployment can be logically split into pieces as follows:

                       graph executor ----- c_runtime_api
                               |                 ^
                  compiled operators [c,so]      |
                                  |              |
                      c_backend_api.[h,c]  <-----+
                              |
                    platform-specific[platform.h, platform-specific implementation location]

In this split, the TVM codebase directly contributes these pieces:

- graph_executor, which is responsible for driving model inference end-to-end
- c_runtime_api, which contains infrastructure to support graph_executor plus user-facing functions
- c_backend_api, which contains functions called by the generated operators

We are currently undertaking implementation of two features which, taken together, allow users to run model inference with nearly no runtime requirements under certain use cases (CPU-only workloads, static models only):

- an Ahead-of-Time (AOT) compilation flow, which removes the need for an Executor at inference time (or replaces it with a generated AOT executor reliant only on c_backend_api)
- an “unpacked” calling convention, which removes type metadata from all model calls where it is not needed
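
As a rough illustration of what the “unpacked” convention removes, the sketch below writes the same elementwise add twice: once with direct arguments and once behind a packed-style signature that must check an argument count and type codes first. `DemoTensor`, `DemoValue`, `kDemoTensorHandle`, and both function names are stand-ins invented for this example; they are not TVM's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DLTensor with only the fields this sketch needs (assumption). */
typedef struct {
  float* data;
  int64_t len;
} DemoTensor;

/* Stand-in for TVMValue: one slot per argument (assumption). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Hypothetical type code meaning "this slot holds a tensor handle". */
enum { kDemoTensorHandle = 1 };

/* Unpacked convention: arguments are passed directly, with no type metadata. */
int demo_add_unpacked(const DemoTensor* a, const DemoTensor* b, DemoTensor* out) {
  if (a->len != b->len || a->len != out->len) return -1;
  for (int64_t i = 0; i < a->len; ++i) {
    out->data[i] = a->data[i] + b->data[i];
  }
  return 0;
}

/* Packed convention: the same operator, but arguments arrive as
 * (values, type codes, count) and are validated before use. */
int demo_add_packed(DemoValue* args, int* type_codes, int num_args,
                    DemoValue* out_ret_value, int* out_ret_tcode,
                    void* resource_handle) {
  (void)out_ret_value; /* operators in this sketch return nothing */
  (void)out_ret_tcode;
  (void)resource_handle;
  if (num_args != 3) return -1;
  for (int i = 0; i < num_args; ++i) {
    if (type_codes[i] != kDemoTensorHandle) return -1;
  }
  return demo_add_unpacked((const DemoTensor*)args[0].v_handle,
                           (const DemoTensor*)args[1].v_handle,
                           (DemoTensor*)args[2].v_handle);
}
```

In this sketch the packed wrapper adds only validation and unpacking, which is the overhead an AOT executor can skip when it already knows the argument types statically.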

These features are creating a parallel path to model execution:

               AOT [c]          graph executor ----- c_runtime_api
                   |                    |                 ^
                  compiled operators [c,so]               |
                                  |                       |
                      c_backend_api.[h,c]  <--------------+
                              |
                    platform-specific[platform.h, platform-specific implementation location]

Given these new features, it can be confusing for implementers to determine which functions from the TVM codebase are required for model inference. In the previous world, the requirement of graph_executor alone meant that all of the above pieces were required. The introduction of AOT means that users may no longer be interested in including the entire c_runtime_api in their deployed code. However, attempts to get rid of c_runtime_api.h have exposed these problems with the internal organization:

The calling convention for TVMBackendPackedCFunc (typedef describing the signature of generated model functions) states:

```
/*!
 * \brief Signature for backend functions exported as DLL.
 *
 * \param args The arguments
 * \param type_codes The type codes of the arguments
 * \param num_args Number of arguments.
 * \param out_ret_value The output value of the return value.
 * \param out_ret_tcode The output type code of the return value.
 * \param resource_handle Pointer to associated resource.
 *
 * \return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.
 */
typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode,
                                     void* resource_handle);
```

However, TVMAPISetLastError resides in c_runtime_api.h. In practice, this is only used when schedules offload implementation to third-party libraries by calling PackedFunc at inference time.
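
To show why this dependency matters, here is a minimal sketch of a function that follows the documented return convention: it returns 0 on success, and on failure it records a message and returns -1. Everything named `Demo...` or `demo_...` is a hypothetical stand-in written for this example, including `DemoSetLastError`, which only plays the role of TVMAPISetLastError and is not TVM's implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for TVMValue (assumption). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Hypothetical type code for an integer argument. */
enum { kDemoInt = 0 };

/* Toy error storage; a real runtime may well store this differently. */
static char g_demo_last_error[128];

/* Stand-in for TVMAPISetLastError: record a message for the caller. */
void DemoSetLastError(const char* msg) {
  snprintf(g_demo_last_error, sizeof(g_demo_last_error), "%s", msg);
}

/* Hypothetical getter so a caller can read the recorded message. */
const char* DemoGetLastError(void) { return g_demo_last_error; }

/* A packed-style function: halves an even integer, fails on anything else. */
int demo_checked_half(DemoValue* args, int* type_codes, int num_args,
                      DemoValue* out_ret_value, int* out_ret_tcode,
                      void* resource_handle) {
  (void)resource_handle;
  if (num_args != 1 || type_codes[0] != kDemoInt) {
    DemoSetLastError("expected exactly one integer argument");
    return -1;
  }
  if (args[0].v_int64 % 2 != 0) {
    DemoSetLastError("argument must be even");
    return -1;
  }
  out_ret_value->v_int64 = args[0].v_int64 / 2;
  *out_ret_tcode = kDemoInt;
  return 0;
}
```

The point of the sketch is that the failure path cannot be written without an error-setting function, so whichever header declares that function becomes a dependency of any generated code that can fail.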

The docs for the PackedFunc calling convention are not very discoverable: they are buried in c_runtime_api even though they are used by generated model functions, and c_runtime_api actually contains two definitions of PackedFunc typedefs (see below). Some interactions between the runtime and PackedFunc are not documented at all (e.g. memory management of complex types returned from PackedFunc).
PackedFunc implementations can be categorized into two distinct usage patterns:
1. generated model functions, which mainly take DLTensorHandle as arguments and return nothing
2. usage in the TVM runtime (e.g. GraphExecutor), which may return complex objects whose memory the caller must take ownership of
To address the challenges of calling PackedFunc in category (2), an additional type, TVMPackedCFunc, was defined in c_runtime_api.h:
```
/*!
 * \brief C type of packed function.
 *
 * \param args The arguments
 * \param type_codes The type codes of the arguments
 * \param num_args Number of arguments.
 * \param ret The return value handle.
 * \param resource_handle The additional resource handle from the front-end.
 * \return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.
 * \sa TVMCFuncSetReturn
 */
typedef int (*TVMPackedCFunc)(TVMValue* args, int* type_codes, int num_args, TVMRetValueHandle ret,
                              void* resource_handle);
```
You’d be forgiven for confusing this with TVMBackendPackedCFunc, which is defined in the same file (and pasted above) and is the actual typedef of the PackedFunc generated for model inference. The difference is the TVMRetValueHandle arg, which allows the runtime to take ownership of returned complex types, e.g. string, bytes, and ObjectHandle.
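
A hedged sketch of the ownership idea behind the return-value handle: the callee hands its result to the runtime through an opaque handle, and the runtime keeps its own copy, so the callee's buffer can go out of scope. `DemoRetSlot`, `DemoSetReturnStr`, and `DemoRetSlotRelease` are hypothetical helpers for this example. `DemoSetReturnStr` is deliberately simplified and does not mirror the signature of TVMCFuncSetReturn.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for TVMRetValueHandle: opaque to the callee (assumption). */
typedef void* DemoRetValueHandle;

/* Hypothetical runtime-side storage that sits behind the handle. */
typedef struct {
  char* str;
} DemoRetSlot;

/* Simplified stand-in for a "set return" call, strings only.
 * The runtime copies the bytes, so it owns the result from here on. */
int DemoSetReturnStr(DemoRetValueHandle ret, const char* s) {
  DemoRetSlot* slot = (DemoRetSlot*)ret;
  size_t n = strlen(s) + 1;
  char* copy = (char*)malloc(n);
  if (copy == NULL) return -1;
  memcpy(copy, s, n);
  free(slot->str); /* drop any previous value held by the slot */
  slot->str = copy;
  return 0;
}

/* Runtime-side cleanup once the caller is done with the value. */
void DemoRetSlotRelease(DemoRetSlot* slot) {
  free(slot->str);
  slot->str = NULL;
}

/* Frontend-style function: the result leaves through the handle,
 * not through out_ret_value/out_ret_tcode pointers. */
int demo_greeting(void* args, int* type_codes, int num_args,
                  DemoRetValueHandle ret, void* resource_handle) {
  (void)args;
  (void)type_codes;
  (void)num_args;
  (void)resource_handle;
  char local[16] = "hello"; /* dies when this function returns */
  return DemoSetReturnStr(ret, local);
}
```

With the backend-style signature there is no such hook: the callee can only write a raw value and type code, which leaves the lifetime of anything it points to unspecified.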

Additional motivation: splitting `src/runtime/crt/common` library

At present, the C runtime places the implementations of both c_runtime_api and c_backend_api into the same logical C library (.a). As we move to slim down the runtime required for standalone deployment on embedded platforms, it makes sense to split the common library into two pieces:

- c_backend_api implementations, required at deploy time with AOT
- c_runtime_api implementations, required at deploy time with Graph Executor and for host-driven inference

Making the split between these two usages explicit in the header files will help this effort.

Proposals

This RFC proposes to clean up these discrepancies as follows:

Create `include/tvm/runtime/c_packed_func.h`

Create a new header file to document the PackedFunc used in model inference. This is the one that people care about anyway; they shouldn’t have to tease apart TVMBackendPackedCFunc from TVMPackedCFunc.

In this file, do the following:

- Place the TVMBackendPackedCFunc typedef here, plus all dependent typedefs (e.g. TVMArgTypeCode, TVMByteArray, TVMDeviceExtType, TVMValue). Anything involved in the type signature or documentation of TVMBackendPackedCFunc belongs here.
- Rename TVMBackendPackedCFunc, since "PackedCFunc" merges two names together into a confusing amalgamation. Options:
  - R1. TVMCPackedFunc (conflicts with tvmc, the command-line tool…)
  - R2. CTVMPackedFunc
  - R3. TVMBackendCPackedFunc (readable, but not Backend-only)
- Move TVMAPISetLastError to this file and rename it to TVMPackedFuncSetLastError. This function is mentioned in the \return doc of TVMBackendPackedCFunc.
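
The layout this list describes could look roughly like the sketch below, which keeps the value type, the function-pointer typedef, and the error setter together so that a generated function compiles against them alone. Only the name TVMPackedFuncSetLastError comes from this proposal. `DemoValue`, `DemoCPackedFunc`, `DemoPackedFuncGetLastError`, and the toy implementation are assumptions made for illustration, and the typedef name in particular is a placeholder because the rename is still open.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ---- what the proposed header might declare (sketch only) ---- */

/* Stand-in for TVMValue and the other dependent typedefs (assumption). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Placeholder name for the renamed TVMBackendPackedCFunc. */
typedef int (*DemoCPackedFunc)(DemoValue* args, int* type_codes, int num_args,
                               DemoValue* out_ret_value, int* out_ret_tcode,
                               void* resource_handle);

/* Proposed new name for TVMAPISetLastError. */
void TVMPackedFuncSetLastError(const char* msg);

/* ---- toy implementation, which would live in a .c file ---- */

static char demo_last_error[64];

void TVMPackedFuncSetLastError(const char* msg) {
  snprintf(demo_last_error, sizeof(demo_last_error), "%s", msg);
}

/* Hypothetical getter, present only so the example can be checked. */
const char* DemoPackedFuncGetLastError(void) { return demo_last_error; }

/* ---- a generated-style function using only the declarations above ---- */

int demo_fail_always(DemoValue* args, int* type_codes, int num_args,
                     DemoValue* out_ret_value, int* out_ret_tcode,
                     void* resource_handle) {
  (void)args;
  (void)type_codes;
  (void)num_args;
  (void)out_ret_value;
  (void)out_ret_tcode;
  (void)resource_handle;
  TVMPackedFuncSetLastError("not implemented");
  return -1;
}

/* A caller that only knows the function-pointer type. */
int demo_invoke(DemoCPackedFunc f) {
  DemoValue ret;
  int tcode = 0;
  ret.v_int64 = 0;
  return f(NULL, NULL, 0, &ret, &tcode, NULL);
}
```

Nothing in the sketch reaches for a runtime-only declaration, which is the property the new header is meant to guarantee.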

Rename `TVMPackedCFunc`

This typedef is confined solely to frontend use and exists to help with memory management. It effectively wraps TVMBackendPackedCFunc. From a frontend perspective, it is the PackedFunc you’d like users to interact with, but it doesn’t document the calling convention, so it shouldn’t be named as though it were the definition of C PackedFunc.

Options:

F1. TVMFrontendCPackedFunc – to match usage with the frontend only
F2. TVMRuntimeCPackedFunc – formalizes the notion that the runtime is a client of the backend

Drawbacks of these changes

This change is mostly organizational. The main drawbacks are breakage to downstream users due to the renames and changes to the include paths. We will mitigate that by publicizing this change plus a migration guide in the forums.

For discussion

- Do you support or oppose this change?
- Which R/F naming options do you prefer?
- Are there things in particular missing from the PackedFunc docs?

cc @stoa @manupa-arm @giuseros @mousius @tqchen @jroesch @tkonolige @mehrdadh @junrushao

Mousius · July 5, 2021, 12:53pm

This is great @areusch! I appreciate the ability to re-introduce c_backend_api.h to leverage existing abstractions without necessarily having to use c_packed_func.h. It’d be great if we only required the single backend header file. I think the only thing that prevents this is having to copy function_attributes.h across as well. I’d suggest all of the backend definition, including attributes, could live in the single c_backend_api.h?

Extending that a bit, if we only need to take c_backend_api.h into a project as a header, could we not raise these headers out of the runtime folder? Potentially include/tvm/c_backend_api.h or even include/tvm_backend_api.h? My assumption here is that the backend would always be written in something with a stable C ABI. This could potentially be true of c_packed_func.h/tvm_packed_func.h as well, leaving only the runtime-specific headers in the runtime folder?

I’d be interested in considering just TVMPackedFunc? Is there a reason to mark it explicitly as a C function?

areusch · July 20, 2021, 7:02pm

@Mousius thanks for your comments!

> It’d be great if we only required the single backend header file, I think the only thing that prevents is having to copy function_attributes.h across as well - I’d suggest all of the backend definition, including attributes, could live in the single c_backend_api.h?

This would probably be okay for now. If the required #defines become complex, we may eventually need to refactor. I’m not sure it’s a big deal to split things across multiple files so long as they are co-located.

> Extending that a bit, if we only need to take c_backend_api.h into a project as a header, could we not raise these headers out of the runtime folder?

I do think that these are all quite runtime-related when considered as part of the full TVM landscape, e.g. at the include/tvm level. One could see operator implementations as somewhat distinct from hand-written runtime code, but ultimately, at the include/tvm level, I think runtime makes sense. I do think we should split src/runtime/crt/common into parts so that it’s easy to differentiate between implementation code required for host-driven vs standalone/AOT deployment.

> I’d be interested in considering just TVMPackedFunc? Is there a reason to mark it explicitly as a C function?

Uh, good point. At the time of writing I was for some reason thinking this name was taken by the C++ implementation. However, that one is tvm::runtime::PackedFunc, so TVMPackedFunc should work well.