[RFC] MISRA-C changes for RPC support

areusch · June 25, 2020, 3:50pm

Motivation

As part of the Standalone µTVM Roadmap, the TVM RPC server is being implemented on bare metal devices. The overall approach is to link MinRPCServer against the MISRA-C runtime plus additional compiled TVM functions. There are several new features that need to be added to the MISRA-C runtime in order to service all of the RPC requests.

TVMFuncGetGlobal and TVMModGetFunction are currently implemented as hardcoded if-else blocks. Some scalable, programmatic solution is needed to interact with both generated TVMModules and PackedCFuncs from third-party code.
Since Modules are typically instantiated in the C++ runtime using dlpack , some strategy for Module instantiation is needed in the MISRA-C runtime.
Unit tests coverage should be improved, and black-box tests need to be written, since the MISRA- C runtime tests will eventually be used as an acceptance test for µTVM devices. This RFC doesn’t specifically address testing, but does address some improvements needed to enable tests to be written:
1. There is no standalone build of the MISRA-C runtime currently checked-in; so, it’s possible that the MISRA-C runtime could depend on C++ code.
2. There isn’t a way to write tests against the MISRA-C runtime without linking it to libtvm.so.
Some C++ features are being used, such as exit() , which may need to be extended in an embedded setting. Also, a logging solution needs to be devised to ease debugging.

Changes to the MISRA-C Runtime

FuncRegistry

Function lookup by string name is a common RPC task. Currently, the MISRA-C runtime implements function lookup using an if-else statement tree. This RFC introduces a new type, TVMFuncRegistry , which can be used for both global and module functions.

/*!
 * \brief A data structure that facilitates function lookup by C-string name.
 */
typedef struct TVMFuncRegistry {
  /*! \brief Names of registered functions, concatenated together and separated by \0.
   * An additional \0 is present at the end of the concatenated blob to mark the end.
   *
   * Byte 0 is the number of functions in `funcs`.
   */
  const char* names;

  /*! \brief Function pointers, in the same order as their names in `names`. */
  const TVMBackendPackedCFunc* funcs;
} TVMFuncRegistry;

The design constraints were:

Assume a small number of functions (i.e. < 30)
All lookup functions should work on a const instance of the Registry so it can be placed in flash.
Easy to generate in TVM codegen.

The names field is a uint8_t count followed by the C-string names of functions concatenated together. A terminating \0 marks the end of the names list, so when count > 0 , names always ends in a double \0\0 .

Byte:      0   1   2   3   4   5   6   7   8   9   a   b   c   d  
         +---+---+---+---+---+---+---+---+---+---+---+---+---+---+
names:   | N | F   u  n   c   0   \0 | F   u   n   c   1   \0| \0|
         +---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Legend:
 N - `count` field
 Func0 - 0th registered function name
 Func1 - 1st registered function name

Function lookup is done by linearly scanning the concatenated names until a match is found. Function handles are encoded as the 0-based index of the matching name in the FuncRegistry.

From the index, the function pointer can be retrieved in constant time. The index is validated using the count field and the pointer is returned by indexing funcs .

MutableFuncRegistry

Unlike the SystemLib Module, the global function namespace can change at runtime. MutableFuncRegistry is a RAM-based implementation of FuncRegistry that adds a Set function.

/*!
 * \brief A TVMFuncRegistry that supports adding and changing the functions.
 */
typedef struct TVMMutableFuncRegistry {
  TVMFuncRegistry registry;

  /*! \brief maximum number of functions in this registry. */
  size_t max_functions;
} TVMMutableFuncRegistry;

Here, registry is presumed to be mutable, so Set just directly modifies names and funcs . The create function accepts a single block of memory and partitions it into names and the rest of the data structure. It computes a capacity based on an average function name length and stores that in max_functions .

Modules

CRT Modules are implemented as structs whose first member is a TVMModule :

/*!
 * \brief Module container of TVM.
 */
typedef struct TVMModule {
  /*! \brief The function registry associated with this module. */
  const TVMFuncRegistry* registry;
} TVMModule;

Modules can be registered with the CRT (though a public-facing function is yet TBD). Upon registration, a pointer is placed in a global modules array, and the module is assigned id equal to the index in this array.

Note that TVMBackendRegisterSystemLibSymbol will not be implemented in the CRT C implementation. SystemLibs generated for this implementation are expected to contain a TVMModule instance and implement the initialization discussed under Life Cycle.

TVMFunctionHandle

TVMFunctionHandles are now composed of two pieces:

 31  30      bits         16 15         bits           0
+---------------------------+---------------------------+
| V |     Module Index      |        Function Index     |
+---------------------------+---------------------------+

Legend
 V: Module Index Valid bit (1 = valid, 0 = invalid; global function)

TVMFuncCall

TVMFuncCall is used to satisfy the kCallFunc RPC code. No TVMFuncCall implementation exists in the MISRA-C runtime. One goal in implementing it is to do something such that C code could be called from either the C++ implementation or the MISRA-C runtime without modification. This section describes the existing codebase to motivate some refactoring.

At present, TVMFuncCall invokes C++ functions using the CallPacked :

void PackedFunc::CallPacked(TVMArgs args, TVMRetValue* rv);

When calling C functions, it’s customary to provide a void* , termed in TVM as resource_handle . The C++ implementation of the TVM C Runtime API provides this using a closure glue function:

int TVMFuncCreateFromCFunc(TVMPackedCFunc func, void* resource_handle, TVMPackedCFuncFinalizer fin,
                           TVMFunctionHandle* out) {
  // ...
  *out = new PackedFunc([func, resource_handle](TVMArgs args, TVMRetValue* rv) {
    int ret = func((TVMValue*)args.values, (int*)args.type_codes,  // NOLINT(*)
                   args.num_args, rv, resource_handle);
    if (ret != 0) {
      throw dmlc::Error(TVMGetLastError() + ::dmlc::StackTrace());
    }
  });
  // ...
}

This glue calls functions with signature:

/*!
 * \\brief C type of packed function.
 *
 * \\param args The arguments
 * \\param type_codes The type codes of the arguments
 * \\param num_args Number of arguments.
 * \\param ret The return value handle.
 * \\param resource_handle The handle additional resouce handle from fron-end.
 * \\return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.
 * \\sa TVMCFuncSetReturn
 */
typedef int (*TVMPackedCFunc)(TVMValue* args, int* type_codes, int num_args, TVMRetValueHandle ret,
                              void* resource_handle);

Meanwhile, generated TVM functions use yet another type signature:

/*!
 * \\brief Signature for backend functions exported as DLL.
 *
 * \\param args The arguments
 * \\param type_codes The type codes of the arguments
 * \\param num_args Number of arguments.
 * \\param out_ret_value The output value of the the return value.
 * \\param out_ret_tcode The output type code of the return value.
 *
 * \\return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.
 */
typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode);

First, on devices with a limited amount of non-virtual memory, it’s desirable to avoid allocations. Here, it would be good to eliminate the closure.

Second, In order to both invoke generated functions and C libraries compatible with full TVM, the µTVM RPC server needs to call both TVMBackendPackedCFunc and TVMPackedCFunc . There are two differences between the two:

TVMPackedCFunc places the out_ret_value and out_ret_code into a single C struct TVMRetValue . The purpose of this struct is to make it easier to invoke TVMCFuncSetReturn , whose purpose is to ease the process of transferring managed objects from C to C++.
TVMBackendPackedCFunc lacks a resource_handle parameter.

For Discussion

There are a couple of approaches to handling the discrepancy in APIs here:

A1. Decide that each FuncRegistry can store functions of only one type, and store a boolean indicating the type.

A2. Store this flag per-function.

A3. Resolve the differences in API and delete one of the types.

A4. Add support for calling TVMBackendPackedCFunc from the C++ runtime, and standardize on only calling TVMBackendPackedCFunc from this MISRA-C runtime.

There are also some options for handling resource_handle :

O1. Store a has_resource_handle bit plus associated resource_handle for each function in a TVMFuncRegistry.

O2. Add resource_handle to TVMBackendPackedCFunc, and decide that, as a MISRA-C constraint, resource_handle is:

when invoking a module function: set to the TVMModule* that contains the function
set to NULL when invoking a global function.

In the long term, I think we should adopt approach A4 and option O2. This would mean:

The function signature for C functions, including generated TVM functions, will be:

typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                              TVMValue* out_ret_value, int* out_ret_tcode,
                              void* resource_handle);

TVMCFuncSetReturn will not be implemented for the MISRA-C runtime.
resource_handle will be passed as in O2.
Codegen for TVM functions will be changed to include the 6th runtime parameter. Some platforms may not technically need any modification, because the argument passing strategy will ignore extra arguments at runtime.

CRT Life Cycle

In the C++ implementation of the CRT, much of the initialization process is accomplished by global initializers. For runtimes targeting bare-metal devices, this is less desirable for a couple of reasons:

Often SoCs reset to a slow CPU frequency in a high power configuration. Running initialization code as soon as possible improves system startup performance and thereby power draw.
Where control loops are involved, the system startup time often bounds the maximum jitter seen in the control loop. RTOS or interrupt-driven control are typically employed to address this problem, and need to be ultimately in control of the system while user code (i.e. µTVM) is executing.
The memory subsystem is initialized by user code, and that may not have happened when global initializers are running.

For that reason, the C implementation defines an explicit initializer for the runtime:

/*! \brief Initialize internal runtime components.
 *
 * Call this function before calling any other functions in the runtime.
 * \\return 0 when success, negative integer when error occurs.
 */
TVM_DLL int TVMInitializeRuntime(void);

Additionally, an explicit initializer is defined for the SystemLib:

/*! \brief Entry point for the system lib module. */
const TVMModule* TVMSystemLibEntryPoint(void);

This function is called when runtime.SystemLib is invoked for the first time.

Module Initialization

Modules are currently registered using an internal function:

int _TVMModCreateFromCModule(const TVMModule* mod, TVMModuleHandle* out_handle);

Client code can get a TVMModuleHandle by invoking a PackedFunc which returns one. Currently, the only such PackedFunc exposed as part of the CRT is runtime.SystemLib . C code can register additional such functions using TVMFuncRegisterGlobal . This will probably be exposed as a public CRT API in a future PR, but not part of this RFC now.

Isolated CRT Build

At present, the CRT is built by #includeing a series of .c files into bundle_deploy . There isn’t anything else currently at master enforcing a separation between CRT and TVM C++ runtime, particularly for unit testing. The C-based framing and session layers introduced below are intended to be linked into both the CRT and the TVM C++ runtime, so it would help to have better assurance that CRT components truly don’t rely on any C++ headers or linked pieces. Also, it will be difficult (at least initially) to run unit tests for the framing and session layers on microcontrollers, so linking those against just the CRT improves confidence in their portability.

For all of those reasons, the standalone CRT will be copied to a directory in build and compiled from there using a make -based build without the traditional include paths. The make-based build is also useful as a starting point to integrate the CRT with custom firmware.

`crt_config.h`

In building the CRT as a standalone library (see Isolated CRT Build above), some parameters need to be fixed at compile time. A new file, crt_config.h , will #define those parameters. Each time the CRT is compiled for a particular target environment, this file needs to be provided to the make -based build system.

Initially, crt_config.h will contain the data from apps/bundle_deploy/runtime.c with changes and similar parameters for TVMFuncRegistry support:

/*! Support low-level debugging in MISRA-C runtime */
#define TVM_CRT_DEBUG 0

/*! Maximum supported dimension in NDArray */
#define TVM_CRT_MAX_NDIM 6
/*! Maximum supported arguments in generated functions */
#define TVM_CRT_MAX_ARGS 10
/*! Maximum supported string length in dltype, e.g. "int8", "int16", "float32" */
#define TVM_CRT_STRLEN_DLTYPE 10
/*! Maximum supported string length in function names */
#define TVM_CRT_STRLEN_NAME 80

/*!
 * \brief Log memory pool size for virtual memory allocation
 *
 * Here is a list of possible choices:
 * * use 16 for 64 KiB memory space
 * * use 17 for 128 KiB memory space
 * * use 18 for 256 KiB memory space
 * * use 19 for 512 KiB memory space
 * * use 20 for 1 MiB memory space
 * * use 21 for 2 MiB memory space
 * * use 22 for 4 MiB memory space
 * * use 23 for 8 MiB memory space
 * * use 24 for 16 MiB memory space
 * * use 25 for 32 MiB memory space
 * * use 26 for 64 MiB memory space
 * * use 27 for 128 MiB memory space
 * * use 28 for 256 MiB memory space
 */
#define TVM_CRT_LOG_VIRT_MEM_SIZE 24

/*! \\brief Page size for virtual memory allocation */
#define TVM_CRT_PAGE_BYTES 4096

Testing

Unit tests can be written against the isolated CRT build without linking parts of the TVM runtime. Integration tests will be done using the µTVM RPC Server, and discussed further after that lands.

`TVMPlatform` functions

A new function prefix will be introduced, TVMPlatform , which contains functions that vary by platform. These are not provided by CRT and should be implemented by each platform. In this RFC, only the following function is added:

void __attribute__((noreturn)) TVMPlatformAbort(int error_code) - called when a CHECK failure occurs

tqchen · June 25, 2020, 8:06pm

cc @liangfu @tgall_foo

liangfu · June 26, 2020, 1:10am

Hi @areusch, thanks for proposing RPC support to MISRA-C runtime.

Regarding to the TVMFuncRegistry design, the names field are designed to be rather compact, and requires a special handling of the list of strings. I would rather propose an alternative way that we could use fixed-length strings, so that we could easily access all strings (with constant time) without linearly scanning from the beginning, and still it’s a single block of string.

Regarding to the API design, can we make it more consistent with existing API design in the default TVM runtime? The consistency would help us maintain both runtimes with existing TVM header files and APIs.

areusch · June 26, 2020, 1:36am

hi @liangfu, thanks for taking a look at my proposal!

I agree the names field is a little complex. here are some alternatives:

N1. use an array of const char[MAX_FUNC_NAME_LENGTH][num_funcs]. The positive is that you can traverse the list without scanning; the negative is it could waste space especially for small functions. if that wasted space is >1 word, you may as well consider N2. another negative is that MAX_FUNC_NAME_LENGTH may need to be adjusted frequently.

N2. use an array of const char*, and encode each name as a separate string in flash. the positive is you can traverse the whole list without scanning every name. the negative is adding 1 word per function name. I think here, i’m wondering in what case you wouldn’t need to traverse; I think the answer is likely if you sorted the names and implemented binary search. you could do that; i’d argue that it’s not that important yet (num funcs is small, and function name lookup is (i don’t think) that time-sensitive) and is itself added complexity/code space. however, it is well-understood so the complexity doesn’t bother me as much.

i’m open to each of these, but I would like to point out that special handling done here is just combining strcmp with strlen. I agree it’s still artisanal.

could you be a bit more specific wrt your API design comment?

Andrew

liangfu · June 27, 2020, 12:09am

could you be a bit more specific wrt your API design comment?

Sorry, I meant to be consistent with the registry API / implementation defined at src/runtime/registry.cc. Specifically, I think it’s better to implement Registry::Manager for CRT, instead of implementing TVMFuncRegistry, which changed the consistency between the two runtime implementations. In addition, the name for such implement could be registry.c to maintain consistency.

Maintaining consistency helps existing users of the default runtime to easily switch to CRT for their resource constrained systems. And it’ll be easier for future contributors to dive into the implementation.

areusch · June 27, 2020, 12:39am

Ah gotcha. I think there’s one thing there I’m not sure how to solve in a good way: TVM generates TVMBackendPackedCFunc from llvm and c codegens. Because TVMFuncCall RPC call accepts a function handle, we have to provide something there that contains enough data to differentiate between TVMBackendPackedCFunc and PackedFunc. The C++ runtime does this with LibraryModule, which creates a closure. I would say on the embedded side, we should avoid this strategy, since that requires some type of dynamic memory allocation.

We could store some module- or function-level bits to differentiate, but then that’s also more RAM. I had some discussions with @tqchen about unifying the runtime impls–maybe he can weigh in on this? It seems like we could just unify the runtime impls, and make TVMBackendPackedCFunc the cross-runtime standard (even if we modify its signature). Then runtimes would only need to know how to call that one type of function.

areusch · July 3, 2020, 3:25am

@tqchen @liangfu bundle_deploy now works with the PR.

I understand better now your comment in the RFC about trying to implement a C++ Registry in CRT. here’s a better response:

as it stands, bundle_deploy currently works with both CRT and C++ runtimes, but in order to run with CRT in a situation where you may want to load a .so, bundle_deploy itself implements an API. this is the true API right now between a “frontend” talking to bundle_deploy and TVM (C++ or CRT).
creating a common interface makes sense, but I think right now we lack a comprehensive PackedFunc implementation in CRT. we have one, but it doesn’t do everything you’d like it to (I.e. argument encoding is tricky and return values are currently unsupported). also, I don’t think you’d want PackedFunc to invoke TVMFuncCall; it’s more efficient to have it e.g. std::forward the args along somehow.
the motivation for placing only TVMBackendPackedCFunc behind TVMFuncCall is to simplify the runtime and avoid allocation.

it’s better to just have the runtime interface with one function type for simplicity’s sake
if we supported others as we do in c++ runtime, we would need to allocate dynamic memory to build closures that adapt between TVMBackendPackedCFunc and other types of functions. I think in e.g. µTVM context, having predictable dynamic memory is pretty important (I.e. it now it’s just tensors and function calls). so, i’d propose to build out the PackedFunc CRT implementation to call TVMBackendPackedCFunc. i’ve added functions here that allow you to populate PackedFunc using TVMFuncGetGlobal, so hopefully this helps align the CRT PackedFunc usage with the C++ registry. we can also build a CRT registry that uses these functions so the interface is the same.

how does this design tradeoff seem to you?

liangfu · July 3, 2020, 4:06am

Hi @areusch, thanks for the proposal. I agree having predictable dynamic memory is important, and it’s a good idea to build out the PackedFunc CRT implement to call TVMBackendPackedCFunc, as long as it could be helpful in aligning CRT implement with the c++ runtime.

I think most of my concerns have been addressed.

jinchenglee · January 20, 2021, 7:24pm

Hi, @areusch, I have a question regarding TVMFuncCall section.

I’m using the MISRA-C implementation that uses runtime sources code in runtime/crt/common/crt_runtime_api.c instead of the runtime/c_runtime_api.cc. Crt_runtime_api.c has no TVMFuncCreateFromCFunc() defined. The TVMFuncCall() is as below:

int TVMFuncCall(TVMFunctionHandle func_handle, TVMValue* arg_values, int* type_codes, int num_args,
                TVMValue* ret_val, int* ret_type_code) {
...
  TVMBackendPackedCFunc func;
...
  if (TVMFuncRegistry_GetByIndex(registry, function_index, &func) != 0) {
    TVMAPIErrorf("invalid function index: %04" PRIx16, function_index);
    return -1;
  }
...
  return func(arg_values, type_codes, num_args, ret_val, ret_type_code, resource_handle);
}

In the end, it calls func() which has signature with the resource_handle argument. However, in https://github.com/apache/tvm/blob/main/include/tvm/runtime/packed_func.h, the exported function macro is as below:

#define TVM_DLL_EXPORT_TYPED_FUNC(ExportName, Function)                                     \
  extern "C" {                                                                              \
  TVM_DLL int ExportName(TVMValue* args, int* type_code, int num_args, TVMValue* out_value, \
                         int* out_type_code) {                                              \
...
  }                                                                                         \
  }

As shown above, the ExportName() function misses the resource_handle argument, otherwise, all other arguments match with func(…) call inside TVMFuncCall(…).

The exported function will eventually be registered into function registery in https://github.com/apache/tvm/blob/969b77a209c24470ddc4e1e93eeb26e8ea74389b/src/runtime/crt/common/func_registry.c#L118:

tvm_crt_error_t TVMMutableFuncRegistry_Set(TVMMutableFuncRegistry* reg, const char* name,
                                           TVMBackendPackedCFunc func, int override) {
...

  ((TVMBackendPackedCFunc*)reg->registry.funcs)[idx] = func;
  ((char*)reg->registry.names)[0]++;  // increment num_funcs.

  return kTvmErrorNoError;
}

It seems we forcefully convert whatever func to be TVMBackendPackedCFunc pointer although the signatures differ.

I tried this and it actually works! But this is dangerous and really hard to debug if there’s any error associated in argument passing. It works only because we always pass in w/o the resource_handle argument.

Am I missing anything here? Thanks.

areusch · January 22, 2021, 9:09pm

hi @jinchenglee, one clarification: as you’re using the MISRA-C runtime, the operator functions are actually not defined using TVM_DLL_EXPORT_TYPED_FUNC unless you’re using the demo external C codegen (I.e. BYOC flow) for a portion of your graph. instead, they’re just generated using the c host codegen. The function arguments are built here.

I think we may have some TODO to cleanup the C++ invocation of TVMBackendPackedCFunc, though I think in practice omitting the last argument does seem to work.

jinchenglee · January 23, 2021, 1:35am

Yes, I’m using BYOC, mimicking the DNNL c source codegen to spew out c source code of operator function calls, which are then wrapped into packed function by TVM_DLL_EXPORT_TYPED_FUNC(a, a_wrapper).

Indeed, omitting the last argument works in my experiment. But this needs fix in the long run.

A uniform function signature in packed function is a good idea, however, it is also more vulnerable comparing to standard c/c++ function call in this case.

areusch · January 25, 2021, 7:01pm

hi @jinchenglee,

okay makes sense. i’ve created draft PR 7338 to try and address this. resource_handle is intended to be a C opaque pointer used to capture context of function calls (e.g. Module pointers). Since TVM_DLL_EXPORT_TYPED_FUNC presumes that this has been handled in C++-land, I think this is a reasonable fix. feel free to comment further on the PR, we’ll see how it does in the regression.

jinchenglee · January 30, 2021, 6:21am

A companion PR #7343 has been merged to have added the resource_handle parameter to TVM_DLL_EXPORT_xxx packed function macros.