Implementing AOT in TVM

Hi all,

Thanks for the interesting discussion! So, we all agree that there are three points here:

  • Backend API
  • Calling convention
  • Runtime API

As things stand today, memory allocation is part of the backend API. This will change with global memory planning, but for now I would tend to skip the C1 concern about memory and discuss it in a separate RFC.

I agree that the way forward is some type of W1{a,b,c}. I will try to sum up the points in my own words; correct me if I am wrong.

Backend API

As @areusch correctly pointed out, this is the API that the generated code uses for its utility functions. From my POV this is the real runtime of our compiler. Our approach would be to reduce this API, at least for AoT, to a minimal set of functions:

  • Memory allocation (for now)
  • Parallel execution
  • What about errors? For now the error API (setLastError, getLastError) is part of the c_runtime_api, but the setter should be part of the backend API and the getter of the runtime_api.
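To make the list above a bit more concrete, here is a rough sketch of what such a reduced backend API could look like. Every name in it (the sketch_ prefix, the arena size, the 32-bit size parameter, the LIFO free rule) is my own assumption for illustration and not the existing c_backend_api.h. Parallel execution is left out to keep it short. The point is only to show the error setter living next to the allocator on the backend side, while the getter is what the user-facing runtime would expose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: this is NOT the existing c_backend_api.h. */
#define SKETCH_ARENA_BYTES 1024u
#define SKETCH_ERR_BYTES 64u

/* Statically reserved workspace; 8-byte aligned so returned pointers are too. */
static _Alignas(8) uint8_t sketch_arena[SKETCH_ARENA_BYTES];
static uint32_t sketch_arena_top = 0;
static char sketch_last_error[SKETCH_ERR_BYTES];

/* Setter: backend side, called by the generated code. */
void sketch_backend_set_last_error(const char* msg) {
  strncpy(sketch_last_error, msg, SKETCH_ERR_BYTES - 1);
  sketch_last_error[SKETCH_ERR_BYTES - 1] = '\0';
}

/* Getter: runtime side, called by the user application. */
const char* sketch_runtime_get_last_error(void) { return sketch_last_error; }

/* Stack-like allocation with a 32-bit size instead of a 64-bit one.
 * Returns NULL and records an error when the arena is exhausted. */
void* sketch_backend_alloc_workspace(uint32_t nbytes) {
  uint32_t aligned = (nbytes + 7u) & ~7u; /* round up to a multiple of 8 */
  assert(sketch_arena_top <= SKETCH_ARENA_BYTES);
  if (aligned < nbytes || aligned > SKETCH_ARENA_BYTES - sketch_arena_top) {
    sketch_backend_set_last_error("workspace arena exhausted");
    return NULL;
  }
  void* ptr = &sketch_arena[sketch_arena_top];
  sketch_arena_top += aligned;
  return ptr;
}

/* Workspaces must be released in reverse allocation order (LIFO). */
int sketch_backend_free_workspace(void* ptr) {
  uint8_t* p = (uint8_t*)ptr;
  if (p < sketch_arena || p >= sketch_arena + sketch_arena_top) {
    sketch_backend_set_last_error("pointer is not a live workspace");
    return -1;
  }
  sketch_arena_top = (uint32_t)(p - sketch_arena);
  return 0;
}
```

Generated code would only ever see the two workspace functions and the setter; whether a LIFO arena is acceptable depends on how the operators allocate, which is exactly what global memory planning would settle.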

I agree with @areusch about having a minimal crt_backend_api.c and an rpc_backend_api.c that adds more functionality.

Would it make sense to also have a crt_backend_api.h? Or should we still reuse the original c_backend_api.h interface? I am asking because that interface defines things like the 64-bit TVMValue union, which clashes with a minimalist embedded environment (more on that in a second). Also, for now TVMBackendAllocWorkspace accepts an int64 parameter, which would be nice to remove (even though we will remove it once we do global memory planning).

Calling convention

This is the bit I think is more controversial. So, to make things clear, when we refer to a CPackedFunc, we are talking about:

typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode,
                                     void* resource_handle);

From what I understand, @kparzysz, you are saying that the internal functions don’t matter (they will be static private functions) but that the runner function should have this signature. Can I ask you why? Actually, we are trying to move toward a C-compatible API for both the internal operators and the external runner function:

typedef int (*TVMBackendCFunc)(void** inputs, void** outputs, void* resource_handle);

For three main reasons:

  • TVMValue is an int64 union, and most embedded devices will struggle to deal with int64.
  • TVMValues need to be packed and unpacked for every operator call.
  • If the user has an array of inputs and passes it to the runner function, the runner function needs to dynamically create an array of TVMValues on the stack and populate it with the user's inputs.

@areusch @tqchen I guess that we can add a TVMBackendPackedCFunc wrapper function if the RPC side of things needs it. But is there any reason for not having the low-level functions written in plain C without type IDs?
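As a rough illustration of the wrapper idea, the sketch below keeps the operator in plain C and adds a thin packed adapter on top of it. The SketchValue union is a simplified local stand-in for TVMValue, and the operator, its fixed length and the argument layout (two input handles followed by one output handle) are assumptions of mine, not something TVM defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified local stand-in for TVMValue (assumption, not the TVM definition). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} SketchValue;

#define SKETCH_LEN 4

/* Plain C operator: no unions, no type codes. Hypothetical element-wise add. */
int sketch_add(void** inputs, void** outputs, void* resource_handle) {
  (void)resource_handle;
  const float* a = (const float*)inputs[0];
  const float* b = (const float*)inputs[1];
  float* out = (float*)outputs[0];
  for (int i = 0; i < SKETCH_LEN; ++i) {
    out[i] = a[i] + b[i];
  }
  return 0;
}

/* Packed adapter, only needed where a packed signature is required.
 * Assumed layout: args[0], args[1] are input handles, args[2] is the output. */
int sketch_add_packed(SketchValue* args, int* type_codes, int num_args,
                      SketchValue* out_ret_value, int* out_ret_tcode,
                      void* resource_handle) {
  (void)type_codes;
  (void)out_ret_value;
  (void)out_ret_tcode;
  if (num_args != 3) {
    return -1;
  }
  assert(args != NULL);
  /* Unpacking happens once, here, instead of inside every operator. */
  void* inputs[2] = {args[0].v_handle, args[1].v_handle};
  void* outputs[1] = {args[2].v_handle};
  return sketch_add(inputs, outputs, resource_handle);
}
```

With this split, the AOT runner can call sketch_add directly with the user's pointer arrays, and only a caller that really needs the packed form pays for building the SketchValue array.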

Runtime API

Now that can be quite a long conversation if we want to draft it all here :) Let’s try to define some guidelines, taking as an example the function to “run” a network:

  • The main function exposed to the user should be tvm_runtime_run, in the style of bundle_static.c.
  • The RPC API and the graph executor can easily implement tvm_runtime_run; indeed, this is already done in bundle_static.c.
  • AOT will use it to act on the internal structure tvm_model_t, which has been code generated.
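To give a starting point for the sketching, something like the following is what I have in mind for the AOT side. The struct layout, the field names and the sketch_ prefixed functions are all hypothetical: the actual contents of tvm_model_t and the final signature of tvm_runtime_run are exactly what we still need to agree on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry point of a code-generated network, in the plain C style. */
typedef int (*sketch_run_fn)(void** inputs, void** outputs, void* resource_handle);

/* Hypothetical stand-in for the code-generated tvm_model_t. */
typedef struct {
  uint32_t num_inputs;
  uint32_t num_outputs;
  sketch_run_fn run;
} sketch_model_t;

/* Stand-in for a generated network: one float in, one float out (out = 2 * in). */
int sketch_network_run(void** inputs, void** outputs, void* resource_handle) {
  (void)resource_handle;
  const float* in = (const float*)inputs[0];
  float* out = (float*)outputs[0];
  out[0] = 2.0f * in[0];
  return 0;
}

/* The descriptor the code generator would emit next to the network. */
const sketch_model_t sketch_network_model = {1u, 1u, sketch_network_run};

/* User-facing entry point: validates the descriptor and forwards the call. */
int sketch_runtime_run(const sketch_model_t* model, void** inputs, void** outputs) {
  if (model == NULL || model->run == NULL) {
    return -1;
  }
  return model->run(inputs, outputs, NULL);
}
```

A graph executor or RPC implementation could expose the same entry point and simply ignore the descriptor's run pointer, so the user code would not change between the two.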

Actually, I think the best way to move forward would be to sketch something and progressively agree on how it looks. Have you got any suggestions on how to do this sort of “sketching”? Maybe a draft PR not meant to be merged but only to spark discussion?

Thanks again, this is all very interesting.

Giuseppe