Hi all,
Thanks for the interesting discussion! So, we all agree that there are three points here:
- Backend API
- Calling convention
- Runtime API

As things stand today, memory allocation is part of the backend API. This will change with global memory planning, but for now I would rather skip the C1 concern about memory and discuss it in a separate RFC.
I agree that the way forward is some form of W1{a,b,c}. I will try to sum up the points in my own words; correct me if I am wrong.
Backend API
As @areusch correctly pointed out, this is the API that the generated code uses for utility functions. From my POV, this is the real runtime of our compiler. Our approach would be to reduce this API, at least for AoT, to a minimal set of functions:
- Memory allocation (for now)
- Parallel execution
- What about errors? For now the error API (`setLastError`, `getLastError`) is part of the `c_runtime_api`, but the setter should be part of the backend API and the getter part of the `runtime_api`.
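To make the proposed error split concrete, here is a minimal sketch that assumes a single static message buffer. The setter sits on the backend side and is called by generated code; the getter sits on the runtime side and is called by the application. The names `crt_backend_set_last_error`, `crt_runtime_get_last_error` and `example_op` are hypothetical and do not exist in TVM today.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: these names are NOT existing TVM APIs. */
#define CRT_ERROR_MSG_SIZE 128 /* assumed maximum message length */

/* One static buffer: no heap and no TVMValue needed. */
static char crt_last_error[CRT_ERROR_MSG_SIZE];

/* Backend side: called by generated operator code on failure. */
void crt_backend_set_last_error(const char* msg) {
  assert(msg != NULL);
  /* Copy with truncation; the result is always NUL-terminated. */
  snprintf(crt_last_error, sizeof(crt_last_error), "%s", msg);
}

/* Runtime side: called by the user application after a failing call. */
const char* crt_runtime_get_last_error(void) {
  return crt_last_error;
}

/* Example operator using the plain-C convention that reports an error. */
int example_op(void** inputs, void** outputs, void* resource_handle) {
  (void)resource_handle;
  if (inputs == NULL || outputs == NULL) {
    crt_backend_set_last_error("example_op: null input or output array");
    return -1;
  }
  return 0;
}
```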
I agree with @areusch about having a minimal `crt_backend_api.c` and an `rpc_backend_api.c` that adds more functionality.
Would it make sense to have a `crt_backend_api.h` as well, or should we keep reusing the original `c_backend_api.h` interface? I am asking because that interface relies on things like `TVMValue`, a 64-bit union, which clashes with a minimalist embedded environment (more on that in a second). Also, `TVMBackendAllocWorkspace` currently accepts an `int64` parameter, which it would be nice to remove (even though it will go away anyway once we do global memory planning).
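As a rough illustration of what a minimal `crt_backend_api.h` could offer, the sketch below allocates workspaces from a static arena using a 32-bit size and never touches `TVMValue`. It assumes stack (LIFO) allocation order, 8-byte alignment and a 1 KiB arena. `crt_backend_alloc_workspace` and `crt_backend_free_workspace` are hypothetical names, not existing TVM functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: these names are NOT existing TVM APIs. */
#define CRT_WORKSPACE_BYTES 1024u /* assumed arena size */
#define CRT_ALIGN 8u              /* assumed alignment of every block */

static_assert(CRT_WORKSPACE_BYTES % CRT_ALIGN == 0,
              "arena size must be a multiple of the alignment");

/* Static arena: no heap and no 64-bit arithmetic required. */
static _Alignas(CRT_ALIGN) uint8_t crt_arena[CRT_WORKSPACE_BYTES];
static uint32_t crt_arena_top = 0; /* offset of the first free byte */

/* Round a size up to the next multiple of CRT_ALIGN. */
static uint32_t crt_round_up(uint32_t nbytes) {
  return (nbytes + (CRT_ALIGN - 1u)) & ~(CRT_ALIGN - 1u);
}

/* Returns NULL when the arena cannot satisfy the request. */
void* crt_backend_alloc_workspace(uint32_t nbytes) {
  if (nbytes > CRT_WORKSPACE_BYTES) return NULL; /* also prevents wrap-around */
  uint32_t size = crt_round_up(nbytes);
  if (size > CRT_WORKSPACE_BYTES - crt_arena_top) return NULL;
  void* ptr = &crt_arena[crt_arena_top];
  crt_arena_top += size;
  return ptr;
}

/* Frees must happen in reverse allocation order; returns -1 otherwise. */
int crt_backend_free_workspace(void* ptr, uint32_t nbytes) {
  if (nbytes > CRT_WORKSPACE_BYTES) return -1;
  uint32_t size = crt_round_up(nbytes);
  if (size > crt_arena_top || (uint8_t*)ptr != &crt_arena[crt_arena_top - size]) {
    return -1;
  }
  crt_arena_top -= size;
  return 0;
}
```

The LIFO restriction keeps bookkeeping down to a single offset, which fits the "minimal set of functions" goal; a general-purpose allocator could live in the richer RPC variant instead.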
Calling convention
This is the part I think is most controversial. To make things clear, when we refer to a `CPackedFunc`, we mean:
```c
typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode,
                                     void* resource_handle);
```
If I understand correctly, @kparzysz, you are saying that the internal functions don't matter (they will be static private functions), but that the runner function should have this signature. Can I ask why? We are actually trying to move toward a C-compatible API for both the internal operators and the external runner function:
```c
typedef int (*TVMBackendCFunc)(void** inputs, void** outputs, void* resource_handle);
```
For three main reasons:
- `TVMValue` is a 64-bit union (it has an `int64_t` member), and most embedded devices will struggle to deal with `int64`.
- `TVMValue`s need to be packed and unpacked on every operator call.
- If the user has an array of inputs and passes it to the runner function, the runner function needs to dynamically create an array of `TVMValue`s on the stack and populate it with the user's inputs.
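To illustrate the second and third points, here is a self-contained sketch that contrasts the two conventions for a toy element-wise add. It uses a simplified stand-in for `TVMValue` (the real union in `c_runtime_api.h` has more members) and a hypothetical type code. `add_plain`, `add_packed`, `run_plain` and `run_packed` are illustrative names. `run_packed` has to build `TVMValue` and type-code arrays on the stack, and `add_packed` has to check and unpack them, while `run_plain` simply forwards the user's pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TVMValue (the real one has more members). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} TVMValue;

/* Every packed argument costs at least 64 bits plus an int type code. */
static_assert(sizeof(TVMValue) >= sizeof(int64_t), "TVMValue is at least 64 bits");

#define EXAMPLE_HANDLE_CODE 3 /* hypothetical type code for an opaque pointer */
#define VEC_LEN 4             /* hypothetical fixed tensor length */

/* Plain-C convention: pointers arrive ready to use. */
static int add_plain(void** inputs, void** outputs, void* resource_handle) {
  (void)resource_handle;
  const float* a = (const float*)inputs[0];
  const float* b = (const float*)inputs[1];
  float* out = (float*)outputs[0];
  for (int i = 0; i < VEC_LEN; ++i) out[i] = a[i] + b[i];
  return 0;
}

/* Packed convention: each argument is type-checked and unpacked. */
static int add_packed(TVMValue* args, int* type_codes, int num_args,
                      TVMValue* out_ret_value, int* out_ret_tcode,
                      void* resource_handle) {
  (void)out_ret_value; (void)out_ret_tcode; (void)resource_handle;
  if (num_args != 3) return -1;
  for (int i = 0; i < num_args; ++i) {
    if (type_codes[i] != EXAMPLE_HANDLE_CODE) return -1;
  }
  const float* a = (const float*)args[0].v_handle;
  const float* b = (const float*)args[1].v_handle;
  float* out = (float*)args[2].v_handle;
  for (int i = 0; i < VEC_LEN; ++i) out[i] = a[i] + b[i];
  return 0;
}

/* Runner, plain-C convention: forwards the user's arrays unchanged. */
int run_plain(void** inputs, void** outputs) {
  return add_plain(inputs, outputs, NULL);
}

/* Runner, packed convention: repacks the user's arrays on the stack. */
int run_packed(void** inputs, void** outputs) {
  TVMValue args[3];
  int codes[3];
  args[0].v_handle = inputs[0];
  args[1].v_handle = inputs[1];
  args[2].v_handle = outputs[0];
  for (int i = 0; i < 3; ++i) codes[i] = EXAMPLE_HANDLE_CODE;
  TVMValue ret;
  int ret_code;
  return add_packed(args, codes, 3, &ret, &ret_code, NULL);
}
```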
@areusch @tqchen I guess we can add a `TVMBackendPackedCFunc` wrapper function if the RPC side of things needs it. But is there any reason not to write the low-level function in plain C, without type IDs?
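A sketch of such a wrapper under stated assumptions: every argument is an opaque pointer, the first `num_inputs` arguments are inputs and the rest are outputs, and the plain-C function reaches the wrapper through `resource_handle` inside a hypothetical `PlainFuncInfo` descriptor. `TVMValue` is again a simplified stand-in, and `packed_wrapper` and `double_it` are illustrative names. With this arrangement only callers that actually need the packed convention (e.g. RPC) pay the unpacking cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TVMValue (the real one has more members). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} TVMValue;

#define EXAMPLE_HANDLE_CODE 3 /* hypothetical type code for an opaque pointer */
#define MAX_ARGS 8            /* hypothetical upper bound on argument count */

typedef int (*TVMBackendCFunc)(void** inputs, void** outputs, void* resource_handle);

/* Hypothetical descriptor handed to the wrapper as its resource_handle. */
typedef struct {
  TVMBackendCFunc fn; /* plain-C function to forward to */
  int num_inputs;     /* leading args are inputs, the rest are outputs */
  void* fn_resource;  /* resource_handle forwarded to fn */
} PlainFuncInfo;

/* Packed-signature adapter around a plain-C function. */
int packed_wrapper(TVMValue* args, int* type_codes, int num_args,
                   TVMValue* out_ret_value, int* out_ret_tcode,
                   void* resource_handle) {
  (void)out_ret_value; (void)out_ret_tcode;
  const PlainFuncInfo* info = (const PlainFuncInfo*)resource_handle;
  if (info == NULL || num_args > MAX_ARGS ||
      info->num_inputs < 0 || info->num_inputs > num_args) {
    return -1;
  }
  void* ptrs[MAX_ARGS];
  for (int i = 0; i < num_args; ++i) {
    if (type_codes[i] != EXAMPLE_HANDLE_CODE) return -1; /* reject non-pointers */
    ptrs[i] = args[i].v_handle;
  }
  /* Inputs are ptrs[0 .. num_inputs-1]; outputs follow them. */
  return info->fn(ptrs, ptrs + info->num_inputs, info->fn_resource);
}

/* Example plain-C operator: out = in * 2 on a single int32. */
static int double_it(void** inputs, void** outputs, void* resource_handle) {
  (void)resource_handle;
  *(int32_t*)outputs[0] = *(const int32_t*)inputs[0] * 2;
  return 0;
}
```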
Runtime API
This could turn into quite a long conversation if we try to draft it all here. Let's instead define some guidelines, taking the function to “run” a network as an example:
- The main function exposed to the user should be `tvm_runtime_run`, in the style of `bundle_static.c`.
- The RPC API and the graph executor can easily implement `tvm_runtime_run`; indeed, this is already done in `bundle_static.c`.
- AOT will use it to act on the internal structure `tvm_model_t`, which is code generated.
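A minimal sketch of how these pieces could fit together, assuming `tvm_model_t` holds the input/output counts plus a pointer to an AOT-generated runner that uses the plain-C convention. All field names are hypothetical, and the `tvm_runtime_run` signature below is an assumption for illustration; it does not reproduce the one in `bundle_static.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*TVMBackendCFunc)(void** inputs, void** outputs, void* resource_handle);

/* Hypothetical layout of the code-generated model descriptor. */
typedef struct {
  const char* name;
  uint32_t num_inputs;
  uint32_t num_outputs;
  TVMBackendCFunc run_func; /* AOT-generated runner */
  void* resource_handle;    /* e.g. constants or workspaces, opaque to the user */
} tvm_model_t;

/* Hypothetical user-facing entry point (signature is an assumption). */
int tvm_runtime_run(const tvm_model_t* model, void** inputs, void** outputs) {
  if (model == NULL || model->run_func == NULL) return -1;
  if ((model->num_inputs > 0 && inputs == NULL) ||
      (model->num_outputs > 0 && outputs == NULL)) {
    return -1;
  }
  return model->run_func(inputs, outputs, model->resource_handle);
}

/* What AOT might emit for a toy model computing y = x + bias. */
static int32_t example_bias = 5;

static int example_model_run(void** inputs, void** outputs, void* resource_handle) {
  const int32_t* bias = (const int32_t*)resource_handle;
  *(int32_t*)outputs[0] = *(const int32_t*)inputs[0] + *bias;
  return 0;
}

const tvm_model_t example_model = {
  .name = "example_model",
  .num_inputs = 1,
  .num_outputs = 1,
  .run_func = example_model_run,
  .resource_handle = &example_bias,
};
```

In this arrangement a graph-executor or RPC backend could provide its own `run_func` behind the same entry point, which matches the second bullet above.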
Actually, I think the best way forward would be to sketch something and progressively agree on how it should look. Do you have any suggestions on how to do this kind of “sketching”? Maybe a draft PR that is not meant to be merged, only to spark discussion?
Thanks again, this is all very interesting.
Giuseppe