@Mousius ah I had also forgotten to reply to your last post.
I’ve taken a look at your PR and will post some initial feedback. Overall I think this approach is cleaner than the previous one–it removes the adaptation layer between TVM and C, and by doing so, encourages us to minimize the embedded-facing API.
One question: with the USMP proposal, what should be the input to the TIR `main_func`? It seems like the ultimately-proposed thing was something like:
```c
int32_t tvmgen_my_model_main_func(void** inputs, void** outputs, uint8_t* params1,
                                  uint8_t* params2, uint8_t* workspace1, uint8_t* workspace2) {
  tvmgen_my_model_op1(inputs[0], &params1[0], &workspace1[1024]);
  tvmgen_my_model_op2(inputs[1], &params1[128], &workspace1[1024], outputs[0]);
  return 0;
}
```
This may mean that we need to translate between `tvmgen_my_model_input_t` and the array-of-input-data vector `inputs` when USMP lands. I wrote everything below before considering this; just noting that the offset of struct members isn’t guaranteed, so I’m not sure how interchangeable `tvmgen_my_model_input_t` and `void** inputs` are. On the other hand, when LLVM-generated bitcode indexes into a struct, (I believe) it does so by offset, so we need to ensure generated code either a) accepts arrays or b) can predictably generate offsets in the same way we expect downstream C compilers to.
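To make (b) concrete, here’s a minimal sketch (struct and member names hypothetical) of how generated code could pin down the layout so the struct stays interchangeable with a flat pointer array:

```c
#include <stddef.h>

/* Hypothetical generated input struct: one data pointer per model input. */
typedef struct {
  void* model_input_cats;
  void* model_input_dogs;
} tvmgen_my_model_input_t;

/* If this struct must alias a void*[] of inputs, each member has to land at
 * the offset array indexing would compute. Asserting that at compile time
 * surfaces any downstream-compiler layout surprise as a build error. */
_Static_assert(offsetof(tvmgen_my_model_input_t, model_input_cats) == 0 * sizeof(void*),
               "input 0 is not at array offset 0");
_Static_assert(offsetof(tvmgen_my_model_input_t, model_input_dogs) == 1 * sizeof(void*),
               "input 1 is not at array offset 1");
```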
cc @manupa-arm
Regarding the input/output/memory/devices split:
- I can see the purpose for `input`, `output`, and `memory` at this point, and I expect `devices` will have one, but right now it’s not super clear.
- It would be great (in the PR) to document exactly what goes in each generated struct, e.g. in terms of structs or C types (see the sketch after this list).
- I wonder if we should defer the `devices` struct until we have defined a model for devices in µTVM?
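For `output` (and similarly for `input`, sketched above), I’d hope for something like this next to each generated struct (contents entirely hypothetical):

```c
/* Outputs: one void* data pointer per model output, in declaration order.
 * Each pointer must reference a caller-owned buffer of the documented
 * dtype and shape. */
typedef struct {
  void* model_output;  /* e.g. float32[1][1000] */
} tvmgen_my_model_output_t;
```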
The other observation I had from [RFC] Unified Static Memory Planning was that the memory could be specified a bit differently:
- S1. Specify each type of memory, such as parameters and workspaces, with differing structures (rough sketches of all three options follow this list)
- S2. Specify each type of memory, such as parameters and workspaces, as nested structures
- S3. Specify only the names for these memories; the user already has the knowledge as to which block they wanted to use
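Roughly, I’d read those options as (all names hypothetical):

```c
/* S1: a separate struct per memory type. */
typedef struct { void* param1; } tvmgen_my_model_params_t;
typedef struct { void* workspace1; } tvmgen_my_model_workspaces_t;

/* S2: the same types, nested under a single memory struct. */
typedef struct {
  tvmgen_my_model_params_t params;
  tvmgen_my_model_workspaces_t workspaces;
} tvmgen_my_model_memory_t;

/* S3: only named identifiers; the user maps each name to a block themselves. */
enum tvmgen_my_model_memory_id {
  TVMGEN_MY_MODEL_MEMORY_PARAMS,
  TVMGEN_MY_MODEL_MEMORY_WORKSPACE,
};
```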
I think the challenge here is deciding how to place parameters. At present, a global symbol is defined for each parameter, and that is used in code to reference the data. Adding an entry in `memory` would be superfluous here. Additionally, since parameters need to be initialized, the C interface would need to provide e.g. a `#define` which expands to the C array or C-string initialization of the parameter data.
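For instance (a sketch; macro and section names hypothetical, and `__attribute__((section))` assumes a GCC-compatible toolchain):

```c
/* Generated: length and initializer for one parameter. */
#define TVMGEN_MY_MODEL_PARAM1_LEN 4
#define TVMGEN_MY_MODEL_PARAM1_INIT { 0.125f, -0.5f, 0.25f, 1.0f }

/* User code then controls placement, e.g. pinning the data to a section: */
__attribute__((section(".my_param_section")))
static const float my_model_param1[TVMGEN_MY_MODEL_PARAM1_LEN] = TVMGEN_MY_MODEL_PARAM1_INIT;
```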
Finally, since parameter parsing was previously demonstrated to be a bottleneck in downstream compilation time, I think it’s likely that users would prefer the `llvm` target to avoid this problem; and with `llvm`, parameter placement is more difficult (since parameters are simplified during TVM compilation, the user would need to define their placement either universally, e.g. `--params-section`, or based on some property of the parameter known in advance to the user, e.g. `param_placement={"cpu": {"section": "my_param_section"}, "accel": {"section": "accel_params"}}`). Such more complex use cases are why I’ve included params in Model Library Format (downstream project generators can then place them as needed).
So in conclusion, I think we should treat `memory` as just for workspace memory, and leave params as-is for now.
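In other words, roughly just the workspace half of the sketches above (again, names hypothetical):

```c
/* memory: workspace pools only; parameters stay as global symbols. */
typedef struct {
  void* workspace1;
  void* workspace2;
} tvmgen_my_model_memory_t;
```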
Programmatic inference
One thing to consider with this approach: suppose someone wants to drive inference programmatically. As an example API, consider the existing Module-based Model Runtime Interface:
```c
int32_t tvmgen_my_model_set_input(const char* name, TVMValue value);
```
This API could be implemented on top of the struct API using generated code as follows:
- First let’s get rid of the C-string and presume there is a mapping from string to int:

  ```c
  int32_t tvmgen_my_model_set_input(tvm_input_name_t name, TVMValue value);
  ```
- Next, we could emit code to generate these assignments using e.g. `memcpy`:

  ```c
  #include <stddef.h>
  #include <string.h>

  /* Byte offset of each input's pointer within the generated input struct. */
  static const size_t tvmgen_my_model_input_offsets[] = {
      offsetof(tvmgen_my_model_input_t, model_input_cats),
      offsetof(tvmgen_my_model_input_t, model_input_dogs),
  };

  int32_t tvmgen_my_model_get_input_index(const char* name, int* index) {
    // generated string lookup: sets *index, returns 0 on success
    return -1;  // placeholder
  }

  int32_t tvmgen_my_model_set_input(tvmgen_my_model_input_index_t index, TVMValue value,
                                    void* model_input_struct) {
    uint8_t* input = (uint8_t*) model_input_struct;
    memcpy(input + tvmgen_my_model_input_offsets[index], &value.v_handle, sizeof(value.v_handle));
    return 0;
  }

  const tvm_model_descriptor_t tvmgen_my_model_descriptor = {
      .input_struct_size_bytes = sizeof(tvmgen_my_model_input_t),
      .set_input = tvmgen_my_model_set_input,
      .get_input_index = tvmgen_my_model_get_input_index,
  };
  ```
And code could use this like so:
```c
#include <stdlib.h>

typedef struct {
  const tvm_model_descriptor_t* descriptor;
  void* input_struct;
  void* output_struct;
  void* memory_struct;
} model_state_t;

model_state_t g_model_state = {0, 0, 0, 0};

void tvm_prog_init_model(const tvm_model_descriptor_t* descriptor) {
  g_model_state.descriptor = descriptor;
  g_model_state.input_struct = malloc(descriptor->input_struct_size_bytes);
  // ...
}

int32_t tvm_prog_set_input(const char* name, TVMValue value) {
  int index;
  g_model_state.descriptor->get_input_index(name, &index);
  return g_model_state.descriptor->set_input(index, value, g_model_state.input_struct);
}
```
I’m not so much trying to propose we nail this down exactly here, just thinking aloud about how we might do this. I think this approach would work and allow us to continue using the structs–this wouldn’t need to be emitted in all use cases, just when programmatic access was desired. Let me know if this seems reasonable to you.
cc @stoa
Replies to previous post
> Yip, I didn’t cover error handling in my example but the idea is just to propagate any returns from the AOT `run_func` back up the chain for the user to collect regardless of how that error is presented internally in TVM? If we need to check the actual error you can then use the other error functions?
That should work–generally, as long as we’re returning the exact `int32_t` error code up, I think it should be sufficient.
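E.g. (a sketch; the function names and `run_func` signature here are hypothetical), the application-facing entry point would just surface the code unchanged:

```c
static tvmgen_my_model_input_t inputs;
static tvmgen_my_model_output_t outputs;
static tvmgen_my_model_memory_t memory;

int32_t my_app_run_inference(void) {
  /* run_func's int32_t return is TVM's error code; pass it through untouched
   * so the caller can inspect it with the error APIs if needed. */
  return tvmgen_my_model_run_func(&inputs, &outputs, &memory);
}
```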
> Are you ok with deciding the exact level of internal organisation required at the code review stage for each incremental addition?
(If we are still using TVMContext; not sure now.) I think so–I think it will make more sense as we are adding things.
> I’ve used names for the inputs, outputs and workspaces, it’s worth considering whether we want to provide the facility for opting out of that for single input/output models simplicity?
I don’t really favor this too much–it’s sort of an API shortcut that is hard to implement in C and, while it does make the getting-started flow a bit easier, it adds potential confusion later on.
> I’m assuming we’d also want to include this in the model header file we’d generate which would make it slightly more than just metadata, but I would prefer that over several headers for a model.
Agreed–generating a single header file makes sense to me. `#include`-ing a header should not modify the program size.
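E.g., a sketch (names hypothetical) of a single generated header containing only type definitions and declarations, so including it adds nothing to the program image:

```c
#ifndef TVMGEN_MY_MODEL_H_
#define TVMGEN_MY_MODEL_H_

#include <stdint.h>

/* Type definitions and declarations only; all definitions (and therefore all
 * code and data) live in the generated .c sources. */
typedef struct {
  void* model_input_cats;
  void* model_input_dogs;
} tvmgen_my_model_input_t;

int32_t tvmgen_my_model_run(tvmgen_my_model_input_t* input);

#endif  // TVMGEN_MY_MODEL_H_
```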