[RFC] [uTVM] AOT optimisations for Embedded Targets

I agree with you wholeheartedly; we need to be careful with naming here. Based on your comments, it might be good to pick something like --packed-functions, which defaults to running MakePackedAPI? Maybe --packed-internal-functions or --packed-operator-functions?

How about --unpacked-api? I know best practices often suggest avoiding negated prefixes in boolean names, but in this case we have a good reason to use one: we’re abusing Target. Also, “unpack” is a common word and seems to be what we’re using internally (call_unpacked). I’d mainly like to ensure the option doesn’t default to true, as that will break autotuning.
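To make the default concrete, here is a minimal Python sketch of a boolean option that stays off unless it is explicitly requested. It uses plain `argparse` rather than TVM's actual Target attribute parsing; `parse_target_options` and the option-string layout are assumptions made purely for illustration.

```python
import argparse
import shlex


def parse_target_options(target_str: str) -> argparse.Namespace:
    """Parse a target-like option string (hypothetical helper, not TVM's parser)."""
    parser = argparse.ArgumentParser(prog="target", add_help=False)
    # Off by default, so the packed API stays in place unless the user opts out.
    parser.add_argument("--unpacked-api", action="store_true", default=False)
    # The first token is treated as the target kind (e.g. "c"); the rest are options.
    kind, *options = shlex.split(target_str)
    args = parser.parse_args(options)
    args.kind = kind
    return args


default_opts = parse_target_options("c")
aot_opts = parse_target_options("c --unpacked-api")
print(default_opts.unpacked_api, aot_opts.unpacked_api)
```

With this shape, a flow such as autotuning that never passes the flag keeps seeing the packed interface, and only an explicit opt-in changes the calling convention.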

Given the micro entrypoint is a C2 function, I’d propose we don’t add a C0 behaviour to the option. Is the use case here to provide the C2 entrypoint for the application and also provide a C0 wrapper for operators in the same bundle, with a different path in the application to allow calling them directly?

The idea here is to enable the RPC server to call the AOT executor function. Doing this allows someone to write a single generic firmware entry point, couple it to an arbitrary compiled model, and then drive inference from Python. It’s not a deployment use case, but may speed prototyping; it’s also the same workflow we would use to support autotuning.
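As a rough illustration of that workflow, the Python sketch below models a generic dispatcher: a compiled model registers its entry function under a name, and a caller invokes it by name with a uniform argument list. Everything here (`FunctionRegistry`, `run_model`, `packed_run_model`, and the doubling "model") is hypothetical and only mimics the shape of a packed-function lookup; it is not the TVM RPC server or AOT executor API.

```python
from typing import Callable, Dict, List, Sequence


class FunctionRegistry:
    """Maps names to functions that share one uniform (packed-style) signature."""

    def __init__(self) -> None:
        self._funcs: Dict[str, Callable[[Sequence[object]], object]] = {}

    def register(self, name: str, func: Callable[[Sequence[object]], object]) -> None:
        self._funcs[name] = func

    def call(self, name: str, args: Sequence[object]) -> object:
        # The caller needs only a name and an argument list,
        # not the model-specific signature.
        return self._funcs[name](args)


def run_model(inputs: List[float], outputs: List[float]) -> int:
    """Stand-in for a compiled model with a typed (unpacked) signature."""
    for i, value in enumerate(inputs):
        outputs[i] = 2.0 * value
    return 0  # zero signals success, in the style of a C status code


def packed_run_model(args: Sequence[object]) -> object:
    """Packed wrapper: unpack the generic argument list, then call the typed function."""
    inputs, outputs = args
    return run_model(inputs, outputs)


registry = FunctionRegistry()
registry.register("run_model", packed_run_model)

out = [0.0, 0.0, 0.0]
status = registry.call("run_model", [[1.0, 2.0, 3.0], out])
print(status, out)
```

The point of the wrapper is that the dispatcher stays the same for any model: only the registered function changes, which is what lets one firmware image be paired with an arbitrary compiled model.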

I thought about this a bit, and potentially we should rename --micro-entrypoint to just --entrypoint and default it to packed, which generates the packed function interface for module loading. A better name may be module, to reflect the entrypoint’s purpose? This should give us the ability to name the relevant interface.

Or even --model-entrypoint-api or --entrypoint-api might be good.

As for implementing this in TIR, I believe there are a few limitations in how much TIR understands structs, which we’re proposing in [RFC] [uTVM] Embedded C Runtime Interface. Though I think we’re aligned in the view that this should be done via code generation rather than this initial solution. I did originally have an implementation that filtered the passes based on the entrypoint function, so I don’t think this is too hard to achieve.

Agreed, there was some user-defined data type work being done. cc @ziheng in case he can give a status update.

After the initial deployment of a model, integrating the whole standalone_crt may be necessary, and we shouldn’t prohibit that; it just takes a bit more understanding of the pieces you might need. In that case, my hope is that the embedded interface allows deploying with a few headers and then expanding when your application demands it.

I agree you shouldn’t need to include everything in src/. The sub-directories there should provide some split points, and we should provide some guidance in documentation as to which are needed for which use cases. The main point is: I’m not sure it’s necessary to split out the include directory in a similar fashion, since header files shouldn’t add any bloat to the firmware. Let me know your thoughts on this.

Ideally I’d like to land the micro entrypoint as a separate PR once 8023 lands.

Yep sounds good!