[RFC] Standalone Code Generation and C Runtime for STM32 bare-metal devices

@stoa @delorme-jm

Apologies for being unclear earlier, let me try to clarify.

The inputs to our code generator do not create a problem: I have already experimented with the Model Library Format. The problem that I see is that the code generator itself needs to be placed together with the application project.

This is where I should clarify. To me, Model Library Format and Project API are two different things:

  • Model Library Format (merged to main) specifies the layout of a .tar file that contains various parts of the compiled model. You generate Model Library Format with tvm.micro.export_model_library_format and it’s meant for either a) debugging or b) to be consumed by downstream project generators. Such downstream project generators will likely eventually mostly be Project API implementations, but this is not required.

  • Project API (not yet merged and still rough around the edges, as you rightly assessed) is an abstraction layer that puts typical platform-specific microTVM build tasks behind an API. Those are tasks like generate_project, build, flash, connect_rpc_server. Project API implementations would be typically co-located with application code (this is just a suggested convention, though, it doesn’t have to stick). Project API enables two different workflows:

    1. standalone project generation for either deployment, experimentation, or measurement (this is similar to the purposes stated in the X-Cube generator UM2526 doc).
    2. autotuning
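To make the Project API shape concrete, here is a hypothetical skeleton of an implementation exposing the tasks listed above. This is only an illustrative sketch: the class and method names mirror the task names mentioned here, but the actual interface in the Project API RFC/PR differs in its exact signatures and options handling.

```python
# Illustrative sketch only -- not the real Project API interface.
# Method names follow the tasks described above (generate_project,
# build, flash, connect_rpc_server); everything else is assumed.
import tarfile
from pathlib import Path


class ProjectAPIHandler:
    """Platform-specific microTVM build tasks behind a common API."""

    def generate_project(self, model_library_format_path, project_dir, options):
        # Unpack the compiled model (Model Library Format tarball)
        # next to the application sources.
        project_dir = Path(project_dir)
        project_dir.mkdir(parents=True, exist_ok=True)
        with tarfile.open(model_library_format_path) as tf:
            tf.extractall(project_dir / "model")

    def build(self, options):
        # Invoke the platform build system (e.g. make, west, X-Cube).
        raise NotImplementedError

    def flash(self, options):
        # Program the firmware onto the attached board.
        raise NotImplementedError

    def connect_rpc_server(self, options):
        # Open a transport (e.g. serial) to the on-device RPC server,
        # used for host-driven execution and autotuning.
        raise NotImplementedError
```

An X-Cube-backed implementation could fill in `generate_project` by shelling out to the X-Cube generator, which is exactly the autotuning path discussed further below.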

So it seems to me that a good path forward (while we wait for e.g. AOT, memory planning, and Project API to get merged into main proper) would be to keep your code-generator in a Python script in the TVM repo. I’d suggest you consider having your script consume Model Library Format (which you can generate today at main with tvm.micro.export_model_library_format) rather than directly calling the TVM APIs.

This approach is roughly the same as what you’ve proposed in your PR, with the change that it would consume Model Library Format rather than the output of e.g. tvm.relay.build directly. If you need something more in Model Library Format, let’s just add it, because someone else will likely want it.
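As a rough sketch of what that change looks like: instead of calling tvm.relay.build and walking its outputs, the generator script takes the .tar produced by tvm.micro.export_model_library_format and reads what it needs from the archive. The helper name below is made up, and the paths assume the layout with metadata.json at the archive root and generated C sources under codegen/host/src/ (check the metadata version field before relying on specific paths).

```python
# Hypothetical entry point for a code-generator script that consumes a
# Model Library Format (MLF) tarball instead of tvm.relay.build output.
# Assumed layout: metadata.json at the root, C sources under
# codegen/host/src/.
import json
import tarfile
from pathlib import Path


def load_model_library_format(mlf_path, work_dir):
    """Extract an MLF archive; return (metadata dict, list of C sources)."""
    work_dir = Path(work_dir)
    with tarfile.open(mlf_path) as tf:
        tf.extractall(work_dir)
    metadata = json.loads((work_dir / "metadata.json").read_text())
    sources = sorted((work_dir / "codegen" / "host" / "src").glob("*.c"))
    return metadata, sources
```

The rest of the generator (emitting the STM32 wrapper code, project files, etc.) would then work purely from the extracted metadata and sources, keeping it insulated from churn in the core TVM APIs.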

I think the main benefits of this are:

  • it moves your implementation further away from the core APIs, in case they drift
  • it benefits Model Library Format, as it would help to identify any shortcomings with the format (e.g. if it’s missing something you’d like, I think we should just add it).
  • if you decide to use the generic microTVM autotuning driver (e.g. from PR 7545) later on, you’ll need to make some Project API impl (even if it just shells out to X-Cube to do the actual generation). Your Project API impl will receive the model in Model Library Format, so this should simplify that effort: by that point, you’d already have a project generator that takes the same input you’re given in autotuning.
  • finally, as we move on to Piece #2 (reworking C APIs to align with your X-Cube APIs), I suspect that having the same data available to all project generators will make that task easier to accomplish.

I think you could either place your code-generator in apps/microtvm/stm32 or in python/tvm/micro/contrib/stm32, even though it won’t look anything like python/tvm/micro/contrib/zephyr.py (we’ll move that into a Project API impl in apps/microtvm/zephyr shortly).

Yes, we intend to use the AutoTuning. We have not looked at it closely yet. I had made it work in our environment with your old microTVM - the host-driven AutoTuning. That worked well, by the way. I am speculating here, but we may not support user AutoTuning in the CubeAI - we will probably opt for building our AutoTuning database and making it accessible to TVM via a git repository.

Glad to hear this worked well. I think I’m also unsure as to whether autotuning would be an SoC vendor thing or an end-user thing. I’d still like to improve the autotuning infrastructure to make it easier to use; that benefits everyone. And I think there could be situations where an end-user may want to try it, although I don’t have any specific known cases yet.

Basically, yes - it allows the application to not hardcode the quantization information but get it from the model.

Thanks for this clarification, that’s really helpful!

Let me know if this makes sense!

-Andrew