[RFC] Standalone Code Generation and C Runtime for STM32 bare-metal devices

stoa · April 8, 2021, 12:50pm

Hello, Andrew

We are still discussing the way forward here internally. I do not think I understand how you propose to integrate our work at this point. Below, once more my understanding, thanks for patience

I think there are two fairly separable pieces to your proposal here:

Adding a code-generator that produces models which implement the STM32 X-Cube AI API (e.g. ai_create, etc).

Reworking the TVM C Runtime APIs to more closely match the STM32 X-Cube API (which matches more closely to APIs from other embedded deployment tools–so therefore a direction in which microTVM should consider moving).

I think that piece #1 is fairly uncontroversial, and we’ve resolved the main challenges there (e.g. testing). Piece #2 will take longer, and more impacts the scope of the initial effort. Given the amount of development in progress now, it’ll be hard to settle on piece #2 until some of the core improvements (e.g. AOT, memory planning) land. So initially, let’s focus this RFC on merging piece #1.

That’s clear.

Along those lines, I wonder if we could take a middle-ground approach here: the Model Library Format piece is merged to main. Is it possible to modify your code-generator to consume Model Library Format rather than using internal TVM APIs directly? If needed, we could make changes to Model Library Format to accommodate this change (e.g. you’ll be the first non-TVM use of it, so it wouldn’t surprise me if some parts need tweaking).

The inputs to our code generator do not create a problem: I have alredy experimented with the Model Library Format. The problem that I see is that the code generator itself needs to be placed together with the application project. Precisely:

You are right:

it seems like you do have some project generation facility.

So, our ML development flow is:

                --------------------
  Model -->     |      CubeAI      |
                --------------------
                    |          |
                    V          V
CubeMX Project +  C Code  + runtime  => Target
                           Libraries

For example, the demo that we’ve developed is based on such CubeAI generated project.

Now we are working on integrating the TVM with our CubeAI generator. The input is the model, the output is the C API. Internally, from the CubeAI prospective, the input to the code generator may be anything TVM generates (whether RuntimeModuleFactory or Model Library Format) and it is not visible from the project. Thus, the microTVM project will need to install the CubeAI tools in order to build the model implementation. When either such CubeAI is available, or we move onto the AoT code generator, we can propose a demo project within the microTVM framework. Like I said earlier, we prefer not to make the codegenerator+runtime part of the application project at this time.

I could see how you prefer to keep project generation centralized within the larger STM X-Cube tool rather than invoking TVM via Project API. The one question that comes to mind is: do you intend to support autotuning efforts on-device? If so, at some point it’d be good to discuss a way forward to integrate the AutoTVM search tool with STM32 X-Cube project generation.

Yes, we intend to use the AutoTuning. We have not looked at it closely, yet. I had made it work in our environment with your old microTVM - the host driven AutoTuning. That worked well, by the way. I am speculating here but we may not support user AutoTuning in the CubeAI - we probably will opt for building our AotuTuning database and make it accessible to the TVM via a git repository. The details will be clear when the microTVM autotune is released.

Concerning the quantization info:

I’m curious what this is used for specifically–wouldn’t the application already know this e.g. in a hardcoded pre-processing function? Or does this allow the application to implement a generic pre-processing function?

Basically, yes - it allows the application to not hardcode the quantization information but get it from the model.

It allows generic preprocessing
The more information you can get from the model, the more robust is your code for different models.
During development, it is easier to maintain changing quantization parameters between the quantized model, the python environment, and the C code.

Of course, if you need a really specific pre- or post- processing, the main application needs to be specific. Even in these cases, the quntization does not need to be hardwired. Bottom line: if you make a mistake with model input shape, you may get an error message, while you will just get wrong results if you make a mistake with the quantization parameters.