Several questions about micro TVM

sho · November 26, 2021, 10:59am

I work for an MCU vendor and am thinking about using microTVM to deploy ML models on MCUs. However, I’m a bit confused about the concept of standalone execution.

As an MCU vendor, we would like to see standalone execution running solely on MCUs using microTVM. The image below shows the concept of standalone execution from the official docs.

[Image: standalone execution diagram from the official docs]

Here I have some questions regarding this.

Where is ‘micro tvm’ in this image? I’m not sure what ‘micro tvm’ exactly means, or how it’s involved in the process of generating binaries for MCUs. If ‘micro tvm’ means the mechanism that generates the binary rather than the code or binary stored on the MCU, then what is the thing people call the ‘micro tvm runtime’?
I’m also a bit confused about the difference between tvm and micro tvm. tvm can also generate the same components (Simplified Parameters, Compiled Operators and Graph JSON), right? I’d like to know how the artifacts produced by tvm differ from those produced by micro tvm.
According to the presentation “TVM Community Meetup - AOT” (Google Slides), the difference between compiling with and without AOT is the generated C code. To my understanding, AOT lets us remove the Graph Runtime and Graph JSON from the items stored on the MCU; instead, a different kind of C code is generated that effectively combines the compiled operators, Graph Runtime and Graph JSON. All we need to run inference is this C code (Compiled Operators), Simplified Parameters, inputs, device initialization code and application code (the code in main). We just download these onto the MCU. Is my understanding correct?
Regarding the test file: tvm/tests/micro/zephyr/test_zephyr.py https://github.com/apache/tvm/tree/main/tests/micro/zephyr As the README suggests, I ran

$ cd tvm/apps/microtvm/
$ poetry lock && poetry install && poetry shell
$ cd tvm/tests/micro/zephyr
$ pytest test_zephyr.py --zephyr-board=qemu_x86

But I got the following error.

ImportError while loading conftest '/home/ubuntu/workspace/tvm/conftest.py'.
../../../conftest.py:20: in <module>
    import tvm
../../../python/tvm/__init__.py:26: in <module>
    from ._ffi.base import TVMError, __version__
../../../python/tvm/_ffi/__init__.py:28: in <module>
    from .base import register_error
../../../python/tvm/_ffi/base.py:71: in <module>
    _LIB, _LIB_NAME = _load_lib()
../../../python/tvm/_ffi/base.py:51: in _load_lib
    lib_path = libinfo.find_lib_path()
../../../python/tvm/_ffi/libinfo.py:146: in find_lib_path
    raise RuntimeError(message)
E   RuntimeError: Cannot find the files.
E   List of candidates:
E   /home/ubuntu/.cache/pypoetry/virtualenvs/microtvm-vg_j_zxI-py3.8/bin/libtvm.so
E   /home/ubuntu/.poetry/bin/libtvm.so
E   /usr/local/sbin/libtvm.so
E   /usr/local/bin/libtvm.so
E   /usr/sbin/libtvm.so
E   /usr/bin/libtvm.so
E   /usr/sbin/libtvm.so
E   /usr/bin/libtvm.so
E   /usr/games/libtvm.so
E   /usr/local/games/libtvm.so
E   /snap/bin/libtvm.so
E   /home/ubuntu/workspace/tvm/python/tvm/libtvm.so
E   /home/ubuntu/workspace/libtvm.so
E   /home/ubuntu/.cache/pypoetry/virtualenvs/microtvm-vg_j_zxI-py3.8/bin/libtvm_runtime.so
E   /home/ubuntu/.poetry/bin/libtvm_runtime.so
E   /usr/local/sbin/libtvm_runtime.so
E   /usr/local/bin/libtvm_runtime.so
E   /usr/sbin/libtvm_runtime.so
E   /usr/bin/libtvm_runtime.so
E   /usr/sbin/libtvm_runtime.so
E   /usr/bin/libtvm_runtime.so
E   /usr/games/libtvm_runtime.so
E   /usr/local/games/libtvm_runtime.so
E   /snap/bin/libtvm_runtime.so
E   /home/ubuntu/workspace/tvm/python/tvm/libtvm_runtime.so
E   /home/ubuntu/workspace/libtvm_runtime.so

After successfully running the script, am I supposed to get all the C code to build and download into MCUs?

Another script, for AOT: tvm/tests/python/relay/aot/test_crt_aot.py (https://github.com/apache/tvm/blob/main/tests/python/relay/aot/test_crt_aot.py)

There is no information about the required environment, so I haven’t even run the script. But does this script generate C code that can be compiled for, let’s say, Arm v7? At this moment, can we generate C code with AOT and then build and customize an application around the generated C code? In the case of TFL micro, we can create a project in an IDE, pull some C code from the TFL micro repository and include it in the project (the C runtime), and convert a tflite model into a large C array, which TFL micro loads before running inference. This is how we run inference with TFL micro, and the MCU does not need to be connected to a host PC. I just want to do the same thing with AOT. Is that possible at the moment, and is there a sample of it?
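
To illustrate the "model as a large C array" step of the TFL micro workflow described above, here is a minimal sketch in plain Python. It is not code from TFL micro or TVM; the function name `bytes_to_c_array` and the symbol name `g_model` are hypothetical and chosen only for this example.

```python
def bytes_to_c_array(data: bytes, symbol: str = "g_model", per_line: int = 12) -> str:
    """Render raw bytes as C source text defining a const byte array."""
    lines = [f"const unsigned char {symbol}[] = {{"]
    # Emit the bytes in fixed-size rows so the generated file stays readable.
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {symbol}_len = {len(data)};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in bytes; a real model file would be read with open(path, "rb").read().
    print(bytes_to_c_array(b"\x1c\x00\x00\x00TFL3", per_line=4))
```

The generated text can be saved as a `.c` file and compiled into the firmware image, so the model lives in flash and no host connection is needed at run time.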

leandron · November 26, 2021, 1:26pm

cc @areusch, @gromero, @grant-arm for visibility

sho · November 27, 2021, 12:09pm

Thank you, leandron, for bringing this question to the developers’ attention!

areusch · November 29, 2021, 6:39pm

hi @sho apologies for the delay as we were out on vacation for an extended weekend.

some answers to your questions:

Great question. microTVM is the name of the project within TVM to run code on bare-metal platforms. It’s composed of a few different pieces, some of which are shown in that picture above.

microTVM uses the TVM C runtime, which isn’t directly called out in the picture you sent, though the Graph Runtime mentioned there (now called GraphExecutor) refers to a graph-level model driver built for the C runtime.
you could produce identical code for the compiled operators with either a microTVM target or a traditional (e.g. full-Linux) TVM target. that is to say, the difference in the Target is essentially specifying to use the C runtime, and that’s orthogonal to the generated operator impl. now, it’s not possible to actually run the compiled operators on most of the microTVM targets using the C++ runtime used with traditional TVM targets. However, here I’m noting that microTVM shares the same compilation pipeline used by most other applications of TVM.

hopefully my last answer clarifies this. also note that this is a bit dated–for TVM as a whole, we are building an AOT Executor which requires no JSON parsing. the first application of this is microTVM, and there are tests checked-in to demonstrate this. we’ll be expanding this executor’s capabilities in the coming weeks/months.

that’s correct
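
To make the distinction between the two execution styles concrete, here is a toy model in plain Python. It is not TVM code: the operator table, the JSON layout and the function names are all made up for illustration. The graph-executor style walks a JSON description at run time, while the AOT style has the call sequence fixed at compile time, so no JSON or parser is needed on the device.

```python
import json

# Stand-ins for "compiled operators"; in TVM these would be generated functions.
OPS = {
    "add_one": lambda x: [v + 1 for v in x],
    "double": lambda x: [v * 2 for v in x],
}

# Hypothetical graph description (not TVM's actual Graph JSON schema).
GRAPH_JSON = json.dumps({"nodes": [{"op": "add_one"}, {"op": "double"}]})


def run_graph_executor(graph_json, inputs):
    """Graph-executor style: parse the graph at run time, then dispatch node by node."""
    graph = json.loads(graph_json)
    data = inputs
    for node in graph["nodes"]:
        data = OPS[node["op"]](data)
    return data


def run_aot(inputs):
    """AOT style: the operator call order is baked into the code itself."""
    return OPS["double"](OPS["add_one"](inputs))


if __name__ == "__main__":
    sample = [1, 2, 3]
    # Both paths compute the same result; only the driver differs.
    assert run_graph_executor(GRAPH_JSON, sample) == run_aot(sample)
    print(run_aot(sample))
```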

before you can run this example, you need to first build TVM. either:

follow the microTVM reference VM instructions to build a Virtual Machine which contains all of the dependencies you need to work with microTVM. this is heavyweight, but it is also our reference platform, so it is expected to work out of the box.
if you already have Zephyr installed, you don’t need to do this, but per your error you do need to build TVM.

once you’ve built TVM, it will be able to proceed past that error; either path above will work. you should then expect it to build/flash/run code on the micro device.
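
The error in the traceback comes from the Python package failing to locate the compiled shared library. A rough way to check for it yourself is sketched below in plain Python. `find_tvm_lib` is a hypothetical helper, not a TVM API, and the `build` directory name assumes a default CMake build inside the checkout. The use of the `TVM_HOME` and `TVM_LIBRARY_PATH` environment variables here is also an assumption; check `python/tvm/_ffi/libinfo.py` in your checkout for the actual search order.

```python
import os
from pathlib import Path


def find_tvm_lib(search_dirs, names=("libtvm.so", "libtvm_runtime.so")):
    """Return the first existing library path among search_dirs, or None."""
    for directory in search_dirs:
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    # Assumed layout: a TVM checkout with a CMake build directory named "build".
    tvm_home = Path(os.environ.get("TVM_HOME", "."))
    dirs = [tvm_home / "build"]
    # Assumption: an explicit override directory takes priority if the variable is set.
    if os.environ.get("TVM_LIBRARY_PATH"):
        dirs.insert(0, Path(os.environ["TVM_LIBRARY_PATH"]))
    found = find_tvm_lib(dirs)
    print(found if found else "libtvm.so not found; build TVM first")
```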

this script is intended as a regression test to ensure the AOT remains functional as we develop it. we have some examples as to how to deploy AOT, but they are limited right now. see my next response.

kind of. this script does target Corstone-300, which is a Fixed Virtual Platform intended to help verify Cortex-M profile code. so, it does in fact generate code which could run on microcontrollers, but it doesn’t explicitly target them as the focus for that script is our Continuous Integration.

to launch code on device, our preferred infrastructure is the Project API. there are a couple integrations so far (arduino and zephyr here). Project API is centered around taking a TVM compiler artifact in Model Library Format and integrating it into a template e.g. IDE project so that it can be built/flashed either by a user or by TVM (to run host-driven inference or autotuning; but you can still flash for standalone purposes depending on how the template project is setup).
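
If you want to see what such an artifact contains before handing it to a template project, a small inspection helper can be useful. The sketch below assumes the Model Library Format export is a tar archive; `summarize_tar` is a hypothetical helper written with the standard library only, and it makes no assumption about which directories the archive contains.

```python
import sys
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_tar(path):
    """Count regular files in a tar archive, grouped by top-level entry name."""
    counts = Counter()
    with tarfile.open(path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # PurePosixPath drops a leading "./", so "./a/b.c" groups under "a".
            parts = PurePosixPath(member.name).parts
            if parts:
                counts[parts[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Usage: python summarize.py <archive.tar>
    if len(sys.argv) == 2:
        for top, count in sorted(summarize_tar(sys.argv[1]).items()):
            print(f"{top}: {count} file(s)")
```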

for AOT, we have a demo using Project API. for the Zephyr template project, the project_type ProjectOption is used to make this selection; you could try adapting the TFLite tutorial here.

however, a final word about AOT–we haven’t yet pushed on a comprehensive tutorial to demonstrate deployment onto the device due to a couple of in-progress efforts:

USMP, which will perform static memory planning for all tensors in the graph and therefore dramatically reduce the footprint of models.
Embedded C runtime interface, which has nearly landed in full, but we haven’t yet had time to put together a formal demo. This is a microcontroller-friendly interface to TVM-compiled code.

At this point we are getting quite close to TVMcon, so I am hoping for better tutorials/examples in Q1 2022. however, I would say that we hope to demonstrate some of this at TVMcon, and I’d suggest looking out for the Arduino tutorial which should cover some of these aspects.

sho · December 6, 2021, 1:26am

Hi @areusch, thank you very much for your detailed explanation, and sorry for my late reply. I hope your Thanksgiving was wonderful!

So the Graph Runtime works on top of C Runtime? Could you please tell me where the C Runtime actually is? I found the link below but it seems that graph_executor is written in C++.

I understood that the key difference is micro TVM uses C runtime and micro TVM uses C++ runtime, and the latter may not work on MCUs for the moment.

AOT Executor is the C code that can be downloaded onto MCUs? I’d like to know what capabilities you are adding to AOT. Also, I’m curious about how you can execute GraphExecutor + GraphJSON on MCUs for the moment. I know it’s difficult for MCUs due to memory constraints, but I’d like to see how you deploy standalone GraphExecutor + GraphJSON to run inference on MCUs without communicating with the host PC. Should I be able to see it in the tutorial below?

Sorry to bother you while you’re busy, but your answers really help me understand what is and isn’t available in micro TVM.

areusch · December 7, 2021, 12:00am

There are actually two GraphExecutor implementations in TVM: one for the C++ Runtime and one for the C Runtime. Sorry for this confusion.

could you clarify this sentence? I think you meant to say something different in the second “micro TVM.”

Yes.

We’ll be releasing some roadmaps for microTVM and TVM as a whole ~next month.

Yes, the microtvm-blogpost-eval does explain the approach, but unfortunately it’s a bit out of date. We should have a new tutorial at some point soon that uses Project API. I’d also suggest you look for the Arduino tutorial coming up at TVMCon. Apologies for lacking documentation here.

sho · December 7, 2021, 3:36am

Thank you very much for your answers.

Oh, ok. I should have noticed that the GraphExecutor written in C is in crt. Looking through the contents of tvm/src/runtime/crt, I can create a C project in an IDE with the source code there plus the outputs from tvm.relay.build and the SoC (or MCU) initialization code, right?

Apologies for my mistake and for the confusion. I meant to say:

“I understood that the key difference is micro TVM uses C runtime and TVM uses C++ runtime, and the latter may not work on MCUs for the moment.”

But this might not always be the case: for host-driven execution, which requires RPC server/client communication, the scripts for this are in

and they are written in C++, right?

I think that before I dive into micro TVM, I should begin with TVM so that I have a clear understanding of the difference between TVM and micro TVM.

Thank you for the information about the roadmaps. I’m looking forward to their release.

Ok, that’s fine as long as it works anyway. Also I’ll try to run the Arduino tutorial when it’s published. Thank you for your suggestion.

areusch · January 3, 2022, 1:40am

Sorry for the long delay!

Yeah the embedding of CRT can be a little confusing. We should probably think about whether we should reorganize src/runtime at some point. Yes, you should be able to create a C project in your IDE as you described.

Yeah you’re right here. We need to do a rewrite there into C, but we need to invest more heavily in C-based RPC unit testing first.