[μTVM] Deployment on GAP8 RISC-V platform

Hello everyone,

For my master’s thesis I’m researching deep learning compiler toolchains, so that future deep learning accelerators can be utilized to the fullest and easily integrated into e.g. IoT and ultra-low-power applications.

I’m currently looking into how easy it is to link large compiler toolchains (like TVM) to an accelerator platform that is being developed at my university. Right now only a few very optimized yet very unportable solutions exist in this space, so coupling TVM to this platform would be highly beneficial.

Part of the platform being developed will be similar to Greenwaves Technologies’ GAP8 RISC-V platform. That’s why I’m currently trying to deploy uTVM on a GAPuino development board to see what’s already possible with uTVM. Like the platform being developed, the GAP8 is quite memory-constrained and cannot host a big OS. As it is a true bare-metal device, uTVM seems a great fit.

However, right now I’m struggling with where to start this deployment on the GAP8. I’ve read the blog post, and I have OpenOCD and an adapted GCC compiler in place for the platform. The problem is that I now need to provide items 3 and 4 from the blog post:

  • a specification containing the device’s memory layout and general architectural characteristics
  • a code snippet that prepares the device for function execution

I’ve checked out the tutorial video and the code from the blog post, but it seems a few things have shifted in the code since then, as I was unable to load the tvm.micro.device module. This is also reported in this forum post. EDIT: I also just found this forum post. Also, the tutorials on uTVM provided in the docs don’t exactly explain how to do steps 3 and 4 of the blog post.

So I was wondering if I could get some help with deploying uTVM on the GAPuino board. Is there a guide for deploying on new platforms somewhere that I have missed? Or has this deployment on a GAP8 perhaps already been done? If anyone has pointers, examples, or experience with this, that would be great!

Thank you very much!


Hi @JosseVanDelm! Thanks for your post. This is indeed a very interesting direction to take TVM/microTVM.

Since we made the blog post we’ve been working to improve µTVM portability, and have made significant changes to the way µTVM launches code. See the µTVM roadmap; we are just about finished with that work.

I’m currently working on syncing the microtvm-blogpost-eval repo up to work with main. I wish I could give you simple instructions, but the changes are fairly complex. I should have that finished this week, though some of the PRs needed are not yet merged into TVM.

To get the GAPuino working, you’d need to be able to compile the µTVM RPC server and run it over a UART. We have code that uses Zephyr to do this on a variety of targets (including some RISC-V targets, though those are not yet tested), but I don’t know whether Zephyr supports the GAPuino. You could take a look at how we did the Zephyr integration.

We don’t yet have good documentation for porting µTVM to new platforms. This is another thing I’m working on and hope to address soon. I’d point you at two things that show the new flow to see if you can get started:

  1. zephyr_test, which shows a minimal example of testing a single µTVM function on a Zephyr board
  2. test_crt, which exercises just the RPC server using stdio as a UART replacement.

Apologies if we are a bit light on documentation. Feel free to ask more questions and I’ll try to answer as best I can, and I’ll let you know when the documentation is improved (in the next few weeks, I think).

Andrew


I’d also point you at the work from NTHU to enable RISC-V P-extension support. It’s not merged yet, but it may be a useful reference as well.


Hi @areusch,

Thanks for your elaborate reply. I’m actually a bit overwhelmed by all the changes. I had bumped into the roadmap you mentioned, but I found it fairly difficult to comprehend given my limited background, and I did not expect uTVM to have changed so radically in so few months’ time :sweat_smile:.

Having gone through the roadmap I have the following questions:

  • Do I understand correctly that you are trying to replace the simple OpenOCD interface, which needs read/write/execute functionality, with a C runtime and a minimal RPC server that connects over UART? Could you maybe elaborate on the changes there? Why is this necessary? I suppose to benefit even more from what is already realized in the rest of the TVM stack?
  • Zephyr does not support the GAP8. To be honest, I’m not sure what such a low-level OS actually provides. Do I need an OS with the current changes? Would it facilitate deployment? I’ve seen Mbed OS being mentioned on both the uTVM and GAP8 sides. Could this be an interesting approach?
  • With the currently proposed changes, isn’t the overhead of running TVM on the device much higher than before? How do I know that an RPC server, a runtime, and an OS leave enough headroom for deploying useful neural networks on the device?
  • Yesterday I tried to step through the Zephyr demo with a debugger, but the dependencies of the test were quite large and difficult to install on my machine. Do you perhaps have a suggested debugging strategy? Maybe it’s easiest if I run it inside the CI Docker container? Or would that be difficult? Sadly I have no experience with this myself.

I’m sorry if I sound a bit sceptical. It’s just that, having read the earlier blog post, I thought I could integrate uTVM in a couple of days and then automatically benefit from the rest of the compiler stack. I have an intermediate presentation due in a couple of weeks, and frankly I’m not so sure anymore whether it’s worthwhile spending a lot of time getting this to work for my thesis. Maybe you have an idea of how much work/time it would take me :sweat_smile:?

Thank you very much for the great help you are providing here! I really appreciate the work everybody, and especially you, is doing, so keep up the good work! I’m very curious to see where TVM, and especially microTVM, is going!

Hi @JosseVanDelm,

I agree there have been quite a few changes since the last blog post. We’ll give an updated overview at TVMconf in a couple of weeks’ time.

Do you need to run autotuning to start with, or just inference? If the latter, you definitely don’t need to bother with an OS; I would just try to build with the c target and link the generated code and graph runtime into a binary for your platform. You could follow the build steps in test_crt.py and then export the generated code with mod.export_library() to produce a C file you can compile for your target.

From a time perspective: how practical is it to set up a UART or semihosting connection on your development board? The µTVM code is quite new right now, so while we don’t want efforts like this to take long, we don’t have the documentation sorted out just yet. Happy to answer questions if you want to pursue this path.

I’ve included some more detailed answers to your questions below.

Andrew

  • Do I understand correctly that you are trying to replace the simple OpenOCD interface, which needs read/write/execute functionality, with a C runtime and a minimal RPC server that connects over UART? Could you maybe elaborate on the changes there? Why is this necessary? I suppose to benefit even more from what is already realized in the rest of the TVM stack?

The main driver behind these changes is actually portability for autotuning. None of these changes affect the deployment requirements: µTVM does not assume the presence of an operating system, and the runtime it requires is more a set of support functions around e.g. memory allocation and error reporting (the TVMPlatform functions are the chip-dependent ones).

However, autotuning assumes that the target environment performs the same between runs, and on a bare-metal platform, the only reasonable way to do this is to fully control the set of instructions that execute between SoC reset and model execution. A major limitation of the previous approach was that you’d get different absolute timing numbers depending on which program was loaded in flash.

So, to allow reproducible autotuning in a way that’s friendly to first-time users, we needed a portable approach. This is why we’ve introduced the RPC server plus Zephyr support. It should be noted that we aren’t requiring you to use Zephyr; we want to make it possible to easily build the RPC server into whichever runtime environment you choose. In that case, you just need to provide implementations of the Compiler and Flasher classes.

  • Zephyr does not support the GAP8. To be honest, I’m not sure what such a low-level OS actually provides. Do I need an OS with the current changes? Would it facilitate deployment? I’ve seen Mbed OS being mentioned on both the uTVM and GAP8 sides. Could this be an interesting approach?

You don’t need an OS, strictly speaking; you just need a small main() that can configure the SoC and launch the RPC server (for autotuning) or the graph runtime (for inference). You’ll link different µTVM libraries into each binary (i.e. you’ll also link the RPC server library when autotuning). I have an Mbed implementation of Compiler and Flasher here you could try, though it needs to be synced to main. This could be a good route for you if Mbed is well supported on that board. Or, if it’s easier for you to write UART send/receive functions, you could just do without an OS.

  • With the currently proposed changes, isn’t the overhead of running TVM on the device much higher than before? How do I know that an RPC server, a runtime, and an OS leave enough headroom for deploying useful neural networks on the device?

There is an increase in code size and a small increase in memory consumption, for autotuning specifically. For deployment, the RPC server isn’t needed, and the OS would be whatever your project needs (if any), so we don’t see a large overhead there. For autotuning you typically load just one operator at a time, so we think the impact should be limited.

  • Yesterday I tried to step through the Zephyr demo with a debugger, but the dependencies of the test were quite large and difficult to install on my machine. Do you perhaps have a suggested debugging strategy? Maybe it’s easiest if I run it inside the CI Docker container? Or would that be difficult? Sadly I have no experience with this myself.

We have a “Reference VM” that we just need to build and upload, along with a tutorial that is ready to publish but is missing a Sphinx directive to place it in the correct spot in the doc tree. The VM contains all of the Zephyr dependencies you need, and is a better way to do this than Docker, since USB forwarding with Docker only works with libusb devices. You can try to build these boxes yourself using apps/microtvm/reference-vm/base-box-tool.py if you don’t want to wait for me to upload them.