Hi, I have a simple question: how much work time would it take for a single Ph.D. student to bring TVM to a new CNN accelerator (RISC-V controller + accelerator)?
Thanks in advance
Hi @areusch ,
There is no operating system on the RISC-V controller.
What is the difference between running bare-metal and running an OS on the controller?
hi @Julien,
Got it. TVM has two runtimes: a C++ runtime (used when an OS is present; referenced in many of our tutorials) and a C runtime (used when there is no OS; referenced in our microTVM tutorials).
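To make that concrete, here's a minimal sketch of compiling a toy Relay model for the C runtime (the exact target helpers and flags vary a bit between TVM versions; the same model built with a plain llvm target would go through the C++ runtime instead):

```python
import tvm
from tvm import relay

# A tiny Relay model standing in for a real CNN.
data = relay.var("data", shape=(1, 8), dtype="float32")
weight = relay.var("weight", shape=(4, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([data, weight], relay.nn.dense(data, weight)))

# tvm.target.target.micro(...) returns a C-runtime target (roughly "c -runtime=c ..."),
# so relay.build emits C sources for the bare-metal C runtime rather than
# artifacts for the OS-hosted C++ runtime.
target = tvm.target.target.micro("host")
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lowered = relay.build(mod, target=target)
```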
Support for this arrangement isn’t complete yet but would fall under microTVM. See the µTVM M2 roadmap item 11. However, people have used TVM with accelerators and on ARM microcontrollers, so there are essentially two parts to this:

1. Bring up the microTVM C runtime on your bare-metal RISC-V controller, so that code generated by TVM can be compiled, flashed, and driven from the host.
2. Integrate the accelerator itself, e.g. through BYOC or tensorization, so that offloaded operators call into your accelerator’s driver code.
There are also some additional work items we’d like to get to at some point to make this process more streamlined and to implement runtime optimizations such as async execution. However, those aren’t strictly required to get things working.
Let me know if this answers your questions! Andrew
@areusch for step 2, if you also wanted to use TVM for the accelerator codegen as well, would VTA provide a guide there? And would you still use BYOC there (just calling out to TVM in BYOC for compilation as well)?
Hi @areusch , Thanks a lot for your help!
I still have a question about the time needed: in your opinion, is it more a 1-month project or a 1-year project for a single person?
Hi @Julien
Seems like we are working on a similar problem! We have been actively trying to port TVM to our own accelerated RISC-V microcontroller. Check out: Feedback on TVM port to custom Accelerator. I’m also very interested in @areusch 's feedback on our post.
@jknight we currently do the codegen to the accelerator through C code with the help of an in-house library and a lot of tensorization in TE. We don’t know if this approach is the best way to go about it, though. Right now a lot of accelerator details are abstracted away in the library in a suboptimal way, just to get something working in a straightforward (version 0) way with TVM.
I currently think the biggest hurdle for us is that some hardware accelerator design choices rely on very specific optimizations or mapping strategies from the compiler that are not always readily available, and these design decisions can often make or break performance. So in my experience (at this moment) the time it takes to port TVM depends heavily on what software is already available for your accelerator, the accelerator design itself, and so on.
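For anyone curious what that looks like in practice, here is a rough, simplified sketch of the pattern (closely following TVM’s tensorize tutorial; `accel_gemv_update` is a placeholder name, not our actual library): a TE tensor intrinsic whose body is a tir.call_extern into a driver routine, applied to a schedule with tensorize.

```python
import tvm
from tvm import te

def intrin_gemv(m, l):
    # Declare the computation the accelerator intrinsic covers: a small GEMV.
    a = te.placeholder((l,), name="a", dtype="float32")
    b = te.placeholder((m, l), name="b", dtype="float32")
    k = te.reduce_axis((0, l), name="k")
    c = te.compute((m,), lambda i: te.sum(a[k] * b[i, k], axis=k), name="c")

    # Buffers with known strides so the intrinsic can pass raw pointers.
    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[1])
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1, strides=[te.var("s1"), 1])
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[1])

    def intrin_func(ins, outs):
        ib = tvm.tir.ir_builder.create()
        aa, bb = ins
        cc = outs[0]
        # The generated C code will contain this call; the in-house driver
        # library provides the implementation.
        ib.emit(
            tvm.tir.call_extern(
                "int32",
                "accel_gemv_update",
                cc.access_ptr("w"),
                aa.access_ptr("r"),
                bb.access_ptr("r"),
                m,
                l,
                bb.strides[0],
            )
        )
        return ib.get()

    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})

# Usage: replace the inner loops of a matmul schedule with the intrinsic.
N, M, L = 1024, 512, 64
A = te.placeholder((N, L), name="A", dtype="float32")
B = te.placeholder((M, L), name="B", dtype="float32")
k = te.reduce_axis((0, L), name="k")
C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[j, k], axis=k), name="C")

s = te.create_schedule(C.op)
x, y = C.op.axis
yo, yi = s[C].split(y, factor=16)
s[C].tensorize(yi, intrin_gemv(16, L))
print(tvm.lower(s, [A, B, C], simple_mode=True))
```

The driver library then supplies `accel_gemv_update`, and it gets linked into the firmware together with the generated C code.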
hi @Julien ,
With the help of @areusch, I tried to get tvm/tests/micro/qemu working on qemu_riscv32. A month ago, test_compile_runtime and test_relay in test_zephyr.py passed; here is my code. However, the TVM code base has changed since then, including tvm/tests/micro, so I have not verified it against the current version. I hope it can still be of some help.
hi @dream-math @jossevandelm @Julien
Thanks for some great collaboration, seems like there is significant community interest in merging RISC-V support. Let’s lay out some steps we can follow to make this happen, and then we can discuss timelines.
To start with, I propose we explicitly test the Zephyr riscv32_qemu board in TVM CI. This is the easiest integration we could make, since we have so much Zephyr support in microTVM now. @mehrdadh is actually working on this right now, on top of the recently-merged PR 7557 (which reorganizes the various Zephyr runtimes in TVM so that we should be able to have a single runtime compatible with all boards, a precursor to the Project Generator API).
I think it’s likely that some of you would like to try using RISC-V on platforms other than Zephyr. Though it’s possible to proceed with this at main today (by implementing a Compiler, Flasher, Transport, etc.), I’d suggest working on top of the Project Generator API PoC if you’ll be starting on this in the medium term (e.g. a week or more from now). In general, that approach should scale better as TVM integrates with more platforms or with bare metal.
Autotuning support is in this PR, which needs syncing; I’m hoping to do that after merging the Project Generator API (as autotuning added another dependency). After that lands, it would be great to explore some schedules on RISC-V.
Currently, the easiest way to add an accelerator is to expose a CPU-side control function to microTVM as a PackedFunc or C function, and then embed that in the TIR (i.e. TVM’s pre-codegen representation of your program) as tir.call_packed or tir.call_extern. Doing this is effectively using the BYOC or tensorization flows.
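As a rough illustration of the BYOC side of that (names here are placeholders: "myaccel" is a made-up compiler name, and a matching codegen still has to be registered for relay.build to actually produce accelerator code), you mark which operators the accelerator supports and partition the Relay graph so those regions become separate functions handed to your codegen:

```python
import tvm
from tvm import relay

# Declare that "myaccel" can handle nn.conv2d; the checker runs per call site.
@tvm.ir.register_op_attr("nn.conv2d", "target.myaccel")
def _conv2d_supported(expr):
    return True  # here we simply accept every conv2d

data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
weight = relay.var("weight", shape=(8, 3, 3, 3), dtype="float32")
out = relay.nn.relu(relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=8))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# Annotate supported ops, merge adjacent regions, and split them out into
# functions marked Compiler="myaccel"; the relu stays on the host CPU.
mod = relay.transform.AnnotateTarget("myaccel")(mod)
mod = relay.transform.MergeCompilerRegions()(mod)
mod = relay.transform.PartitionGraph()(mod)
print(mod)
```

The partitioned functions can then lower to little more than tir.call_extern calls into your driver library, which keeps the bare-metal story simple.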
In the future, we should consider enhancing the interface between microTVM and accelerators. In the C++ runtime, a Device API exists. It’s probably overkill for microTVM, but it’s likely that what we have now could be made more efficient. In particular, a richer accelerator interface could allow us to pursue graph-level optimizations that may help to offload some of the implementation work around layout transformations and data copy. Let’s discuss this further on @JosseVanDelm’s post as there is quite a bit more detail there.
Now as to timelines (these are estimates, so don't hold me to them):
-Andrew