Hi, I have a simple question: how much work time would it take for a single Ph.D. student to bring TVM to a new CNN accelerator (RISC-V controller + accelerator)?
Thanks in advance
Hi @areusch ,
There is no operating system on the RISC-V controller.
What is the difference between running bare-metal and running an OS on the controller?
hi @Julien,
Got it. TVM has two runtimes: a C++ runtime (used when an OS is present; referenced in many of our tutorials) and a C runtime (used when there is no OS; referenced in our microTVM tutorials).
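To make that concrete, here's a minimal sketch of compiling a toy Relay model for the C runtime (the exact target helpers and flags vary a bit between TVM versions; the same model built with a plain llvm target would go through the C++ runtime instead):

```python
import tvm
from tvm import relay

# A tiny Relay model standing in for a real CNN.
data = relay.var("data", shape=(1, 8), dtype="float32")
weight = relay.var("weight", shape=(4, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([data, weight], relay.nn.dense(data, weight)))

# tvm.target.target.micro(...) returns a C-runtime target (roughly "c -runtime=c ..."),
# so relay.build emits C sources for the bare-metal C runtime rather than
# artifacts for the OS-hosted C++ runtime.
target = tvm.target.target.micro("host")
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lowered = relay.build(mod, target=target)
```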
Support for this arrangement isn’t complete yet but would fall under microTVM. See the µTVM M2 roadmap item 11. However, people have used TVM with accelerators and on ARM microcontrollers, so there are essentially two parts to this:

1. Bring up the microTVM C runtime on your bare-metal RISC-V controller, so that code generated by TVM can be compiled, flashed, and driven from the host.
2. Integrate the accelerator itself, e.g. through BYOC or tensorization, so that offloaded operators call into your accelerator’s driver code.
There are also some additional work items we’d like to get to at some point to make this process more streamlined and to implement runtime optimizations such as async execution. However, those aren’t strictly required to get things working.
Let me know if this answers your questions! Andrew
@areusch for step 2, if you also wanted to use TVM for the accelerator codegen as well, would VTA provide a guide there? And would you still use BYOC there (just calling out to TVM in BYOC for compilation as well)?
Hi @areusch , Thanks a lot for your help!
I still have a question about the time needed: in your opinion, is it more a 1-month project or a 1-year project for a single person?
Hi @Julien
Seems like we are working on a similar problem! We have been actively trying to port TVM to our own accelerated RISC-V microcontroller. Check out: Feedback on TVM port to custom Accelerator. I’m also very interested in @areusch 's feedback on our post.
@jknight we currently do the codegen to the accelerator through C code with the help of an in-house library and a lot of tensorization in TE. We don’t know if this approach is the best way to go about it, though. Right now a lot of accelerator details are abstracted away in the library in a suboptimal way, just to get something working in a straightforward (version 0) way with TVM.
I currently think the biggest hurdle for us is that some hardware accelerator design choices rely on very specific optimizations or mapping strategies from the compiler that are not always readily available, and these design decisions can often make or break performance. So in my experience (at this moment) the time it takes to port TVM depends heavily on what software is already available for your accelerator, the accelerator design itself, and so on.
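For anyone curious what that looks like in practice, here is a rough, simplified sketch of the pattern (closely following TVM’s tensorize tutorial; `accel_gemv_update` is a placeholder name, not our actual library): a TE tensor intrinsic whose body is a tir.call_extern into a driver routine, applied to a schedule with tensorize.

```python
import tvm
from tvm import te

def intrin_gemv(m, l):
    # Declare the computation the accelerator intrinsic covers: a small GEMV.
    a = te.placeholder((l,), name="a", dtype="float32")
    b = te.placeholder((m, l), name="b", dtype="float32")
    k = te.reduce_axis((0, l), name="k")
    c = te.compute((m,), lambda i: te.sum(a[k] * b[i, k], axis=k), name="c")

    # Buffers with known strides so the intrinsic can pass raw pointers.
    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[1])
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1, strides=[te.var("s1"), 1])
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[1])

    def intrin_func(ins, outs):
        ib = tvm.tir.ir_builder.create()
        aa, bb = ins
        cc = outs[0]
        # The generated C code will contain this call; the in-house driver
        # library provides the implementation.
        ib.emit(
            tvm.tir.call_extern(
                "int32",
                "accel_gemv_update",
                cc.access_ptr("w"),
                aa.access_ptr("r"),
                bb.access_ptr("r"),
                m,
                l,
                bb.strides[0],
            )
        )
        return ib.get()

    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})

# Usage: replace the inner loops of a matmul schedule with the intrinsic.
N, M, L = 1024, 512, 64
A = te.placeholder((N, L), name="A", dtype="float32")
B = te.placeholder((M, L), name="B", dtype="float32")
k = te.reduce_axis((0, L), name="k")
C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[j, k], axis=k), name="C")

s = te.create_schedule(C.op)
x, y = C.op.axis
yo, yi = s[C].split(y, factor=16)
s[C].tensorize(yi, intrin_gemv(16, L))
print(tvm.lower(s, [A, B, C], simple_mode=True))
```

The driver library then supplies `accel_gemv_update`, and it gets linked into the firmware together with the generated C code.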
hi @Julien ,
With the help of @areusch, I tried to get tvm/tests/micro/qemu working on qemu_riscv32. A month ago, test_compile_runtime and test_relay in test_zephyr.py passed; here is my code. However, the TVM code base has changed since then, including tvm/tests/micro, so I have not verified it against the current version. I hope it can still be of some help.
hi @dream-math @jossevandelm @Julien
Thanks for some great collaboration, seems like there is significant community interest in merging RISC-V support. Let’s lay out some steps we can follow to make this happen, and then we can discuss timelines.
To start with, I propose we explicitly test the Zephyr riscv32_qemu board in TVM CI. This is the easiest integration we could make, since we have so much Zephyr support in microTVM now. @mehrdadh is actually working on this right now, on top of the recently-merged PR 7557 (which reorganizes the various Zephyr runtimes in TVM so that we should be able to have a single runtime compatible with all boards, a precursor to the Project Generator API).
I think it’s likely that some of you would like to try using RISC-V on platforms other than Zephyr. Though it’s possible to proceed with this at main today (by implementing a Compiler, Flasher, Transport, etc.), I’d suggest working on top of the Project Generator API PoC if you’ll be starting on this in the medium term (e.g. a week or more from now). In general, that approach should scale better as TVM integrates with more platforms or with bare metal.
Autotuning support is in this PR, which needs syncing; I’m hoping to do that after merging the Project Generator API (as autotuning added another dependency). After that lands, it would be great to explore some schedules on RISC-V.
Currently, the easiest way to add an accelerator is to expose a CPU-side control function to microTVM as a PackedFunc or C function, and then embed that in the TIR (i.e. TVM’s pre-codegen representation of your program) as tir.call_packed or tir.call_extern. Doing this is effectively using the BYOC or tensorization flows.
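As a rough illustration of the BYOC side of that (names here are placeholders: "myaccel" is a made-up compiler name, and a matching codegen still has to be registered for relay.build to actually produce accelerator code), you mark which operators the accelerator supports and partition the Relay graph so those regions become separate functions handed to your codegen:

```python
import tvm
from tvm import relay

# Declare that "myaccel" can handle nn.conv2d; the checker runs per call site.
@tvm.ir.register_op_attr("nn.conv2d", "target.myaccel")
def _conv2d_supported(expr):
    return True  # here we simply accept every conv2d

data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
weight = relay.var("weight", shape=(8, 3, 3, 3), dtype="float32")
out = relay.nn.relu(relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=8))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# Annotate supported ops, merge adjacent regions, and split them out into
# functions marked Compiler="myaccel"; the relu stays on the host CPU.
mod = relay.transform.AnnotateTarget("myaccel")(mod)
mod = relay.transform.MergeCompilerRegions()(mod)
mod = relay.transform.PartitionGraph()(mod)
print(mod)
```

The partitioned functions can then lower to little more than tir.call_extern calls into your driver library, which keeps the bare-metal story simple.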
In the future, we should consider enhancing the interface between microTVM and accelerators. In the C++ runtime, a Device API exists. It’s probably overkill for microTVM, but it’s likely that what we have now could be made more efficient. In particular, a richer accelerator interface could allow us to pursue graph-level optimizations that may help to offload some of the implementation work around layout transformations and data copy. Let’s discuss this further on @JosseVanDelm’s post as there is quite a bit more detail there.
Now as to timelines (these are estimates, so don't hold me to them):
-Andrew