Coordination of RISC-V Integration in TVM

A few words about myself

I am a PhD student and have been working with TVM for over a year. My main field of research is TinyML and RISC-V based microcontrollers. Thus, I work on supporting bare-metal 32-bit RISC-V targets (actual hardware as well as simulators) using the MicroTVM platform.

Background / Motivation

Over the past few years, there has been a lot of interest in supporting RISC-V hardware in the TVM compiler suite. While some attempts have been made in the past, none of them has ended up upstream in the TVM repository. As several teams/individuals seem to be working in the same direction at the moment, it would make sense to align these efforts to reduce the risk of duplicate work and to agree on some relevant design decisions beforehand.

What is currently supported?

While RISC-V is not yet officially supported by TVM, it can already be targeted using the default LLVM-based flow or the MicroTVM APIs (via RISC-V QEMU running on the Zephyr platform). However, the generated code (which falls back to default or ARM schedules) is not yet optimized for the RISC-V ISA, so performance will not be optimal.
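For illustration, here is a minimal sketch of this default LLVM-based flow. It assumes TVM was built against an LLVM with the RISC-V backend enabled; the triple/CPU/attribute values are illustrative placeholders:

```python
# Minimal sketch of the default LLVM-based flow for RISC-V.
# Assumes TVM was built against an LLVM with the RISC-V backend enabled;
# the triple/CPU/attribute values below are illustrative placeholders.
import tvm
from tvm import relay

# Any small Relay module works here; this one is a single dense layer.
data = relay.var("data", shape=(1, 64), dtype="float32")
weight = relay.var("weight", shape=(32, 64), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(data, weight))

target = tvm.target.Target(
    "llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=generic-rv64 -mattr=+m,+a,+f,+d,+c"
)
with tvm.transform.PassContext(opt_level=3):
    # Falls back to generic/ARM schedules today, hence the suboptimal code.
    lib = relay.build(mod, target=target)
```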

What is required?

In my opinion, the following baseline tasks are required:

  • Integrate support for RISC-V instruction set simulation (ISS) in TVM's CI scripts and unit tests.
  • Decide how to detect the available RISC-V ISA extensions. Align this with the work done for ARM Cortex-M MCUs?
  • Add RISC-V specific schedules (default, packed (sub-word SIMD), and vector (super-word SIMD)) to TVM, either starting from generic schedules or basing them on the current arm_cpu schedules. (See tracking issue: https://github.com/apache/tvm/issues/10141)
  • We will also eventually need type legalizations, alter_op_layout, and the operator strategies (a sketch of a strategy registration follows this list).
  • Generate AutoTVM logs for relevant models on RISC-V hardware and add them to the TopHub (https://github.com/tlc-pack/tophub) repository. This should help to get better default performance for some use cases.
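To make the schedule/strategy items above more concrete, here is a hypothetical sketch of how a RISC-V specific operator strategy could be registered, mirroring the existing arm_cpu strategies. The "riscv_cpu" target key and the fallback schedule are assumptions; no such code exists in TVM yet:

```python
# Hypothetical sketch: a RISC-V specific conv2d strategy, modeled on
# python/tvm/relay/op/strategy/arm_cpu.py. The "riscv_cpu" device key
# and any RISC-V specific schedules do not exist in TVM yet.
from tvm import relay, topi
from tvm.relay.op.strategy.generic import (
    conv2d_strategy,
    wrap_compute_conv2d,
    wrap_topi_schedule,
)


@conv2d_strategy.register("riscv_cpu")
def conv2d_strategy_riscv_cpu(attrs, inputs, out_type, target):
    strategy = relay.op.OpStrategy()
    # Fall back to the generic NCHW compute until dedicated packed (RVP)
    # and vector (RVV) schedules are written.
    strategy.add_implementation(
        wrap_compute_conv2d(topi.nn.conv2d_nchw),
        wrap_topi_schedule(topi.generic.schedule_conv2d_nchw),
        name="conv2d_nchw.riscv_cpu",
    )
    return strategy
```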

Past & Ongoing work

As it is quite hard to follow all the attempts to integrate RISC-V related features into TVM, I've tried to compile a list of efforts in that direction. Please let me know if I missed something relevant.


CC @areusch @masahi @comaniac @Julien @Dream-math @Tonylyc @UCASHurui

Thanks @PhilippvK for this summary! I agree it would be quite compelling to expand TVM's capabilities to target the RISC-V ISA.

I think this might be one of the first steps we need to pursue in order to make it easier for folks to upstream their work. A minimal demo of this might look like the following:

  1. A docker/install/ubuntu_install_spike.sh script to install SPIKE (or another RISC-V simulator of choice) into our CI Docker containers.
  2. Add SPIKE to a new ci_riscv container by creating a new docker/Dockerfile.ci_riscv. Ensure the compiler in this container supports the RISC-V extensions we'd like to target.
  3. A demonstration Project API implementation which can be used to run simple tests on SPIKE. It should be possible today to just create a Project API implementation and re-target a simple add test, e.g. test_aot_executor, at the simulator (e.g. by just changing the path to the template project in _make_session); a sketch follows this list.
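A minimal sketch of step 3, assuming a hypothetical SPIKE template project under apps/microtvm/spike; the template path and project options are assumptions, while the tvm.micro calls are the existing Project API surface:

```python
# Sketch: re-targeting a microTVM test at a hypothetical SPIKE template
# project by changing the template path used in _make_session.
import pathlib
import tvm.micro


def _make_session(mod):
    # Point at the (hypothetical) SPIKE Project API template instead of
    # the default crt/host template.
    template_project_dir = pathlib.Path("apps/microtvm/spike")
    project = tvm.micro.generate_project(
        template_project_dir,
        mod,  # module built with a microTVM-compatible target, e.g. "c"
        pathlib.Path("build/spike-project"),
        {"verbose": False},  # project options are template-specific
    )
    project.build()
    project.flash()
    return tvm.micro.Session(project.transport())
```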

Then we can more easily start tackling the question of implementing schedules or BYOC flows for specific CPUs. What are your thoughts here?

The main issue is that Spike does not provide any prebuilt binaries. Would it be a big issue to compile the simulator from source inside the CI? Regarding other simulators: OVPsimPlus is pretty good; however, you have to sign up on their homepage to get a download link.

Let me introduce the main issues here:

  • In the long term we especially want to support the P extension (packed, sub-word SIMD) and the V extension (vector, super-word SIMD)

  • While the spec of the V extension has been frozen since the end of last year, the P extension is still evolving

  • Toolchain support:

    • GCC: nothing has made it into the main branch yet; there are separate WIP branches/PRs working on those extensions. (Both are usable, but not at the same time, and of course they need to be compiled from source.)

    • LLVM: support for the latest vector extension (RVV) is available in LLVM 14, while the integration of the P extension (RVP) is still a work in progress. Thus we cannot use LLVM to build programs with RVP instructions. Furthermore, for linking we always need a GCC toolchain, as libc support seems to be missing in RISC-V LLVM.

  • We have to decide how we want to solve these issues. There are a few approaches:

    • Ignore LLVM for now and just use GCC (two separate builds, for RVP and RVV, would be required here). This would probably be good enough for MicroTVM targets, which use GCC most of the time anyway. LLVM support would mainly be interesting for the vectorization feature, which doesn't work anyway.
    • Stick with LLVM, which at least has stable RVV support, but postpone integrating RVP into TVM until it is supported by LLVM. (We would still need a basic version of the GCC RISC-V toolchain for linking; however, this could just be downloaded in the CI.)
    • Mixed approach: use LLVM for RVV and GCC for RVP. This way we would only need a single RISC-V GCC build (see the sketch after this list).
  • @areusch What are your thoughts on the build process for the toolchain? It might be quite time-consuming when building the Docker images. Would it make more sense to host a prebuilt version of the toolchains somewhere and download it in the CI?
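To illustrate the mixed approach: TVM could emit RVV code via LLVM while a RISC-V GCC does the final link. A sketch, where the toolchain prefix and feature flags are assumptions that depend on the locally installed toolchain:

```python
# Sketch of the mixed approach: LLVM codegen with +v for RVV, while GCC is
# used only for the final link (libc support is missing in RISC-V LLVM).
# The toolchain prefix and feature flags are assumptions.
import tvm
from tvm import relay
from tvm.contrib import cc

x = relay.var("x", shape=(1, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.relu(x))

target = tvm.target.Target(
    "llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=generic-rv64 -mattr=+m,+a,+f,+d,+c,+v"
)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)

# Link with the RISC-V GCC cross toolchain.
lib.export_library(
    "model.so",
    fcompile=cc.cross_compiler("riscv64-unknown-linux-gnu-g++"),
)
```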

Running Spike through the MicroTVM Project API was easier than I expected. I will test it for a few more days and can open a PR for this in the future.

Update: We should probably also align with the work being done here by @alter-xp, as it will also likely require a RISC-V GCC toolchain variant with vector support (unfortunately, the C906 only supports an older version of the spec).


Interesting work! The good news is that our GCC (if the Chinese page is difficult, you can download it here) supports both RVP and RVV. However, only RVP version 0.9.3 and RVV versions 0.7.1 and 1.0 are supported. If we use the old version of RVP and then switch to the latest version, the workload is acceptable. At the same time, if we want to use GCC, we may need to generate intrinsics instead of LLVM IR. LLVM also supports intrinsic compilation, which brings convenience to our future work.

Hi @PhilippvK,

Thanks for your initiative here and thanks for referencing my previous forum posts.

I think gathering efforts here makes a lot of sense; however, I should mention that now that things are clearer to me, I don't think the RISC-V aspect of those previous posts is that important in my work. We do have a RISC-V core in our own SoC (the one from pulpissimo), which acts as a host for driving the accelerators. However, the tiny core that comes with pulpissimo was never optimized or intended for any heavy lifting with regard to computation (we try to offload as many operations as possible to the accelerators), and under my current plan I don't think we will ever optimize the calculations that are done on this tiny core itself. GAP8 might be interesting in this respect, since it uses 8 RISC-V cores with specialized ISA extensions for NN workloads in addition to the tiny core. However, in the end I've never deployed anything on GAP8, and I'm not planning to do so.

I think my biggest issue was getting TVM to emit standalone C code that could be compiled for microcontrollers (since we use the GCC compiler that ships with pulpissimo), but most of those problems have been resolved by the work of the uTVM folks over the past year, and our current work takes a lot of inspiration from the BYOC Ethos-U/CMSIS-NN work of the people from ARM and Linaro.

I don't see us upstreaming any of our code anytime soon (you're better off looking into the ARM stuff), but if anyone needs pointers or wants to hear about our experiences, I'd be glad to reply!

Best regards!

That sounds like a good approach to get started. I have some technical questions regarding your T-Head-Semi GCC toolchain. Would it be possible to contact you, e.g. via mail, to address those?

Hi @PhilippvK, you can contact us by email at xp56@linux.alibaba.com. Any questions about the toolchain are welcome.

If you guys prefer a higher-bandwidth forum for this, we could add it to a TVM Community Meeting agenda. We meet just about every week, and the main requirement for adding things to the agenda is that there is a thread where notes can be taken. This thread is certainly enough.

Definitely not, as long as you build Spike from within Dockerfile.ci_riscv. We already build things from source when we build Docker containers, and we just use those pre-built containers in the CI, so you wouldn't see this in pre-merge CI runtime. How long do you think this would take?

Glad to hear this; please tag me and I'll review the PR as I have time (I am a bit behind right now :confused: ).

@alter-xp: what if we moved all the RISC-V related toolchains to a separate CI container, ci_riscv?

@areusch If ci_qemu's workload is very heavy, it is a good idea for us to separate out ci_riscv. With the increasing amount of RISC-V content, I can't think of any reason not to do so. And if the resources are sufficient, we can execute ci_riscv in parallel.

@PhilippvK Great work! I looked into this approach last year, but at the time GCC and LLVM did not yet officially support the RISC-V V extension (if I remember correctly). I have postponed the project since then due to my limited resources and ability. I would definitely love to follow and contribute to this project. I think the toolchain provided by T-Head, mentioned by @alter-xp, is a good starting point.


@alter-xp @areusch

Then I will hold off on integrating the Spike simulator target until the Dockerfile for RISC-V is merged (https://github.com/apache/tvm/pull/12230).


As long as we do not have a complete GCC build in the CI, it should be fine. Compiling Spike should only take a few minutes on a quad-core machine.


What we still have to agree on is which types of RISC-V processors we would like to consider for now. In my opinion, it would make sense to support at least one 32-bit MCU-like device (RV32GC) and one 64-bit device capable of running Linux (RV64GC). This decision is relevant because we have to build the proxy kernel (which does the semihosting) separately for each of those; a sketch of the two corresponding TVM targets follows below.
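For illustration, the two flavors expressed as TVM targets (a sketch; triples and CPU names are illustrative placeholders):

```python
# Sketch of the two processor flavors under discussion, expressed as TVM
# targets; triples and CPU names are illustrative placeholders.
import tvm

# Bare-metal MCU-class device (RV32GC): use TVM's C codegen; the actual ISA
# selection (-march=rv32gc -mabi=ilp32d) would live in the Project API's
# compiler flags rather than in the TVM target itself.
target_rv32 = tvm.target.Target("c")

# Linux-capable device (RV64GC): use the LLVM backend directly.
target_rv64 = tvm.target.Target(
    "llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=generic-rv64 -mattr=+m,+a,+f,+d,+c"
)
```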

Great, I'll make an effort to sort out the CI rebuild mess this week. That should let this work proceed.

Regarding the proxy kernel: this looks a bit similar to the microTVM RPC server. While I'm happy to admit that the RPC server is a bit heavyweight and could be better tested, its advantage is that it implements a protocol TVM knows how to control. When creating a Project API server implementation, the ultimate goal is to provide a connection to a TVM RPC server running on a foreign target. Placing this server on-target allows us to fully test the user experience in CI, rather than just the compiled kernels; for example, we can test tvmc run. We were able to get this running with reasonable speed on the ARM Corstone-300 FVP, so it seems possible that we might be able to test this via SPIKE too (it could be too slow, or communicating with the firmware could also be slow, in which case we can stick with QEMU here too). Just a suggestion to consider here; it would be nice to minimize device runtime complexity!

@areusch Sorry, but I did not completely get your idea.

My current approach uses the MicroTVM Project API, similar to tvm/src/runtime/crt/host at main · apache/tvm · GitHub, to compile and run MicroTVM models, using stdin/stdout to "emulate" the serial communication. This works pretty well but limits the usage of the Spike simulator to MicroTVM workloads (which involves compiling a complete target software binary for every run, etc.); see the sketch below.
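For illustration, a sketch of how such a project is exercised end to end, reusing the project/lib names from the _make_session sketch earlier in this thread; the "serial" transport here is the simulator process's stdin/stdout:

```python
# Sketch: exercising the (hypothetical) Spike template project end to end.
# `lib` is a module built for a microTVM target and `project` a generated
# project, as in the _make_session sketch earlier in this thread; each run
# builds and flashes a complete firmware image before talking to it.
import tvm.micro

with tvm.micro.Session(project.transport()) as session:
    graph_mod = tvm.micro.create_local_graph_executor(
        lib.get_graph_json(), session.get_system_lib(), session.device
    )
    graph_mod.set_input(**lib.get_params())
    graph_mod.run()
    out = graph_mod.get_output(0)
```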

Does your proposed solution go in the same direction, or do you want to accomplish something more powerful, say by running an OS inside the Spike simulator to host a real C++ TVM RPC server (not the MicroTVM bare-metal one), which could then directly be used as an RPC target device without the need for the Project API in between?

Could you please give me a hint on where to find this implemented for the Corstone-300 FVP/QEMU?

@PhilippvK No, that's about right. Just since you mentioned the proxy kernel: I'm not sure if that means you'd use the proxy kernel to launch workloads on the sim? The microTVM RPC server could theoretically serve the same purpose (i.e. be a bit of firmware running on the sim which TVM can talk to directly).

Ah yeah, it could be cast as running a real OS in SPIKE, although the microTVM RPC server should run on bare metal. I'm just not sure whether the proxy kernel is an additional layer here or not (really, I think it would be easiest to take a look at your solution and see).

Apologies, it's not landed just yet, but it's here: https://github.com/apache/tvm/pull/12125

My implementation can be found here: tvm/apps/microtvm/spike at feature_microtvm_spike · PhilippvK/tvm · GitHub

It's pretty much a copy of the MicroTVM host CRT template.

As I have a deadline at the end of the week, I was not yet able to open a pull request for it.

I also still have to look into the integration of Spike in the RISC-V Docker images.


@PhilippvK No worries! I was able to land the RISC-V CI pipeline upstream now. I believe the Dockerfile still needs to get SPIKE added, but after that you should be able to merge your implementation!

Also cc @alter-xp: the RISC-V Docker image does include the CSI-NN install script, so feel free to continue by leveraging that image.

Thanks @areusch, I will continue to integrate CSI-NN2.

Here is my follow-up on the RISC-V Docker image:
