Integration of muRISCV-NN kernel library in TVM

Our team at the TU Munich (@r.stahl @fabian) has recently open-sourced our work on porting the ARM CMSIS-NN library to RISC-V targets:

Let me briefly summarize the features:

  • Support for 3 Modes: Default (Portable C-Code), Packed (P-Extension, sub-word SIMD, v0.9.2), Vector (V-Extension, super-word SIMD, v1.0)
  • CMSIS-NN compatibility layer: You can simply link libmuriscv-nn.a instead of libcmsis-nn.a and you should be good to go without changing any code.
  • This makes the library usable not only with the TFLite Micro framework but also in TVM via the CMSIS-NN BYOC integration.

The main reason for this post is to discuss whether this library would be a good contribution to TVM, bringing RISC-V support in (Micro)TVM one step further. If there is interest, I would be happy to formulate an RFC for this.

However these are my current concerns:

  • The library is already usable using the existing CMSISNN BYOC implementation. Thus it wouldn’t make sense to copy and paste all the available code for adding muRiscvNN support as well.
  • The only thing I would like to get rid of is the fake mapping of the mcpu PassConfig, which the BYOC code uses to decide which extensions should be enabled. Currently, to enable the P/V-Extension we pass --target-cmsis-nn-mcpu=cortex-m33/55, which is quite unintuitive.
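To make the fake mapping concrete, here is a minimal sketch of what a user currently has to remember. The mode names (default, pext, vext) and the helper function are hypothetical and exist only for this example. Pairing P with cortex-m33 and V with cortex-m55 follows the order given above, and treating "no mcpu flag" as the portable C mode is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: "default", "pext" and "vext" are hypothetical labels for the
# three muRISCV-NN modes; this helper is not part of TVM or muRISCV-NN.

# Print the tvmc flag that currently selects the kernels for a given mode.
mcpu_flag_for_mode() {
    case "$1" in
        pext)    echo "--target-cmsis-nn-mcpu=cortex-m33" ;;  # Arm DSP path reused for P-Extension
        vext)    echo "--target-cmsis-nn-mcpu=cortex-m55" ;;  # Arm MVE path reused for V-Extension
        default) echo "" ;;                                    # assumption: no mcpu means portable C
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example: the flag needed for the V-Extension build.
mcpu_flag_for_mode vext
```

Having to name an Arm core in order to pick a RISC-V extension is exactly the indirection I would like to remove.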

I am looking forward to any feedback.


Here are some metrics generated for the MLPerf Tiny benchmark. The instruction counts were obtained using the Spike simulator (an ISS) and are therefore not cycle-accurate, as RVV 1.0 compatible chips are not yet available.

@PhilippvK this looks like great work! is there a demo script that shows how to integrate muRISCV-NN with TVM? perhaps we could look at the mcpu issue in that context.

I recently added some integration tests which can be found here:

Considering only the V-Extension, the complete flow can be broken down to:

# clone muriscvnn
git clone
cd muriscv-nn
git checkout integration-tests

# download toolchain
cd Toolchain && ./ && cd -
export TOOLCHAIN_DIR=$(pwd)/Toolchain/rv32gcv

# install tvm
virtualenv -p python3.8 .venv # optional
source .venv/bin/activate # optional
pip install "tlcpack-nightly" -f
pip install tflite

# install muriscvnn
cmake --build ./build

# download model
wget -q

# generate mlf
tvmc compile resnet.tflite --runtime crt --executor aot --pass-config "tir.disable_vectorize=1" --pass-config "tir.usmp.enable=1" --pass-config "tir.usmp.algorithm=hill_climb" --opt-level 3 -f mlf --runtime-crt-system-lib 0 --target-c-constants-byte-alignment 4 --target-c-workspace-byte-alignment 4 --target-c-executor aot --target-c-unpacked-api 1 --target-c-interface-api c --output mlf.tar --target cmsis-nn,c --target-cmsis-nn-mcpu=cortex-m55
mkdir -p mlf
tar xf mlf.tar -C mlf/

# build runtime
export PREFIX=$TOOLCHAIN_DIR/bin/riscv32-unknown-elf
cd mlf/runtime
cp template/crt_config-template.h crt_config.h
make common -j"$(nproc)" \
        CRT_CONFIG=crt_config.h \
        CC=$PREFIX-gcc \
        CXX=$PREFIX-g++ \
        RANLIB=$PREFIX-ranlib
cd -

# build target sw
KERNEL_SRCS=($(find ./mlf/codegen/host/src -name "*.c"))
# The following should be transformed into a makefile
$PREFIX-gcc -o main.elf Integration/TVM/sw/main.c "${KERNEL_SRCS[@]}" \
       -I./mlf/runtime/include/ \
       -I./mlf/codegen/host/include \
       -I./Include/CMSIS/NN/Include/ \
       -I./Include/ \
       -L./build/Source/ \
       -L./mlf/runtime/build/ \
       -lmuriscv_nn \
       -lcommon \
       -Wno-implicit-function-declaration \
       -Wno-incompatible-pointer-types

# install spike
cd ./Sim/Spike/bin && ./ && cd -

# run simulation
./Sim/Spike/bin/spike --isa=rv32gcv --varch=vlen:1024,elen:32 ./Sim/Spike/bin/pk main.elf

Ideally the whole flow would be based on MicroTVM, which currently lacks support for Spike simulation.

For the main discussion of the mcpu mapping, only the --target-cmsis-nn-mcpu=cortex-m55 part is relevant. For RISC-V targets, the available extensions can be obtained either from the -march=rv32gcv string (GCC) or from the -mattr=+v features (LLVM). Hardcoding mcpu -> MVEI/DSP mappings for RISC-V targets in the cmsis-nn BYOC backend does not sound like a good idea to me.

In addition, if muRISCV-NN would be a good addition to the TVM ecosystem, we should also consider adding the following:

  • Add documentation/tutorials
  • Add Tests
  • CI Integration (RISC-V Toolchain (GCC/LLVM), Spike Simulator)
  • MicroTVM template capable of running Spike
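Coming back to the mcpu question: as a rough sketch of the -march idea, the extension letters could be read straight from the GCC-style ISA string instead of going through an Arm core name. The function and variable names below are hypothetical, and the parsing is deliberately simplified: it only looks at single-letter extensions, so vector subsets such as zve32x are not detected.

```shell
#!/usr/bin/env bash
# Sketch only: derive feature switches from a GCC-style -march string.
# march_has_ext, vext and pext are hypothetical names, not TVM or
# muRISCV-NN API.

# Return 0 if the single-letter extension $2 is present in the -march
# string $1, 1 if it is absent, 2 if the string has no rv32/rv64 prefix.
march_has_ext() {
    local march ext letters
    march=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    ext=$2
    case "$march" in
        rv32*|rv64*) letters=${march#rv??} ;;  # drop the base ISA prefix
        *) return 2 ;;
    esac
    # Keep only the single-letter part: cut at the first '_' or at the
    # start of a multi-letter extension (z*, s*, x*).
    letters=${letters%%[_zsx]*}
    case "$letters" in
        *"$ext"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: decide which kernel variants to enable for rv32gcv.
march=rv32gcv
vext=0
pext=0
if march_has_ext "$march" v; then vext=1; fi
if march_has_ext "$march" p; then pext=1; fi
echo "march=$march vext=$vext pext=$pext"   # prints: march=rv32gcv vext=1 pext=0
```

The same decision could be made from the LLVM -mattr list. Either way the information would come from the RISC-V target description rather than from a borrowed Cortex-M name.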