Permit me a few remarks, strictly w.r.t. the issue of RVV 0.7.1 not being supported by LLVM.
> @tqchen @cbalint13 Based on discussions from the community and your suggestions, I plan to handle the RISC-V vector/matrix extensions as follows:
As an intro, let me repeat the issues of RVV 0.7.1 (all the major ASIC hardware out there is 0.7.1):

- Currently the T-Head & Sophon ASICs expose the older RVV 0.7.1 spec.
- LLVM does not support RVV 0.7.1, only the 1.0.0 spec.
- See the RVV version support of LLVM (implicit exposure via clang):

```
$ rpm -q clang
clang-18.1.0~rc4-2.fc41.x86_64
$ clang --target=riscv64-unknown-elf -print-supported-extensions | grep "'V'"
clang version 18.1.0 (Fedora 18.1.0~rc4-2.fc41)
  v                    1.0       'V' (Vector Extension for Application Processors)
```

- Another issue with the T-Head ASIC implementations (e.g. TH1520) is the expensiveness of `vsetvli`.
> - For the vector extension, we will still perform scheduling for fixed-length vectors and use `tensorize` for general vector processing. To support the variable-length vector registers and operations specific to RISC-V vector, we will convert vector expressions into `load + op + store` in the `VectorizeLoop` pass. The `load`/`store` operations will use a variable-length style, with the specific length passed through `vl`, that is, a `tir.Var`. Finally, based on the existing LLVM codegen, we will implement an LLVM codegen for RISC-V to handle special cases (`codegen_riscv.cc`).
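To make the quoted `load + op + store` shape concrete, here is a minimal C-level sketch of what I understand the intended lowering to produce, written with the standard RVV 1.0 C intrinsics (the kernel itself, its names and the int32 element type are my own illustration, not part of the proposal):

```c
/* Minimal sketch of the load + op + store lowering, RVV 1.0 intrinsics only. */
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

/* y[i] = x[i] + b[i], strip-mined with a runtime vector length `vl`. */
void vadd_i32(int32_t *y, const int32_t *x, const int32_t *b, size_t n) {
  for (size_t i = 0; i < n;) {
    size_t vl = __riscv_vsetvl_e32m1(n - i);           /* vl is a runtime value (the tir.Var) */
    vint32m1_t vx = __riscv_vle32_v_i32m1(x + i, vl);  /* load  */
    vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);  /* load  */
    vint32m1_t vy = __riscv_vadd_vv_i32m1(vx, vb, vl); /* op    */
    __riscv_vse32_v_i32m1(y + i, vy, vl);              /* store */
    i += vl;
  }
}
```

Note that every strip re-negotiates `vl` via `vsetvli`, and that this intrinsic/IR path exists only for the 1.0 spec.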
We clearly will not be able to invoke the `vl.xxx` LLVM-IR calls for the RVV 0.7.1 spec.
To alleviate this, we can still emit RVV 0.7.1 LLVM-IR using ideas from this hard-coded LLVM-IR generator.
Now, you mention that special cases (like RVV 0.7.1) are to be handled in `codegen_riscv.cc`, but they can also be handled at code-emission time from TOPI’s tensorize `_impl()`, and there the init/load/store context can be captured even better.

A sketch of the advantage of adding it to the TOPI tensorizer `_impl()` part (a condensed kernel sketch follows the list):
- For the `init` part, in the 0.7.1 case you can issue `vsetvli` only once (for max performance): INIT: rvv-kernels/dot_int8_kernel.c at main · cbalint13/rvv-kernels · GitHub
- For the `load` part, in the 0.7.1 case you can re-use the already set `vsetvli` context: LOAD: rvv-kernels/dot_int8_kernel.c at main · cbalint13/rvv-kernels · GitHub
- For the final `store` (flush to main memory), in the 0.7.1 case you again invoke it only once: STORE: rvv-kernels/dot_int8_kernel.c at main · cbalint13/rvv-kernels · GitHub
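Condensed to its skeleton, the pattern of the linked dot_int8_kernel.c looks roughly like the sketch below. This is not the actual file: I reduced it to a plain int32 multiply-accumulate, the tile size and register choices are illustrative, and the 0.7.1-style mnemonics assemble only with the T-Head toolchain, not with upstream LLVM.

```c
/* Condensed, illustrative sketch of the linked kernel's structure (not the
 * actual file): RVV 0.7.1-style assembly where `vsetvli` is issued once and
 * the init/load/accumulate/store phases all re-use that context. */
#include <stddef.h>
#include <stdint.h>

/* acc[] += a[i] * b[i] over a padded tile; n is assumed a multiple of vl. */
void mac_i32_tile(int32_t *acc, const int32_t *a, const int32_t *b, size_t n) {
  size_t vl = 16;
  /* INIT: one vsetvli for the whole micro-kernel + accumulator clear. */
  __asm__ volatile(
      "vsetvli %0, %0, e32, m4\n\t"  /* 0.7.1 syntax: no ta/ma policy flags */
      "vmv.v.i v8, 0\n\t"            /* v8..v11 group = 0 (accumulators)    */
      : "+r"(vl));
  for (size_t i = 0; i < n; i += vl) {
    /* LOAD + OP: re-use the vsetvli context established above. */
    __asm__ volatile(
        "vle.v v0, (%0)\n\t"         /* 0.7.1 SEW-width unit-stride load */
        "vle.v v4, (%1)\n\t"
        "vmacc.vv v8, v0, v4\n\t"    /* v8 += v0 * v4                    */
        :
        : "r"(a + i), "r"(b + i)
        : "memory");
  }
  /* STORE: flush the accumulators to main memory exactly once. */
  __asm__ volatile("vse.v v8, (%0)" : : "r"(acc) : "memory");
}
```

The point is that `vsetvli` runs exactly once, while the per-tile loads/accumulates and the final store simply inherit its context.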
I am not sure we can capture the distinction between these three steps (which require expensive `vsetvli` context switches) as elegantly at `codegen_riscv.cc` time as from the TOPI tensorizer.

@zhupijuan_lkl Q: How do you see this alternative compared to your `codegen_riscv.cc` proposal?
> - For the matrix extension, considering that LLVM’s support for matrix is still not complete, I plan to adopt the following approach:
>   - For algorithm scheduling, since the matrix extension mainly accelerates conv/gemm operations, tensor layout transformations and alignment are typically performed during the scheduling of these cases. Therefore, during layout transformation, we will perform padding to ensure that the tensor shapes meet the requirements for subsequent tiling, thereby addressing the issue of tail blocks.
>   - For instruction generation, we will still use `tensorize` to perform computations on tiled blocks, but the `tensorize` intrinsics will be inserted directly as LLVM IR. Specifically, we will wrap the matrix extension intrinsics in a C implementation of a micro-kernel, then use Clang to compile it into LLVM IR, and finally insert this LLVM IR into the tensorization code.
The initiative for the matrix extension is very nice just as it is; I see it as a "let's move forward with it".

- LLVM also has special upstream support for the many kinds of T-Head extensions.
- Thus, we could also look at LLVM's possible calls from LLVM-IR directly:

```
$ clang --target=riscv64-unknown-elf -print-supported-extensions | grep xtheadvdot
  xtheadvdot           1.0       'xtheadvdot' (T-Head Vector Extensions for Dot)
```
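As a further note on the C-micro-kernel → LLVM-IR step of the quoted matrix plan, a placeholder sketch of that flow is below. The function name, tile sizes and the `-march` string are mine, and the scalar loop merely stands in for the actual matrix-extension intrinsics/assembly; the `clang -S -emit-llvm` invocation is the standard way to obtain the IR that `tensorize` would then splice in.

```c
/* micro_kernel.c -- placeholder sketch of the C micro-kernel that gets
 * compiled to LLVM IR and spliced in via `tensorize`. Names, tile sizes and
 * the -march string are illustrative; the scalar loop stands in for the real
 * matrix-extension intrinsics/assembly.
 *
 *   clang --target=riscv64-unknown-linux-gnu -march=<rv64gc + matrix ext> \
 *         -O2 -S -emit-llvm micro_kernel.c -o micro_kernel.ll
 */
#include <stdint.h>

#define TILE_M 4  /* tile shape fixed by the schedule; tensors are pre-padded */
#define TILE_N 4
#define TILE_K 16

/* C[TILE_M][TILE_N] += A[TILE_M][TILE_K] * B[TILE_K][TILE_N] */
void gemm_tile_i8i32(int32_t *c, const int8_t *a, const int8_t *b) {
  for (int m = 0; m < TILE_M; ++m)
    for (int n = 0; n < TILE_N; ++n) {
      int32_t sum = c[m * TILE_N + n];
      for (int k = 0; k < TILE_K; ++k)
        sum += (int32_t)a[m * TILE_K + k] * (int32_t)b[k * TILE_N + n];
      c[m * TILE_N + n] = sum;
    }
}
```

Because the schedule already pads tensors to multiples of the tile shape, the micro-kernel never has to deal with a tail block, which keeps the spliced IR simple.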
> Looking forward to more of your suggestions. Thanks!
If this is a draft that needs to be promoted, I put my +1 vote to go forward with your proposal as it is now, and I will try to help your efforts during the PR review rounds on this topic.

Thanks again @zhupijuan_lkl for your efforts here!