[DISCUSS] Introduce RISC-V Vector/Matrix extension

@tqchen Thanks for your response. Yes, tensorize is our preferred implementation path, but we currently face two main problems:

  1. For both vectorize and tensorize, when the loop extent is not evenly divisible by the split factor, a tail loop is left over, and that tail breaks vectorize/tensorize. Applying the pad_einsum schedule adds redundant memory copies. Using the loop partition schedule leaves the tail part of the loop unable to use vector/matrix instructions for acceleration;
  2. TVM currently has no expression for something like vl (the vector length) in the RISC-V vector extension. When calling intrinsics, vl has to be passed as a parameter to the instruction, so without such an expression we cannot fully realize the advantages of vector/matrix optimization. We want to try adding the related semantics, but we are not sure whether the community is willing to accept it, as it may not be that general for other platforms…
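
To make the two points concrete, here is a rough pure-Python sketch (not TVM code, and not a proposal for the actual IR). It compares a loop-partition style schedule, where the tail falls back to scalar code, with a strip-mined loop that recomputes vl on every iteration. `VLMAX`, `vsetvl`, and `vadd_vl` are hypothetical stand-ins, and `vsetvl` is a simplified model that always grants `min(remaining, VLMAX)` elements.

```python
VLMAX = 8  # hypothetical maximum number of elements per vector register


def vsetvl(remaining, vlmax=VLMAX):
    """Simplified model of RVV vsetvl: grant min(remaining, vlmax) elements."""
    return min(remaining, vlmax)


def vadd_vl(a, b, out, base, vl):
    """Stand-in for a vector-add intrinsic that takes vl as an operand."""
    for i in range(vl):
        out[base + i] = a[base + i] + b[base + i]


def add_loop_partition(a, b):
    """Fixed-width main loop plus a scalar tail (loop partition style)."""
    n = len(a)
    out = [0] * n
    main = n - n % VLMAX  # largest multiple of VLMAX that fits in n
    vector_ops = 0
    scalar_ops = 0
    for base in range(0, main, VLMAX):
        vadd_vl(a, b, out, base, VLMAX)
        vector_ops += 1
    for i in range(main, n):  # tail: cannot use the fixed-width intrinsic
        out[i] = a[i] + b[i]
        scalar_ops += 1
    return out, vector_ops, scalar_ops


def add_strip_mined(a, b):
    """Strip-mined loop: vl is recomputed each iteration, so no scalar tail."""
    n = len(a)
    out = [0] * n
    base = 0
    vector_ops = 0
    while base < n:
        vl = vsetvl(n - base)  # the last iteration gets a shorter vl
        vadd_vl(a, b, out, base, vl)
        base += vl
        vector_ops += 1
    return out, vector_ops, 0


a = list(range(19))
b = [10] * 19
partitioned = add_loop_partition(a, b)
strip_mined = add_strip_mined(a, b)
```

With 19 elements and `VLMAX = 8`, the partitioned version issues two vector operations and handles the last three elements one by one, while the strip-mined version covers everything with three vector operations. Expressing the second form is what needs a vl-like value that is a runtime expression and can be passed into the intrinsic call.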