What is vectorization, and is it just a hint?

It’s not a hint, but the codegen may fail to vectorize if the required conditions aren’t met. When that happens, you’ll see one of two outcomes: either it has no effect and the loop still runs sequentially, or it throws an error like the one you reported.

As for CUDA, AFAIK vectorization only takes effect when the dtype is float16. In that case the CUDA codegen packs two half values together and uses half2 to make better use of the bandwidth. As a result, a valid vectorize size has to be divisible by 2. You can refer to the discussion here: [CUDA] Enable half2 in CUDA injective schedule - #4 by vinx13.
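
To make the half2 point a bit more concrete, here is a rough sketch in plain Python of what "putting two half values together" means at the bit level. This is not TVM code and not what the CUDA codegen actually emits; `pack_half2` and `unpack_half2` are hypothetical helper names made up for illustration. It only shows that two float16 values fit in one 32-bit word, and why an odd number of elements can't be paired up.

```python
import struct


def pack_half2(values):
    """Pack float16 values pairwise into 32-bit words (illustration only)."""
    if len(values) % 2 != 0:
        # An odd element count leaves one value without a partner.
        raise ValueError("half2 packing needs an even number of values")
    words = []
    for i in range(0, len(values), 2):
        # "<2e" encodes two IEEE 754 half-precision floats, 2 bytes each.
        raw = struct.pack("<2e", values[i], values[i + 1])
        # Reinterpret those 4 bytes as a single unsigned 32-bit word.
        (word,) = struct.unpack("<I", raw)
        words.append(word)
    return words


def unpack_half2(words):
    """Split each 32-bit word back into its two float16 values."""
    values = []
    for word in words:
        raw = struct.pack("<I", word)
        values.extend(struct.unpack("<2e", raw))
    return values


if __name__ == "__main__":
    packed = pack_half2([1.0, 2.0, 3.0, 4.0])
    print([hex(w) for w in packed])
    print(unpack_half2(packed))
```

Four float16 values end up in two 32-bit words, so each load or store moves two elements at once. With an odd count the last element has no partner, which is the intuition behind the size restriction.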
