Why reorder and try_unroll_vec on Skylake: any details?

moderato · February 21, 2020, 7:38pm

OK, I see. I can think of a case where unrolling might outperform vectorization: say, when the innermost axis length is not divisible by 8 (the number of FP32 values that fit in one AVX2 register) on an AVX2 machine like mine. It’s said that mixing AVX and non-AVX instructions incurs a penalty. I suppose that’s close to what you’re saying about try_unroll_vec, right?
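To make the divisibility case concrete, here is a minimal plain-Python sketch (not TVM code) of how an innermost axis breaks into full 8-lane FP32 vectors plus a scalar remainder. The helper name `vector_and_tail` and the extent of 30 are only illustrative assumptions.

```python
# A 256-bit AVX2 register holds 256 / 32 = 8 FP32 values.
AVX2_FP32_LANES = 256 // 32


def vector_and_tail(extent, lanes=AVX2_FP32_LANES):
    """Return (full vector iterations, leftover scalar elements) for an axis.

    Hypothetical helper for illustration; it only does the arithmetic.
    """
    return divmod(extent, lanes)


full, tail = vector_and_tail(30)
print(full, tail)  # 3 full vectors, 6 leftover elements
```

A non-zero remainder is the situation I have in mind: those leftover elements cannot fill a whole vector, so they need some other handling.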

Speaking of this, I recently posted another vectorization-related question. It’s about TVM not vectorizing the innermost axis when the split factor of the second-innermost axis does not evenly divide the original axis length. Could you take a look at that as well?
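For context, this is roughly the situation I mean, written as a plain-Python sketch rather than actual TVM lowering. My assumption is that a non-dividing split rounds the outer extent up and then needs a bounds check on the fused index. The name `split_indices` and the sizes 10 and 4 are made up for illustration.

```python
def split_indices(extent, factor):
    """Yield (outer, inner, in_bounds) for a split whose factor may not divide extent.

    Hypothetical model of a split loop nest, not TVM's implementation.
    """
    outer_extent = (extent + factor - 1) // factor  # ceiling division
    for outer in range(outer_extent):
        for inner in range(factor):
            index = outer * factor + inner
            # in_bounds is False where a guard would have to skip the iteration
            yield outer, inner, index < extent


# Iterations that fall outside the original axis of length 10 with factor 4.
guarded = [(o, i) for o, i, ok in split_indices(10, 4) if not ok]
print(guarded)  # [(2, 2), (2, 3)]
```

When the factor divides the length, the list is empty and no guard is needed.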

Sorry for bugging you with so many questions. I do appreciate your time looking into them!