Vectorization for split factors that do not divide the axis length

moderato · March 10, 2020, 9:03pm

Hello! I found that TVM doesn’t vectorize when the axis length is not divisible by the split factor. I discussed this with @FrozenGene in the following post.
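
As a rough illustration (hand-written C++, not code generated by TVM; the function name is hypothetical), splitting an axis of length n by a factor that does not divide it gives ceil(n / factor) outer iterations, so the inner loop needs a bounds guard to keep the last outer iteration from running past the end of the data:

```cpp
#include <cassert>
#include <vector>

// Hand-written sketch (not TVM output) of the loop nest produced by splitting
// an axis of length n by a factor that does not necessarily divide n.
// Because the outer loop runs ceil(n / factor) times, the inner loop body must
// be guarded; with the guard, the inner loop is no longer a plain fixed-length
// loop over contiguous elements.
void scale_split_guarded(std::vector<float>& a, int factor) {
    assert(factor > 0);
    const int n = static_cast<int>(a.size());
    const int outer = (n + factor - 1) / factor;  // ceil(n / factor)
    for (int io = 0; io < outer; ++io) {
        for (int ii = 0; ii < factor; ++ii) {  // the loop one would like to vectorize
            const int i = io * factor + ii;
            if (i < n) {  // bounds guard: only false in the last outer iteration
                a[i] *= 2.0f;
            }
        }
    }
}
```

With n = 10 and factor 4, the outer loop runs 3 times and the guard skips 2 of the 12 inner iterations; with n = 8 the guard is never false, but it is still present unless the compiler can prove that.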

This problem can significantly hurt runtime performance, and it leaves users very few options for parameters such as the smallest block size in GEMM, because only a limited number of those choices result in vectorized code.

Here’s an article on GEMM optimization on an AVX2 machine: https://gist.github.com/nadavrot/5b35d44e8ba3dd718e595e40184d03f0. It mentions that, for GEMM implemented with instruction sets like AVX2, the best smallest block size is sometimes “uncommon”, e.g. 2x5 or 3x4.
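
To see why shapes like 2x5 or 3x4 run into the divisibility issue, treat each block dimension as a split factor on the corresponding output axis (an assumption about how the block maps onto the loop nest). The helpers below, with hypothetical names, report how many full blocks an axis holds and how many elements fall into a partial tail block:

```cpp
#include <cassert>

// Hypothetical helper: how an axis of length `extent` divides into blocks of
// size `factor`.
struct SplitShape {
    int full_blocks;  // blocks containing exactly `factor` elements
    int tail;         // leftover elements (0 when factor divides extent)
};

SplitShape split_shape(int extent, int factor) {
    assert(extent >= 0 && factor > 0);
    return SplitShape{extent / factor, extent % factor};
}

// True when a rows x cols block tiles an m x n output with no partial blocks.
bool tiles_evenly(int m, int n, int rows, int cols) {
    return split_shape(m, rows).tail == 0 && split_shape(n, cols).tail == 0;
}
```

For a 1024 x 1024 output, a 3x4 block leaves a partial block of 1024 % 3 = 1 row along the first axis, and a factor of 5 leaves 1024 % 5 = 4 elements, while a 2x4 block tiles evenly.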

Does TVM have any plans for a feature that avoids this situation? Any advice is appreciated!

@tqchen

FrozenGene · March 11, 2020, 3:16am

cc @Hzfengsy @merrymercy

tqchen · March 11, 2020, 11:03pm

This is something we eventually want to handle by splitting the loop along a certain direction, so that most of the main body can be vectorized.
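
A minimal sketch of that idea, assuming a 1-D loop and a hypothetical function name: the loop is partitioned into a main body covering the largest multiple of the factor that fits in n, where the inner loop has a constant trip count and no guard, plus a scalar tail for the remaining n % factor elements. This is plain C++ illustrating loop partitioning, not TVM's implementation:

```cpp
#include <cassert>
#include <vector>

// Loop partitioning sketch: guard-free main body plus a short scalar tail.
void scale_partitioned(std::vector<float>& a, int factor) {
    assert(factor > 0);
    const int n = static_cast<int>(a.size());
    const int main_end = (n / factor) * factor;  // largest multiple of factor <= n
    // Main body: every inner loop has exactly `factor` iterations and no bounds
    // check, which makes it a straightforward vectorization candidate.
    for (int io = 0; io < main_end; io += factor) {
        for (int ii = 0; ii < factor; ++ii) {
            a[io + ii] *= 2.0f;
        }
    }
    // Tail: at most factor - 1 scalar iterations.
    for (int i = main_end; i < n; ++i) {
        a[i] *= 2.0f;
    }
}
```

With n = 10 and factor 4, the main body covers elements 0 to 7 in two guard-free blocks and the tail handles elements 8 and 9. The cost of this design is code size, since the loop body is emitted twice.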

moderato · March 12, 2020, 6:11am

Thank you for your reply! Would this fix involve a lot of changes? In the post mentioned above, @FrozenGene pointed to a solution in Halide based on a similar idea:

https://github.com/halide/Halide/blob/master/src/VectorizeLoops.cpp#L753

```cpp
// ... (excerpt starts mid-function)
    } else {
        int lanes = std::max(predicate.type().lanes(), std::max(value.type().lanes(), index.type().lanes()));
        return Store::make(op->name, widen(value, lanes), widen(index, lanes),
                           op->param, widen(predicate, lanes), op->alignment);
    }
}

Stmt visit(const AssertStmt *op) override {
    return (op->condition.type().lanes() > 1) ? scalarize(op) : op;
}

Stmt visit(const IfThenElse *op) override {
    Expr cond = mutate(op->condition);
    int lanes = cond.type().lanes();
    debug(3) << "Vectorizing over " << var << "\n"
             << "Old: " << op->condition << "\n"
             << "New: " << cond << "\n";

    Stmt then_case = mutate(op->then_case);
    Stmt else_case = mutate(op->else_case);
// ... (excerpt ends here)
```
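
The excerpt builds a Store whose predicate is widened to the same number of lanes as the value and index, so each lane is written only where its own predicate holds (a predicated, or masked, store). The sketch below emulates that idea in portable C++ for a simple loop that doubles each element: instead of a separate scalar tail, the final partial vector is written with a per-lane mask derived from the bounds check. The 8-lane width (one 256-bit AVX2 register of floats) and all names are assumptions for illustration; real hardware would use masked-move instructions such as `vmaskmovps` rather than a per-lane loop.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kLanes = 8;  // assumed vector width: 8 floats
using Vec = std::array<float, kLanes>;
using Mask = std::array<bool, kLanes>;

// Emulated predicated store: writes value[l] to dst[base + l] only where mask[l] is set.
void masked_store(std::vector<float>& dst, int base, const Vec& value, const Mask& mask) {
    for (int l = 0; l < kLanes; ++l) {
        // Lanes that are switched on must be in bounds; masked-off lanes are never touched.
        assert(!mask[l] || base + l < static_cast<int>(dst.size()));
        if (mask[l]) dst[base + l] = value[l];
    }
}

// Doubles every element, kLanes elements per step. The last step uses the
// bounds check (base + l < n) as a per-lane predicate instead of a scalar tail.
void scale_predicated(std::vector<float>& a) {
    const int n = static_cast<int>(a.size());
    for (int base = 0; base < n; base += kLanes) {
        Vec v{};
        Mask m{};
        for (int l = 0; l < kLanes; ++l) {
            m[l] = base + l < n;                       // vector form of the guard
            v[l] = m[l] ? a[base + l] * 2.0f : 0.0f;   // masked load and compute
        }
        masked_store(a, base, v, m);
    }
}
```

For an 11-element array, the first step runs with all 8 lanes enabled and the second with only 3, so no element past the end is read or written.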

Do you think this approach fits into what TVM currently has?