Crash triggered by 8 lines of TIR code

Continuing the discussion from Crash triggered by 5 lines of TIR code: I encountered another crash, this time related to the Load node. The code is shown below:

from tvm import tir
import tvm

if __name__ == '__main__':
    zero = tir.const(0)
    nop = tir.Evaluate(zero)

    v = tir.Var('i1', 'int32')
    for_stmt = tir.For(v, zero, zero, tir.ForKind.SERIAL, nop)
    load = tir.Evaluate(tir.Load('int32', v, zero))
    seq = tir.SeqStmt([for_stmt, for_stmt, load])
    
    func = tir.PrimFunc([], seq)
    tvm.build(func)

which produces a segmentation fault:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffdd10d7fb in tvm::tir::IRConvertSSA::VisitExpr_(tvm::tir::LoadNode const*) ()

I know I may be misusing some APIs, but I'm curious why this crashes the whole program instead of raising an error. I'd appreciate any help :slight_smile:.

Thanks for bringing these bugs up :slight_smile:

Hmm, we are not using low-level TIR as a frontend, so if the IR is not in the well-formed shape that comes down from TE / TVM script / etc., it is possible that this code path is untested.

Is there any specific reason we need to build these IRs by hand? Thanks a lot!

Thanks @junrushao1994 for your reply!

To be frank, I agree that for high-level use of TVM it isn't necessary to build these IRs by hand. I did it because I saw you were promoting TensorIR, which makes TIR self-contained. I'm interested in that direction and think TIR itself will be used more frequently in the future, so I started playing with its APIs (and hybrid scripts) to understand it better :joy:.

Thanks for your interest in the TensorIR work!

To clarify, the entry point we are thinking of is the schedulable TensorIR written with blocks in TVM script (not TE hybrid script), not the lower-level, unschedulable IR you are constructing. Happy to chat more :slight_smile:


Hi all! I think I found the root cause of this segfault while reading the related C++ implementation.

We can see that the function tvm::tir::IRConvertSSA::VisitExpr_(tvm::tir::LoadNode const*), implemented in tir/transforms/ir_utils.cc (line 113), forgets to check the size of scope_[op->buffer_var.get()] before calling back():

PrimExpr VisitExpr_(const LoadNode* op) final {
  PrimExpr expr = StmtExprMutator::VisitExpr_(op);
  op = expr.as<LoadNode>();
  if (scope_.count(op->buffer_var.get())) {
    return Load(op->dtype, scope_[op->buffer_var.get()].back(), op->index, op->predicate);
  } else {
    return expr;
  }
}

As the documentation says, calling std::vector::back on an empty container is undefined behavior. So if scope_[op->buffer_var.get()] is actually empty, we get undefined behavior, which here manifests as a segfault.

To avoid undefined behavior, the size of the vector must be checked before calling back(). In fact, the VisitStmt_ overload for AttrStmtNode already does this correctly: it checks both scope_.count(v) and scope_[v].size() != 0 before calling scope_[v].back():

Stmt VisitStmt_(const AttrStmtNode* op) final {
  if (const VarNode* v = op->node.as<VarNode>()) {
    Stmt stmt = StmtExprMutator::VisitStmt_(op);
    op = stmt.as<AttrStmtNode>();
    if (scope_.count(v) && scope_[v].size() != 0) {
      return AttrStmt(scope_[v].back(), op->attr_key, op->value, op->body);
    } else {
      return stmt;
    }
  } else {
    return StmtExprMutator::VisitStmt_(op);
  }
}

To fix the problem, I have made a PR that checks the vector size before calling back(): https://github.com/apache/tvm/pull/8930