Program crashed after applying pass `DecorateDeviceScope`

I accidentally found that this program will crash after applying tir pass DecorateDeviceScope.

To reproduce

import tvm
import numpy as np
from tvm import tir, te

n = te.size_var("n")
m = te.size_var("m")
A = te.placeholder((n, n), name="A", dtype="int32")

T = te.compute((m, m), lambda i, j: A[i][j])
s = te.create_schedule(T.op)
ir_m = tvm.lower(s, [A, T])


inputs = [tvm.nd.array(np.random.uniform(0, 100, size=(32, 32)).astype("int32"))]
output = tvm.nd.empty((32, 32), "int32")

with tvm.transform.PassContext(opt_level=4):
    opt = tvm.transform.Sequential(
        [tir.transform.DecorateDeviceScope()]
    )
    mod = opt(ir_m)

    opt_execute = tvm.build(mod, [*inputs, output], tvm.target.Target("llvm"))
    opt_execute(*[inputs[0], output])

tvm.build will be executed successfully, while the program will throw a segfault during executing opt_execute(*[inputs[0], output])

Debugging: out-of-bounds read when crashed

I use gdb to find out why it crashes, and it seems that an Out-of-Bounds Read occurred during the execution of the built module. r12 + rdx*4 will get 0x7ffeadebf258, which is not a readable memory address.

I am curious about the reason what the pass does and why it will cause such an error?

Hi, @syang The issue is related to calling convention between host and device side. The root cause is hard to explain in short words, but you can contact me if you are interested in it. Thanks very much!

Hi, @MichaelLee Thank you for your reply! I’m very interested in the root cause and how can I contact you? Email or other convenient methods for you? My email is syang.sec@gmail.com. Thanks very much! :grinning_face_with_smiling_eyes:

I will send an email to you~, @syang.

Thanks for @MichaelLee’s kind help, I understand this problem, and please allow me to summarize this problem:

If we set target as a non-llvm cpu device (e.g. cuda), the PrimFunc during calling tvm.build will be split, and the original PrimFunc will be split into:

  • host side: calling the kernel
  • device side: specific implementation of the kernel

And this split happens in pass SplitHostDevice during calling _build_for_device(input_mod, tar, target_host)(/python/tvm/driver/build_module.py#L357).

Stmt VisitStmt_(const AttrStmtNode* op) final {
  if (op->attr_key == attr::thread_extent || op->attr_key == attr::pipeline_exec_scope ||
      op->attr_key == attr::device_scope) {
    return SplitDeviceFunc(GetRef<Stmt>(op));
  }
  return StmtMutator::VisitStmt_(op);
}

If the attribute satisfies one of the following three conditions, SplitDeviceFunc will be invoked.

  • thread_extend
  • pipeline_exec_scope
  • device_scope

The most important thing is that, after the split, the attribute calling_conv of the device side will be set to kDeviceKernelLaunch. The attributecalling_conv represents the calling convention between PrimFunc calling PrimFunc, which must be followed when different target codegens, otherwise there will be problems during calling.

In tvm, there are three kinds of calling conventions:

enum class CallingConv : int {
  /*!
   * \brief Default calling convetion.
   *
   * - Uses the native calling convention of the target.
   * - Implementation: specified by the native target.
   */
  kDefault = 0,
  /*!
   * \brief PackedFunc that exposes a CPackedFunc signature.
   *
   * - Calling by PackedFunc calling convention.
   * - Implementation: Expose a function with the CPackedFunc signature.
   */
  kCPackedFunc = 1,
  /*!
   * \brief Device kernel launch
   *
   * - Call by PackedFunc calling convention.
   * - Implementation: defined by device runtime(e.g. runtime/cuda)
   */
  kDeviceKernelLaunch = 2,
};

Therefore, this question is very clear, since I set target as llvm, the excepted calling convention is actually kDefault. While pass DecorateDeviceScope will mark attribute device_scope, which means the PrimFunc will be split, and the calling convention is set to kDeviceKernelLaunch:

Stmt DecorateDeviceScope(Stmt&& stmt) {
  Stmt body = AttrStmt(make_zero(DataType::Int(32)), tir::attr::device_scope, 0, stmt);
  return body;
}

As we said before, this unexpected calling convention finally causes this program to crash. To prevent this unexpected behavior and fix this crash-causing bug, it is necessary to introduce checks here.