Thanks for @MichaelLee’s kind help, I understand this problem, and please allow me to summarize this problem:
If we set target
as a non-llvm cpu device (e.g. cuda), the PrimFunc during calling tvm.build
will be split, and the original PrimFunc will be split into:
- host side: calling the kernel
- device side: specific implementation of the kernel
And this split happens in pass SplitHostDevice
during calling _build_for_device(input_mod, tar, target_host)
(/python/tvm/driver/build_module.py#L357).
Stmt VisitStmt_(const AttrStmtNode* op) final {
if (op->attr_key == attr::thread_extent || op->attr_key == attr::pipeline_exec_scope ||
op->attr_key == attr::device_scope) {
return SplitDeviceFunc(GetRef<Stmt>(op));
}
return StmtMutator::VisitStmt_(op);
}
If the attribute satisfies one of the following three conditions, SplitDeviceFunc
will be invoked.
thread_extend
pipeline_exec_scope
device_scope
The most important thing is that, after the split, the attribute calling_conv
of the device side will be set to kDeviceKernelLaunch
. The attributecalling_conv
represents the calling convention between PrimFunc calling PrimFunc, which must be followed when different target codegens, otherwise there will be problems during calling.
In tvm, there are three kinds of calling conventions:
enum class CallingConv : int {
/*!
* \brief Default calling convetion.
*
* - Uses the native calling convention of the target.
* - Implementation: specified by the native target.
*/
kDefault = 0,
/*!
* \brief PackedFunc that exposes a CPackedFunc signature.
*
* - Calling by PackedFunc calling convention.
* - Implementation: Expose a function with the CPackedFunc signature.
*/
kCPackedFunc = 1,
/*!
* \brief Device kernel launch
*
* - Call by PackedFunc calling convention.
* - Implementation: defined by device runtime(e.g. runtime/cuda)
*/
kDeviceKernelLaunch = 2,
};
Therefore, this question is very clear, since I set target
as llvm, the excepted calling convention is actually kDefault
. While pass DecorateDeviceScope
will mark attribute device_scope
, which means the PrimFunc will be split, and the calling convention is set to kDeviceKernelLaunch
:
Stmt DecorateDeviceScope(Stmt&& stmt) {
Stmt body = AttrStmt(make_zero(DataType::Int(32)), tir::attr::device_scope, 0, stmt);
return body;
}
As we said before, this unexpected calling convention finally causes this program to crash. To prevent this unexpected behavior and fix this crash-causing bug, it is necessary to introduce checks here.