Right now, TVM stack traces can be pretty confusing. Even with debug info on, the most each frame of a stack trace contains is the shared library name, the offset into the library, the program counter, and maybe the function name. Ideally, each frame would also include the source file the function is in and its line number.
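For context, traces of this shape are what glibc's `backtrace` and `backtrace_symbols` produce. Here is a minimal sketch, assuming a platform that ships `<execinfo.h>` (glibc or macOS). The function name `CurrentStyleTrace` is made up for illustration, and this is not the actual dmlc-core code:

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc / macOS)

#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Collect up to max_frames frames of the calling thread's stack and format
// one line per frame in the "[bt] (N) ..." style used in this post.
// Hypothetical helper, for illustration only.
std::vector<std::string> CurrentStyleTrace(int max_frames = 10) {
  assert(max_frames > 0);
  std::vector<void*> pcs(max_frames);
  // Fills pcs with return addresses; returns how many were captured.
  int n = backtrace(pcs.data(), max_frames);
  std::vector<std::string> lines;
  // Turns each address into "library(symbol+offset) [address]" text.
  char** symbols = backtrace_symbols(pcs.data(), n);
  if (symbols == nullptr) return lines;
  for (int i = 0; i < n; ++i) {
    lines.push_back("[bt] (" + std::to_string(i) + ") " + symbols[i]);
  }
  std::free(symbols);  // a single allocation holds all the strings
  return lines;
}
```

`backtrace_symbols` only consults the dynamic symbol table, so functions that are not exported show up as a bare `library(+offset)`, and no file or line information is available at all.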
I’ve found a couple of libraries that give more detailed stack traces, but the best and most cross-platform appears to be libbacktrace (https://github.com/ianlancetaylor/libbacktrace). The Rust compiler used it for a while before switching to an implementation written entirely in Rust. libbacktrace also has the benefit of working on Linux, macOS, and Windows.
I’ve developed a branch using libbacktrace and generated some comparisons between what we currently have and libbacktrace:
Current, with debug symbols:
Stack trace:
[bt] (0) /home/tristan/octoml/tvm/build/libtvm.so(+0x9c3aa6) [0x7f115796aaa6]
[bt] (1) /home/tristan/octoml/tvm/build/libtvm.so(+0x9d6d15) [0x7f115797dd15]
[bt] (2) /home/tristan/octoml/tvm/build/libtvm.so(TVMFuncCall+0x63) [0x7f115817c7e3]
[bt] (3) /lib/x86_64-linux-gnu/libffi.so.8(+0x7249) [0x7f11ac56d249]
[bt] (4) /lib/x86_64-linux-gnu/libffi.so.8(+0x6629) [0x7f11ac56c629]
[bt] (5) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x5a1) [0x7f11ab27d4d1]
[bt] (6) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x139e0) [0x7f11ab27c9e0]
[bt] (7) python3(_PyObject_MakeTpCall+0x353) [0x514313]
[bt] (8) python3(_PyEval_EvalFrameDefault+0x5a9c) [0x50e85c]
libbacktrace, with debug symbols:
Stack trace:
[bt] (0) ../3rdparty/dmlc-core/include/dmlc/logging.h:514 dmlc::LogMessageFatal::~LogMessageFatal()
[bt] (1) ../include/tvm/runtime/packed_func.h:1251 unpack_call<tvm::tir::LT, 3, tvm::tir::<lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)> >
[bt] (2) ../include/tvm/runtime/packed_func.h:1307 operator()
[bt] (3) /usr/include/c++/10/bits/invoke.h:60 __invoke_impl<void, tvm::runtime::TypedPackedFunc<R(Args ...)>::AssignTypedLambda<tvm::tir::<lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)> >::<lambda(const tvm::runtime::TVMArgs&, tvm::runtime::TVMRetValue*)>&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*>
[bt] (4) /usr/include/c++/10/bits/invoke.h:153 __invoke_r<void, tvm::runtime::TypedPackedFunc<R(Args ...)>::AssignTypedLambda<tvm::tir::<lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)> >::<lambda(const tvm::runtime::TVMArgs&, tvm::runtime::TVMRetValue*)>&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*>
[bt] (5) /usr/include/c++/10/bits/std_function.h:291 _M_invoke
[bt] (6) /usr/include/c++/10/bits/std_function.h:622 std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
[bt] (7) ../include/tvm/runtime/packed_func.h:990 tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
[bt] (8) ../src/runtime/c_runtime_api.cc:428 TVMFuncCall
[bt] (9) 0x00007f11ac56d248
Current, without debug symbols:
Stack trace:
[bt] (0) /home/tristan/octoml/tvm/build/libtvm.so(+0xa5b9c8) [0x7fd96c8f79c8]
[bt] (1) /home/tristan/octoml/tvm/build/libtvm.so(+0xa6a66f) [0x7fd96c90666f]
[bt] (2) /home/tristan/octoml/tvm/build/libtvm.so(TVMFuncCall+0x6c) [0x7fd96d214dbc]
[bt] (3) /lib/x86_64-linux-gnu/libffi.so.8(+0x7249) [0x7fd9c161f249]
[bt] (4) /lib/x86_64-linux-gnu/libffi.so.8(+0x6629) [0x7fd9c161e629]
[bt] (5) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x5a1) [0x7fd9c032f4d1]
[bt] (6) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x139e0) [0x7fd9c032e9e0]
[bt] (7) python3(_PyObject_MakeTpCall+0x353) [0x514313]
[bt] (8) python3(_PyEval_EvalFrameDefault+0x5a9c) [0x50e85c]
libbacktrace, without debug symbols:
Stack trace:
[bt] (0) dmlc::LogMessageFatal::~LogMessageFatal() [clone .constprop.0]
[bt] (1) tvm::Span tvm::runtime::TVMPODValue_::AsObjectRef<tvm::Span>() const
[bt] (2) tvm::runtime::TVMMovableArgValue_::operator tvm::Span<tvm::Span, void>() const
[bt] (3) std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::tir::LT (tvm::PrimExpr, tvm::PrimExpr, tvm::Span)>::AssignTypedLambda<tvm::tir::__mk_TVM64::{lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)#1}>(tvm::tir::__mk_TVM64::{lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
[bt] (4) TVMFuncCall
[bt] (5) 0x00007fc6e5397248
[bt] (6) 0x00007fc6e5396628
[bt] (7) _ctypes_callproc
[bt] (8) 0x00007fc6e40a69df
[bt] (9) _PyObject_MakeTpCall
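To make the libbacktrace output format above concrete, here is a rough sketch of a per-frame formatter. `FormatFrame` takes the same parameters as libbacktrace's `backtrace_full_callback`, but the block deliberately does not include `backtrace.h`, so it compiles without the library. The `TraceLines` struct and the function name are hypothetical, and this is not the code from my branch. In a real integration the symbol-only names would probably come from a second lookup such as `backtrace_syminfo`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical accumulator handed to the callback through its void* argument.
struct TraceLines {
  std::vector<std::string> lines;
};

// Same parameter list as libbacktrace's backtrace_full_callback.
// filename and function may be null when nothing is known about the frame.
// Returning 0 asks the caller to keep walking the stack.
int FormatFrame(void* data, std::uintptr_t pc, const char* filename, int lineno,
                const char* function) {
  assert(data != nullptr);
  auto* out = static_cast<TraceLines*>(data);
  std::string line = "[bt] (" + std::to_string(out->lines.size()) + ") ";
  if (filename != nullptr) {
    // Debug info available: "file:line function".
    line += std::string(filename) + ":" + std::to_string(lineno);
    if (function != nullptr) line += std::string(" ") + function;
  } else if (function != nullptr) {
    // Symbol name only.
    line += function;
  } else {
    // Nothing known: fall back to the raw program counter.
    char buf[2 + 16 + 1];
    std::snprintf(buf, sizeof(buf), "0x%016llx",
                  static_cast<unsigned long long>(pc));
    line += buf;
  }
  out->lines.push_back(line);
  return 0;
}
```

With libbacktrace linked in, a function like this would be passed to `backtrace_full` together with a pointer to a `TraceLines` instance. The three branches correspond to the three kinds of frames in the traces above: file and line with a function name, function name only, and bare address.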
Pros of using libbacktrace
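- With debug symbols, stack traces are significantly better: frames resolve to a source file, line number, and function name instead of a library offset
- Without debug symbols, stack traces are still better in some cases: frames that previously showed only an offset into libtvm.so now show a function name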
Cons
- libbacktrace is another dependency we need to add to our codebase (it can be statically linked)
- We would have to add libbacktrace to either dmlc-core or TVM. If we add it to TVM, we need to write our own logging/CHECK infrastructure, since the current one lives in dmlc-core.
Questions
A. Is it worth adding libbacktrace to the codebase?
B. Should libbacktrace be an optional dependency?
C. Should we add libbacktrace to dmlc-core or TVM?
I look forward to hearing your feedback.