[RFC] Line numbers in backtraces

Right now, TVM stack traces can be pretty confusing. With debug info on, the most information each stack trace contains in the per frame shared library name, offset into the library, program counter and maybe function name. Ideally this information would also include the file the function is in and its line number.

There are a couple libraries I’ve found to give us more detailed stack traces, but the best and most cross platform appears to be libbacktrace (https://github.com/ianlancetaylor/libbacktrace). It was used for a while by the rust compiler until they switched to one written entirely in rust. libbacktrace also has the benefit of working on linux, macOS, and windows.

I’ve developed a branch using libbacktrace and generated some comparisons between what we currently have and libbacktrace:

current, debug symbols

Stack trace:
  [bt] (0) /home/tristan/octoml/tvm/build/libtvm.so(+0x9c3aa6) [0x7f115796aaa6]
  [bt] (1) /home/tristan/octoml/tvm/build/libtvm.so(+0x9d6d15) [0x7f115797dd15]
  [bt] (2) /home/tristan/octoml/tvm/build/libtvm.so(TVMFuncCall+0x63) [0x7f115817c7e3]
  [bt] (3) /lib/x86_64-linux-gnu/libffi.so.8(+0x7249) [0x7f11ac56d249]
  [bt] (4) /lib/x86_64-linux-gnu/libffi.so.8(+0x6629) [0x7f11ac56c629]
  [bt] (5) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x5a1) [0x7f11ab27d4d1]
  [bt] (6) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x139e0) [0x7f11ab27c9e0]
  [bt] (7) python3(_PyObject_MakeTpCall+0x353) [0x514313]
  [bt] (8) python3(_PyEval_EvalFrameDefault+0x5a9c) [0x50e85c]

libbacktrace, debug symbols

Stack trace:
  [bt] (0) ../3rdparty/dmlc-core/include/dmlc/logging.h:514 dmlc::LogMessageFatal::~LogMessageFatal()
  [bt] (1) ../include/tvm/runtime/packed_func.h:1251 unpack_call<tvm::tir::LT, 3, tvm::tir::<lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)> >
  [bt] (2) ../include/tvm/runtime/packed_func.h:1307 operator()
  [bt] (3) /usr/include/c++/10/bits/invoke.h:60 __invoke_impl<void, tvm::runtime::TypedPackedFunc<R(Args ...)>::AssignTypedLambda<tvm::tir::<lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)> >::<lambda(const tvm::runtime::TVMArgs&, tvm::runtime::TVMRetValue*)>&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*>
  [bt] (4) /usr/include/c++/10/bits/invoke.h:153 __invoke_r<void, tvm::runtime::TypedPackedFunc<R(Args ...)>::AssignTypedLambda<tvm::tir::<lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)> >::<lambda(const tvm::runtime::TVMArgs&, tvm::runtime::TVMRetValue*)>&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*>
  [bt] (5) /usr/include/c++/10/bits/std_function.h:291 _M_invoke
  [bt] (6) /usr/include/c++/10/bits/std_function.h:622 std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
  [bt] (7) ../include/tvm/runtime/packed_func.h:990 tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
  [bt] (8) ../src/runtime/c_runtime_api.cc:428 TVMFuncCall
  [bt] (9) 0x00007f11ac56d248

current, no debug symbols

Stack trace:
  [bt] (0) /home/tristan/octoml/tvm/build/libtvm.so(+0xa5b9c8) [0x7fd96c8f79c8]
  [bt] (1) /home/tristan/octoml/tvm/build/libtvm.so(+0xa6a66f) [0x7fd96c90666f]
  [bt] (2) /home/tristan/octoml/tvm/build/libtvm.so(TVMFuncCall+0x6c) [0x7fd96d214dbc]
  [bt] (3) /lib/x86_64-linux-gnu/libffi.so.8(+0x7249) [0x7fd9c161f249]
  [bt] (4) /lib/x86_64-linux-gnu/libffi.so.8(+0x6629) [0x7fd9c161e629]
  [bt] (5) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(_ctypes_callproc+0x5a1) [0x7fd9c032f4d1]
  [bt] (6) /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x139e0) [0x7fd9c032e9e0]
  [bt] (7) python3(_PyObject_MakeTpCall+0x353) [0x514313]
  [bt] (8) python3(_PyEval_EvalFrameDefault+0x5a9c) [0x50e85c]

libbacktrace, no debug symbols

Stack trace:
  [bt] (0) dmlc::LogMessageFatal::~LogMessageFatal() [clone .constprop.0]
  [bt] (1) tvm::Span tvm::runtime::TVMPODValue_::AsObjectRef<tvm::Span>() const
  [bt] (2) tvm::runtime::TVMMovableArgValue_::operator tvm::Span<tvm::Span, void>() const
  [bt] (3) std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::tir::LT (tvm::PrimExpr, tvm::PrimExpr, tvm::Span)>::AssignTypedLambda<tvm::tir::__mk_TVM64::{lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)#1}>(tvm::tir::__mk_TVM64::{lambda(tvm::PrimExpr, tvm::PrimExpr, tvm::Span)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  [bt] (4) TVMFuncCall
  [bt] (5) 0x00007fc6e5397248
  [bt] (6) 0x00007fc6e5396628
  [bt] (7) _ctypes_callproc
  [bt] (8) 0x00007fc6e40a69df
  [bt] (9) _PyObject_MakeTpCall

Pros of using libbacktrace

  • Stack traces are significantly better with debug symbols
  • Stack traces are better in some cases without debug symbols

Cons

  • libbacktrace is another dependency we need to add to our codebase (it can be statically linked)
  • We would either have to add libbacktrace to dmlc-core or tvm. If we add it to TVM, we need to write our own logging/CHECK infrastructure.

Questions

A. Is it worth adding libbacktrace to the codebase?
B. Should libbacktrace be an optional dependency?
C Should we add libbacktrace to dmlc-core or TVM?

I look forward to hearing your feedback.

3 Likes

this is cool, would support libbacktrace in all connected codebases (though I don’t know what is at stake in dmlc-core and others). it seems like it could be optional. one question: wasn’t there a problem with this on dynamically loaded os x libraries?

I have a PR with libbacktrace fixing it.

I like libbacktrace in general so a +1 for the idea , saves one trip to the debugger just to get a backtrace. We shouldn’t be adding this to the code base - I would suggest it as an external dependency though someone should check the licenses are compatible and usable and according to ASF policy.

Ramana

I like the idea of having line numbers in the stacktrace, which makes it look clearer. So +1 for the idea!

Just curious on the implementation: what compilation flags are required? Does it add “-rdynamic”? What happens if we set HIDE_PRIVATE_SYMBOLS to ON in the cmake config?

BTW shall we do just some formatting tweaks, making it more look clearer? The rust style looks like (i copied from the internet):

$ RUST_BACKTRACE=1 cargo run
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 99', /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/slice/mod.rs:2806:10
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:84
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::ArgumentV1::show_usize
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print

Well, right now HIDE_PRIVATE_SYMBOLS doesn’t do anything (there’s a bug in the cmakelists.txt). If I fix it, it has no effect on the backtrace (on linux, haven’t tested macOS yet). “-rdynamic” is not needed either.

The rust backtraces do look nice, so we could match them. Or we could try and matching the calling language (python).

1 Like

Yeah I think we can stick to rust-style stack trace (which looks good enough) and do not over-engineer it

@ramana-arm Here is the test of the license:

# Copyright (C) 2012-2016 Free Software Foundation, Inc.

# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:

#     (1) Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer. 

#     (2) Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in
#     the documentation and/or other materials provided with the
#     distribution.  
    
#     (3) The name of the author may not be used to
#     endorse or promote products derived from this software without
#     specific prior written permission.

# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

It seems that we can use it as long as we include the license somewhere in our documentation. I’m not clear on the apache rules, so maybe someone else can chime in.

I’ve posted a PR here: https://github.com/apache/tvm/pull/7153.

Note that when adding libbacktrace to TVM, there are a significant numbers of changes I had to make: 1. I had implement our own versions of CHECK and LOG macros. 2. I had to add a new exception type that keeps more structured information. 3. I had to hide dmlc’s CHECK and LOG macros.

In the PR I also renamed our CHECK and LOG to TVM_CHECK and TVM_LOG respectively. This is a big change, but I think it is necessary. Right now any library that uses glog cannot be used with ours because the LOG macros conflict. Furthermore, non-namespaced macros are forbidden in our current coding guidelines (google’s style guide, even though they disobey it). It may be possible to more this change to a separate PR, but it might conflict with dmlc’s macros.

1 Like

Hey Tristan, would you like to clarify

  1. The relationship between the newly proposed dmlc::Error, tvm::Error, tvm::CompileError, and when to use which
  2. Do we use dmlc/logging any more?
  3. Does this approach work on MSVC? Like using dsymutil and -ffile-prefix-map.

Thank you so much!

  1. dmlc:Error is no longer used, now we use tvm::Error instead. tvm::Error is the base class for all errors that tvm throws. tvm::CompileError is used in relay to throw user-facing errors.
  2. dmlc/logging is no longer used.
  3. Not sure about windows support. I need to test that. dsymutil is macOS only. -ffile-prefix-map is supported by both GCC and Clang, not sure about support on windows. -ffile-prefix-map isn’t necessary though, it just makes the filenames more readable by switch them from a relative path (prefixed with …/) to an absolute path.
1 Like

Thank you for the explanation!

Yeah I think the PR looks great. Yeah I think it is okay if it does not fully work on some platforms, as long as we make sure it can still build and produce the error messages we had before as a smooth fallback :slight_smile:

this is great! I sort of prefer to push this into dmlc if possible, since it would mean that we could generate backtraces in more places and would require less renaming, but overall I like the approach!

@areusch When I talked with @tqchen, he wanted to avoid putting stuff in dmlc. I agree with him. Putting stuff in dmlc is hard because we need to get a PR accepted there and then make changes in TVM. It’s also confusing because there are two namespaces and its unclear the relation between TVM and dmlc.

Are there specific places you want to generate a backtrace that this PR does not handle?

In this particular case I do not have strong preference as both options are listed in the RFC. using a tvm specific logging could help if we want to evolve the module further. Let us also hear what others think

We can stabilize it a bit before putting into dmlc. Also cc: @hcho3

After looking into the PR itself. It seems that adding another variant of glog style check(CHECK) would mean conflict with dmlc/logging.h itself (both defines the glog style check). This is something I overlooked. A dependency on header only component of dmlc-core is usually fine and we are using some of the common features shared with other projects, like xgboost.

Renaming all the CHECK would require more discussions and impact other on-going development and would need more deliberation. A separate RFC would be great to give sufficient notice ahead of time to the community.

So perhaps it is worth thinking about bringing this support optionally to dmlc to avoid the renaming when possible as suggested by @areusch.

As a middle ground, we could introduce an macro to dmlc-core, (e.g. like DMLC_USE_GLOG) which disables the definition of the

Unfortunately, libbacktrace is not header only, so we cannot keep dmlc as a header-only library if we want to use libbacktrace.

That is why we want it as an optional dependency(instead of enabling by default). Given there is also windows portability concern. Let us think a bit about possible technical solutions to get around it.