LogFatal may skip some important errors/exceptions

Description and bug reproduction

When triggering an Out-Of-Memory (OOM) error in relay.vm, TVM cannot locate the error correctly: it does not report the OOM error, but ignores it and continues until it hits some later errors.

To reproduce: Google Colaboratory

This error happens both in Colab and on my PC (CentOS, with Clang 13/GCC 7/GCC 9, etc.).

Troubleshooting

I did some investigation on this but failed to come up with a clean solution.

Background

When an OOM occurs in CUDA_CALL(cudaMalloc), it is expected to throw InternalError (via GetEntry().Finalize() in LogFatal's destructor), and the Python end should report something like: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory\nStack trace:...
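For readers unfamiliar with the pattern: LOG(FATAL) and ICHECK(...) build their message with a C++ stream and throw when the temporary logger object is destroyed. Roughly, as a simplified sketch (the real code in include/tvm/runtime/logging.h goes through an Entry object and GetEntry().Finalize()):

#include <sstream>

// Simplified sketch of TVM's throw-in-destructor logging pattern.
class LogFatal {
 public:
  LogFatal(const char* file, int lineno) : file_(file), lineno_(lineno) {}
  std::ostringstream& stream() { return stream_; }
  // noexcept(false) is required: since C++11, destructors are noexcept(true)
  // by default, and throwing out of a noexcept destructor calls std::terminate.
  ~LogFatal() noexcept(false) { throw InternalError(file_, lineno_, stream_.str()); }

 private:
  const char* file_;
  int lineno_;
  std::ostringstream stream_;
};

// LOG(FATAL) << "..." expands to roughly
//   LogFatal(__FILE__, __LINE__).stream() << "..."
// so the throw happens in the destructor, at the end of the full expression.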

GDB Debugging

Before the catch (i.e., __cxa_begin_catch), two earlier exceptions get “ignored”: nobody catches them, yet std::terminate is not called.

Ideally, I think the first exception (i.e., the CUDA OOM) should result in std::terminate(). Since the earlier exceptions are “ignored”, the program goes on to perform undefined behavior, which finally produces an error message somewhere else.

...
(gdb) catch catch
Catchpoint 1 (catch)
(gdb) catch throw
Catchpoint 2 (throw)
(gdb) catch rethrow
Catchpoint 3 (rethrow)
(gdb) r
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
[New Thread 0x7f30b6c50700 (LWP 88390)]
[New Thread 0x7f30b644f700 (LWP 88391)]
[New Thread 0x7f30adc4e700 (LWP 88392)]
...
[New Thread 0x7f2fb7c32700 (LWP 88420)]
warning: File "/home/ljw/lib64/libstdc++.so.6.0.24-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /home/ljw/lib64/libstdc++.so.6.0.24-gdb.py
line to your configuration file "/home/ljw/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/ljw/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread 0x7f306a447700 (LWP 88399) exited]
[Thread 0x7f2fe2437700 (LWP 88415) exited]
[Thread 0x7f2feac38700 (LWP 88414) exited]
[Thread 0x7f2ff3439700 (LWP 88413) exited]
...
[Thread 0x7f30b6c50700 (LWP 88390) exited]
[Detaching after fork from child process 88438]
[New Thread 0x7f2fb7c32700 (LWP 88442)]
[New Thread 0x7f2fc8433700 (LWP 88443)]
...
[New Thread 0x7f2fa0474700 (LWP 88472)]
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_nchw.cuda', ('TENSOR', (any_dim, 3, 224, 224), 'float32'), ('TENSOR', (32, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
[New Thread 0x7f2f91c73700 (LWP 88477)]
[Thread 0x7f2fe2437700 (LWP 88464) exited]
[Thread 0x7f2fa0474700 (LWP 88472) exited]
...
[Thread 0x7f2fb7c32700 (LWP 88442) exited]
[Detaching after fork from child process 88478]
Running batch size = 0
[New Thread 0x7f2fa0474700 (LWP 88484)]
[New Thread 0x7f2fa0c75700 (LWP 88485)]
Running batch size = 1
Running batch size = 2
...
Running batch size = 62
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "python" hit Catchpoint 2 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x7f30c27ead40, tinfo=0x7f2facc4ca90 <typeinfo for tvm::runtime::InternalError>,
    dest=0x7f2fa9d4ab40 <tvm::runtime::InternalError::~InternalError()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:76
76        PROBE2 (throw, obj, tinfo);
(gdb) c
Continuing.

Thread 1 "python" hit Catchpoint 2 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x7f30c2c3a5b0, tinfo=0x7f2facc4ca90 <typeinfo for tvm::runtime::InternalError>,
    dest=0x7f2fa9d4ab40 <tvm::runtime::InternalError::~InternalError()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:76
76        PROBE2 (throw, obj, tinfo);
(gdb) c
Continuing.

Thread 1 "python" hit Catchpoint 2 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x7f30c23944e0, tinfo=0x7f2facc4ca90 <typeinfo for tvm::runtime::InternalError>,
    dest=0x7f2fa9d4ab40 <tvm::runtime::InternalError::~InternalError()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:76
76        PROBE2 (throw, obj, tinfo);
(gdb) c
Continuing.

Thread 1 "python" hit Catchpoint 1 (exception caught), __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f30c23944c0) at ../../../../libstdc++-v3/libsupc++/eh_catch.cc:84
84        PROBE2 (catch, objectp, header->exceptionType);
(gdb) c
Continuing.
terminate called after throwing an instance of 'tvm::runtime::InternalError'

Thread 1 "python" hit Catchpoint 3 (exception rethrown), __cxxabiv1::__cxa_rethrow () at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:114
114               PROBE2 (rethrow, __get_object_from_ambiguous_exception(header),
(gdb) c
Continuing.

Thread 1 "python" hit Catchpoint 1 (exception caught), __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f30c23944c0) at ../../../../libstdc++-v3/libsupc++/eh_catch.cc:84
84        PROBE2 (catch, objectp, header->exceptionType);
(gdb) c
Continuing.
  what():  [23:56:01] /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/device_api.h:263: unknown type =0
Stack trace:
  0: tvm::runtime::vm::MemoryManager::GetAllocator(DLDevice)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/memory_manager.cc:154
  1: tvm::runtime::vm::StorageObj::~StorageObj()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/vm/memory_manager.h:126
  2: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::vm::StorageObj>::Deleter_(tvm::runtime::Object*)
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/memory.h:138
  3: tvm::runtime::Object::DecRef()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:796
  4: tvm::runtime::ObjectPtr<tvm::runtime::vm::StorageObj>::reset()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:442
  5: tvm::runtime::ObjectPtr<tvm::runtime::vm::StorageObj>::~ObjectPtr()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:396
  6: tvm::runtime::vm::VirtualMachine::RunLoop()
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:592
  7: tvm::runtime::vm::VirtualMachine::Invoke(tvm::runtime::vm::VMFunction const&, std::vector<tvm::runtime::ObjectRef, std::allocator<tvm::runtime::ObjectRef> > const&)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:262
  8: tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:132
  9: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316
  10: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
  11: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/packed_func.h:1150
  12: TVMFuncCall
        at /data/ljw_data/PROJECTS/tvm/src/runtime/c_runtime_api.cc:474



Thread 1 "python" received signal SIGABRT, Aborted.
0x00007f30c01c8387 in raise () from /usr/lib64/libc.so.6
(gdb)

And if we set a breakpoint at the constructor of InternalError, we can see that CUDA_CALL(cudaMalloc) did throw and that its message is complete.

(gdb) list
242         std::ostringstream s;
243         // XXX: Do not change this format, otherwise all error handling in python will break (because it
244         // parses the message to reconstruct the error type).
245         // TODO(tkonolige): Convert errors to Objects, so we can avoid the mess of formatting/parsing
246         // error messages correctly.
247         s << "[" << std::put_time(std::localtime(&time), "%H:%M:%S") << "] " << file << ":" << lineno
248           << ": " << message << std::endl;
249         if (backtrace.size() > 0) {
250           s << backtrace << std::endl;
251         }
(gdb) n
247         s << "[" << std::put_time(std::localtime(&time), "%H:%M:%S") << "] " << file << ":" << lineno
(gdb) n
248           << ": " << message << std::endl;
(gdb) n
248           << ": " << message << std::endl;
(gdb) n
249         if (backtrace.size() > 0) {
(gdb) n
250           s << backtrace << std::endl;
(gdb) n
250           s << backtrace << std::endl;
(gdb) n
252         full_message_ = s.str();
(gdb) n
253       }
(gdb) printf "%s\n", full_message_.c_str()
[00:05:55] /data/ljw_data/PROJECTS/tvm/src/runtime/cuda/cuda_device_api.cc:118:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory
Stack trace:
  0: tvm::runtime::CUDADeviceAPI::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/cuda/cuda_device_api.cc
  1: tvm::runtime::vm::PooledAllocator::Alloc(unsigned long, unsigned long, DLDataType)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:60
  2: tvm::runtime::vm::VirtualMachine::RunLoop()
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:587
  3: tvm::runtime::vm::VirtualMachine::Invoke(tvm::runtime::vm::VMFunction const&, std::vector<tvm::runtime::ObjectRef, std::allocator<tvm::runtime::ObjectRef> > const&)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:262
  4: tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:132
  5: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316
  6: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
  7: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/packed_func.h:1150
  8: TVMFuncCall
        at /data/ljw_data/PROJECTS/tvm/src/runtime/c_runtime_api.cc:474

If we look at how the exceptions get skipped, I found that neither “__cxa_begin_catch” nor “abort” was called after the first two throws.

(gdb) c
Continuing.

Thread 1 "python" hit Catchpoint 4 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x7fb1a0865170, tinfo=0x7fb088d6fa90 <typeinfo for tvm::runtime::InternalError>,
    dest=0x7fb085e6db40 <tvm::runtime::InternalError::~InternalError()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:76
76        PROBE2 (throw, obj, tinfo);
(gdb) bt
#0  __cxxabiv1::__cxa_throw (obj=0x7fb1a0865170, tinfo=0x7fb088d6fa90 <typeinfo for tvm::runtime::InternalError>, dest=0x7fb085e6db40 <tvm::runtime::InternalError::~InternalError()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:76
#1  0x00007fb085e6d494 in tvm::runtime::detail::LogFatal::Entry::Finalize (this=<optimized out>) at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/logging.h:356
#2  0x00007fb08701f7ea in tvm::runtime::CUDADeviceAPI::AllocDataSpace (this=<optimized out>, dev=..., nbytes=398196736, alignment=<optimized out>, type_hint=...)
    at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/basic_ios.h:158
#3  0x00007fb086fcbf35 in tvm::runtime::vm::PooledAllocator::Alloc (this=0x7fb1a0430410, nbytes=<optimized out>, alignment=64, type_hint=...)
    at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:60
#4  0x00007fb086fd1110 in tvm::runtime::vm::VirtualMachine::RunLoop (this=<optimized out>, this@entry=0x7fb1a04a8660) at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:587
#5  0x00007fb086fd52a8 in tvm::runtime::vm::VirtualMachine::Invoke (this=0x7fb1a04a8660, func=..., args=...) at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:262
#6  tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const (this=<optimized out>, args=..., rv=0x7ffd10aaf908) at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:132
#7  std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) (
    __functor=..., __args=<optimized out>, __args=<optimized out>) at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316
#8  0x00007fb086f78fc6 in std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const (this=0x7fb1a0865170,
    __args=0x7ffd10aaf908, __args=0x7ffd10aaf908) at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
#9  tvm::runtime::PackedFunc::CallPacked (this=0x7fb1a0865170, args=..., rv=0x7ffd10aaf908) at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/packed_func.h:1150
#10 TVMFuncCall (func=0x7fb1a0865170, args=<optimized out>, arg_type_codes=<optimized out>, num_args=-1601783936, ret_val=0x7fb168321790, ret_type_code=0x7fb168321a90)
    at /data/ljw_data/PROJECTS/tvm/src/runtime/c_runtime_api.cc:474
#11 0x00007fb195b629dd in ffi_call_unix64 () from /home/ljw/anaconda3/envs/tvm-keras/lib/python3.8/lib-dynload/../../libffi.so.7
--Type <RET> for more, q to quit, c to continue without paging--Quit
(gdb) n
78        __cxa_eh_globals *globals = __cxa_get_globals ();
(gdb) list
73      __cxxabiv1::__cxa_throw (void *obj, std::type_info *tinfo,
74                               void (_GLIBCXX_CDTOR_CALLABI *dest) (void *))
75      {
76        PROBE2 (throw, obj, tinfo);
77
78        __cxa_eh_globals *globals = __cxa_get_globals ();
79        globals->uncaughtExceptions += 1;
80        // Definitely a primary.
81        __cxa_refcounted_exception *header =
82          __cxa_init_primary_exception(obj, tinfo, dest);
(gdb) n
79        globals->uncaughtExceptions += 1;
(gdb) n
82          __cxa_init_primary_exception(obj, tinfo, dest);
(gdb) n
88        _Unwind_RaiseException (&header->exc.unwindHeader);
(gdb) n
83        header->referenceCount = 1;
(gdb) n
88        _Unwind_RaiseException (&header->exc.unwindHeader);
(gdb) n
0x00007fb085e6d494 in tvm::runtime::detail::LogFatal::Entry::Finalize (this=<optimized out>) at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/logging.h:356
356           throw InternalError(file_, lineno_, stream_.str());
(gdb) n
0x00007fb086fcbf7d in tvm::runtime::vm::PooledAllocator::Alloc (this=0x7fb1a0430410, nbytes=<optimized out>, alignment=64, type_hint=...)
    at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_mutex.h:107
107             __throw_system_error(__e);
(gdb) list
102         {
103           int __e = __gthread_mutex_lock(&_M_mutex);
104
105           // EINVAL, EAGAIN, EBUSY, EINVAL, EDEADLK(may)
106           if (__e)
107             __throw_system_error(__e);
108         }
109
110         bool
111         try_lock() noexcept
(gdb) n
64        }
(gdb) n
0x00007fb086fd1dca in tvm::runtime::vm::VirtualMachine::RunLoop (this=<optimized out>, this@entry=0x7fb1a04a8660) at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:420
420             throw std::runtime_error("VM encountered fatal error");
(gdb) list
415             WriteRegister(instr.dst, from_obj);
416             pc_++;
417             goto main_loop;
418           }
419           case Opcode::Fatal: {
420             throw std::runtime_error("VM encountered fatal error");
421           }
422           case Opcode::LoadConst: {
423             auto constant_obj = exec_->constants[instr.const_index];
424             // We cache the allocated object in the constant pool. To measure, the
(gdb) n
592           }
(gdb) c
Continuing.

Thread 1 "python" hit Catchpoint 4 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x7fb1a0414c20, tinfo=0x7fb088d6fa90 <typeinfo for tvm::runtime::InternalError>,
    dest=0x7fb085e6db40 <tvm::runtime::InternalError::~InternalError()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:76
76        PROBE2 (throw, obj, tinfo);
(gdb) n
78        __cxa_eh_globals *globals = __cxa_get_globals ();
(gdb) c
Continuing.

Thread 1 "python" hit Catchpoint 4 (exception thrown), __cxxabiv1::__cxa_throw (obj=0x7fb1a02c9b80, tinfo=0x7fb088d6fa90 <typeinfo for tvm::runtime::InternalError>,
    dest=0x7fb085e6db40 <tvm::runtime::InternalError::~InternalError()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:76
76        PROBE2 (throw, obj, tinfo);
(gdb) n
78        __cxa_eh_globals *globals = __cxa_get_globals ();
(gdb) n
79        globals->uncaughtExceptions += 1;
(gdb) n
82          __cxa_init_primary_exception(obj, tinfo, dest);
(gdb) n
88        _Unwind_RaiseException (&header->exc.unwindHeader);
(gdb) n
83        header->referenceCount = 1;
(gdb) n
88        _Unwind_RaiseException (&header->exc.unwindHeader);
(gdb) n
0x00007fb085e6d494 in tvm::runtime::detail::LogFatal::Entry::Finalize (this=<optimized out>) at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/logging.h:356
356           throw InternalError(file_, lineno_, stream_.str());
(gdb) n
0x00007fb086fcb831 in tvm::runtime::vm::MemoryManager::GetAllocator (dev=...) at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/memory_manager.cc:154
154         LOG(FATAL) << "Allocator for " << DeviceName(dev.device_type) << "(" << dev.device_id
(gdb) n

Thread 1 "python" hit Catchpoint 2 (exception caught), __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7fb1a02c9b60) at ../../../../libstdc++-v3/libsupc++/eh_catch.cc:84
84        PROBE2 (catch, objectp, header->exceptionType);
(gdb) n
90      }
(gdb) n
terminate called after throwing an instance of 'tvm::runtime::InternalError'

Thread 1 "python" hit Catchpoint 5 (exception rethrown), __cxxabiv1::__cxa_rethrow () at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:114
114               PROBE2 (rethrow, __get_object_from_ambiguous_exception(header),
(gdb) n
124           _Unwind_Resume_or_Rethrow (&header->unwindHeader);
(gdb) n
0x00007fb07ff28475 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
95          abort();
(gdb) n
82              __catch(const exception& exc)
(gdb) n

Thread 1 "python" hit Catchpoint 2 (exception caught), __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7fb1a02c9b60) at ../../../../libstdc++-v3/libsupc++/eh_catch.cc:84
84        PROBE2 (catch, objectp, header->exceptionType);
(gdb) n
90      }
(gdb) n
__gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:84
(gdb) c
Continuing.
[00:29:57] /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/device_api.h:263: unknown type =0
Stack trace:
  0: tvm::runtime::vm::MemoryManager::GetAllocator(DLDevice)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/memory_manager.cc:154
  1: tvm::runtime::vm::StorageObj::~StorageObj()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/vm/memory_manager.h:126
  2: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::vm::StorageObj>::Deleter_(tvm::runtime::Object*)
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/memory.h:138
  3: tvm::runtime::Object::DecRef()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:796
  4: tvm::runtime::ObjectPtr<tvm::runtime::vm::StorageObj>::reset()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:442
  5: tvm::runtime::ObjectPtr<tvm::runtime::vm::StorageObj>::~ObjectPtr()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:396
  6: tvm::runtime::vm::VirtualMachine::RunLoop()
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:592
  7: tvm::runtime::vm::VirtualMachine::Invoke(tvm::runtime::vm::VMFunction const&, std::vector<tvm::runtime::ObjectRef, std::allocator<tvm::runtime::ObjectRef> > const&)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:262
  8: tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:132
  9: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316
  10: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
  11: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/packed_func.h:1150
  12: TVMFuncCall
        at /data/ljw_data/PROJECTS/tvm/src/runtime/c_runtime_api.cc:474

Discussion

Some dirty fixes

Marking the destructor noexcept(true) can keep the first exception from being ignored (but this is a bit ugly…).
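In code, the dirty fix amounts to something like this (a sketch, not the actual patch):

// With noexcept(true), the first exception that tries to escape the destructor
// triggers std::terminate() right away, so the verbose terminate handler prints
// the first error (the CUDA OOM) instead of it being silently lost.
~LogFatal() noexcept(true) { GetEntry().Finalize(); }  // Finalize() throws

The downside is that a plain LOG(FATAL) outside of stack unwinding would then also terminate the process, instead of propagating a catchable InternalError to the FFI boundary.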

Throw in destructor?

According to the ISO C++ FAQ and many other sources, it is suggested to avoid throwing in a destructor, since things become tricky during stack unwinding: if a destructor invoked during unwinding exits via a new exception, the runtime must call std::terminate.
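A minimal standalone repro of the tricky case (any C++11 compiler):

#include <cstdio>
#include <stdexcept>

struct ThrowsInDtor {
  // Opt out of the C++11 default (destructors are noexcept(true)),
  // mirroring a logger whose destructor throws.
  ~ThrowsInDtor() noexcept(false) { throw std::runtime_error("second"); }
};

int main() {
  try {
    ThrowsInDtor t;
    throw std::runtime_error("first");  // stack unwinding destroys t ...
  } catch (const std::exception& e) {
    std::printf("caught: %s\n", e.what());  // never reached
  }
  // ... and t's destructor throws "second" while "first" is still propagating,
  // so the runtime calls std::terminate() before the catch clause can run.
}

This is exactly the shape of the TVM case: the CUDA OOM is the first exception, and ~StorageObj() triggers another LOG(FATAL) (the “unknown type =0” error from MemoryManager::GetAllocator) while the OOM is still propagating.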

Other libraries, though they support “throw while logging” functionality, avoid throwing in the destructor (BOOST_LOG_THROW) or at least make it optional (glog’s LogMessageFatal).

Disabling this pattern in TVM’s logger might make the code more robust, but it costs simplicity (it is nice to write ICHECK(...) like a C++ stream, while BOOST_LOG_THROW is written like BOOST_LOG_THROW(lg, fatal, boost::exception, "Error: " << dump);).
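For comparison, a hypothetical “explicit throw” variant could look like this (names invented for illustration; this is not an actual TVM, glog, or Boost API):

#include <sstream>
#include <stdexcept>

// Message building stays stream-like, but the throw is an explicit call,
// so the destructor itself never throws.
class FatalStream {
 public:
  template <typename T>
  FatalStream& operator<<(const T& v) {
    stream_ << v;
    return *this;
  }
  [[noreturn]] void ThrowNow() { throw std::runtime_error(stream_.str()); }

 private:
  std::ostringstream stream_;
};

// Usage: (FatalStream() << "CUDA: out of memory, code=" << code).ThrowNow();

The cost is visible at every call site: the trailing ThrowNow() (or a wrapping macro) replaces the terse ICHECK(...) stream style.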


The InternalError from the OOM should let the program exit gracefully. However, in this case, it crashes instead.

I then looked into PooledAllocator (the first thrown exception occurred in PooledAllocator::Alloc):

  • PooledAllocator does not release memory until its destructor is called.

I found this bug can be fixed if we let PooledAllocator release all unused memory when OOM occurs and then retry the allocation.

In the Colab example, the program does inference on a tensor whose batch size increases from 1 to N (old tensors are “released”, but are actually cached inside PooledAllocator as many small memory fragments). Due to PooledAllocator's design, we may be able to allocate batch_size=N directly, yet fail when allocating batch sizes 1 through N in sequence: after serving batches 1..k, the pool holds k buffers that are each too small for batch k+1, yet none of that memory is ever returned to the device to make room for the bigger allocation.

Therefore, I am wondering whether you think “ReleaseAll() and then re-Alloc(...)” is a good idea. Doing so can 1) fix a weird and general bug and 2) make TVM more robust.
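For concreteness, a rough sketch of the retry logic (FindFreeBuffer/TakeFromPool are invented helper names standing in for the real pool lookup; see the PR linked later in this thread for the actual change):

Buffer PooledAllocator::Alloc(size_t nbytes, size_t alignment, DLDataType type_hint) {
  std::lock_guard<std::recursive_mutex> lock(mu_);
  size_t size = ((nbytes + page_size_ - 1) / page_size_) * page_size_;
  if (Buffer* cached = FindFreeBuffer(size)) {  // hypothetical pool lookup
    return TakeFromPool(cached);                // reuse a cached buffer
  }
  Buffer buf;
  buf.device = device_;
  buf.size = size;
  try {
    buf.data = DeviceAPI::Get(device_)->AllocDataSpace(device_, size, alignment, type_hint);
  } catch (InternalError& err) {
    // OOM: the pool may hold many small cached fragments. Hand them all back
    // to the device and retry the allocation once before giving up.
    LOG(WARNING) << "PooledAllocator got InternalError during allocation: " << err.what();
    LOG(WARNING) << "Trying to release all unused memory and reallocate...";
    ReleaseAll();
    buf.data = DeviceAPI::Get(device_)->AllocDataSpace(device_, size, alignment, type_hint);
  }
  used_memory_ += size;
  return buf;
}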


Last, here is the log after making the change described above: no segmentation fault and no wrongly located error message. The maximum workable batch size is also greater (57 vs. 27).

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_nchw.cuda', ('TENSOR', (any_dim, 3, 512, 512), 'float32'), ('TENSOR', (32, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Running batch size = 0
Running batch size = 1
Running batch size = 2
Running batch size = 3
...
Running batch size = 26
Running batch size = 27
[14:53:34] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:63: Warning: PooledAllocator got InternalError during allocation: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory
[14:53:34] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:64: Warning: Trying to release all unused memory and reallocate...
Running batch size = 28
Running batch size = 29
Running batch size = 30
Running batch size = 31
Running batch size = 32
Running batch size = 33
Running batch size = 34
Running batch size = 35
Running batch size = 36
Running batch size = 37
[14:53:35] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:63: Warning: PooledAllocator got InternalError during allocation: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory
[14:53:35] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:64: Warning: Trying to release all unused memory and reallocate...
Running batch size = 38
Running batch size = 39
Running batch size = 40
Running batch size = 41
Running batch size = 42
Running batch size = 43
Running batch size = 44
Running batch size = 45
[14:53:36] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:63: Warning: PooledAllocator got InternalError during allocation: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory
[14:53:36] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:64: Warning: Trying to release all unused memory and reallocate...
Running batch size = 46
Running batch size = 47
Running batch size = 48
Running batch size = 49
Running batch size = 50
Running batch size = 51
[14:53:37] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:63: Warning: PooledAllocator got InternalError during allocation: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory
[14:53:37] /data/ljw_data/PROJECTS/tvm/src/runtime/vm/pooled_allocator.h:64: Warning: Trying to release all unused memory and reallocate...
Running batch size = 52
Running batch size = 53
Running batch size = 54
Running batch size = 55
Running batch size = 56
Running batch size = 57
Traceback (most recent call last):
  File "../oom.py", line 36, in <module>
    tvm_out = executor.evaluate()(tvm.nd.array(data.astype("float32")), **params)
  File "/home/ljw/data/PROJECTS/tvm/python/tvm/relay/backend/vm.py", line 286, in _vm_wrapper
    return self.vm.run(*args)
  File "/home/ljw/data/PROJECTS/tvm/python/tvm/runtime/vm.py", line 461, in run
    return self.invoke("main", *args, **kwargs)
  File "/home/ljw/data/PROJECTS/tvm/python/tvm/runtime/vm.py", line 442, in invoke
    self.set_input(func_name, *args, **kwargs)
  File "/home/ljw/data/PROJECTS/tvm/python/tvm/runtime/vm.py", line 420, in set_input
    self._set_input(func_name, *cargs)
  File "/home/ljw/data/PROJECTS/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  9: TVMFuncCall
        at /data/ljw_data/PROJECTS/tvm/src/runtime/c_runtime_api.cc:474
  8: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/packed_func.h:1151
  7: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
  6: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_5>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316
  5: tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_5::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:207
  4: tvm::runtime::vm::CopyTo(tvm::runtime::ObjectRef, DLDevice const&)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:71
  3: tvm::runtime::NDArray::CopyTo(DLDevice const&) const
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/ndarray.h:360
  2: tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/ndarray.cc:197
  1: tvm::runtime::DeviceAPI::AllocDataSpace(DLDevice, int, long const*, DLDataType, tvm::runtime::Optional<tvm::runtime::String>)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/c_runtime_api.cc:167
  0: tvm::runtime::CUDADeviceAPI::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/cuda/cuda_device_api.cc
  File "/data/ljw_data/PROJECTS/tvm/src/runtime/cuda/cuda_device_api.cc", line 118
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory

And if we don’t apply the change (in the same environment), the log looks like this:

  • a segmentation fault (the exception is not handled on the Python end).
  • the error is attributed to somewhere else.
> INIT:: # BB in this DSO: 1077184; # BB total: 1077184
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_nchw.cuda', ('TENSOR', (any_dim, 3, 512, 512), 'float32'), ('TENSOR', (32, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Running batch size = 0
Running batch size = 1
Running batch size = 2
Running batch size = 3
...
Running batch size = 26
Running batch size = 27
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [14:56:30] /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/device_api.h:263: unknown type =0
Stack trace:
  0: tvm::runtime::vm::MemoryManager::GetAllocator(DLDevice)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/memory_manager.cc:154
  1: tvm::runtime::vm::StorageObj::~StorageObj()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/vm/memory_manager.h:126
  2: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::vm::StorageObj>::Deleter_(tvm::runtime::Object*)
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/memory.h:138
  3: tvm::runtime::Object::DecRef()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:798
  4: tvm::runtime::ObjectPtr<tvm::runtime::vm::StorageObj>::reset()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:444
  5: tvm::runtime::ObjectPtr<tvm::runtime::vm::StorageObj>::~ObjectPtr()
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/object.h:398
  6: tvm::runtime::vm::VirtualMachine::RunLoop()
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:592
  7: tvm::runtime::vm::VirtualMachine::Invoke(tvm::runtime::vm::VMFunction const&, std::vector<tvm::runtime::ObjectRef, std::allocator<tvm::runtime::ObjectRef> > const&)
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:262
  8: tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/src/runtime/vm/vm.cc:132
  9: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316
  10: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/ljw/bin/../lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706
  11: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /data/ljw_data/PROJECTS/tvm/include/tvm/runtime/packed_func.h:1151
  12: TVMFuncCall
        at /data/ljw_data/PROJECTS/tvm/src/runtime/c_runtime_api.cc:474


[1]    18803 abort (core dumped)  python ../oom.py

cc @junrushao


Hi @ganler, thank you for digging into this issue deeply and writing this up!

I also found that ReleaseAll() never gets called in the Relay VM, since the destructor ~PooledAllocator() never runs, but I did not expect it to cause the OOM error to be skipped and a weird unknown device type error to be reported instead. Fixing it also helps when we have dynamic tensor shapes. Is your idea to call ReleaseAll() when the OOM error is thrown?

Thank you for your reply! @yuchenj Yes, calling ReleaseAll() on OOM can magically fix this bug (and it also helps resolve memory fragmentation in the PooledAllocator). The detailed code change can be found here: [VM][PooledAllocator] try reallocation once when OOM by ganler · Pull Request #8285 · apache/tvm · GitHub

I tried my best to find out the real reason:

  • gdb says the OOM in Alloc did throw, but nobody catches it, and std::terminate was strangely never called for that exception; i.e., the exception simply disappeared.
  • Then I began to wonder whether it had something to do with memory corruption, but valgrind did not find anything useful.

Actually, I found this issue when I was testing whether TVM works with a dynamic batch size growing from 1 to N. I first noticed the strange output, and then I also realized that PooledAllocator neither releases memory fragments nor compacts them, making “small-to-big” allocation patterns very limited…


Thank you @ganler for all the details! Your PR looks good to me!


CC: @haichen @jroesch @zhiics