mike
October 28, 2019, 10:37am
1
Good morning,
Following the VTA MxNet tutorial, I get a segmentation fault.
The segmentation fault occurs during the execution of GraphRuntime::Run, specifically in the loop that executes op_execs_[i]. The first three operators (of 100) are not executed; the fourth starts executing and triggers the segmentation fault.
Could anyone help me with this, or shed any light on why this is happening?
Thank you very much for your time.
Error trace:
Segmentation fault: 11
Segmentation fault: 11
Segmentation fault: 11
Stack trace:
[bt] (0) /home/mike/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b64150) [0x7eff5ff51150]
[bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7eff7c2924b0]
[bt] (2) /tmp/tmptxfa_dss/graphlib.o.so(+0x174bf) [0x7eff7019e4bf]
[bt] (3) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::ThreadPool::Launch(int (*)(int, TVMParallelGroupEnv*, void*), void*, int, int)+0xfd1) [0x7eff31d8d621]
[bt] (4) /scratch/mike/tvm/build/libtvm.so(TVMBackendParallelLaunch+0x63) [0x7eff31d8af03]
[bt] (5) /tmp/tmptxfa_dss/graphlib.o.so(+0x170eb) [0x7eff7019e0eb]
[bt] (6) /tmp/tmptxfa_dss/graphlib.o.so(fused_nn_conv2d_add_nn_relu+0x3bd) [0x7eff7019dcad]
[bt] (7) /scratch/mike/tvm/build/libtvm.so(+0xbea331) [0x7eff31d84331]
[bt] (8) /scratch/mike/tvm/build/libtvm.so(+0xc3f6a7) [0x7eff31dd96a7]
Stack trace:
[bt] (0) /home/mike/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b64150) [0x7eff5ff51150]
[bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7eff7c2924b0]
[bt] (2) /tmp/tmptxfa_dss/graphlib.o.so(+0x174bf) [0x7eff7019e4bf]
[bt] (3) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::ThreadPool::RunWorker(int)+0x1b9) [0x7eff31d8ba19]
[bt] (4) /scratch/mike/tvm/build/libtvm.so(std::thread::_Impl<std::_Bind_simple<tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::function<void (int)>, bool)::{lambda()#1} ()> >::_M_run()+0x31) [0x7eff31d812e1]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7eff755cfc80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7eff7c62e6ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7eff7c36441d]
terminate called after throwing an instance of ‘dmlc::Error’
what(): [11:27:34] /scratch/mike/tvm/src/runtime/workspace_pool.cc:116: Check failed: allocated_.size() == 1 (3 vs. 1) :
Stack trace:
[bt] (0) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::Pool::Release(DLContext, tvm::runtime::DeviceAPI*)+0x652) [0x7eff31d9a1d2]
[bt] (1) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::WorkspacePool::~WorkspacePool()+0x3f) [0x7eff31d9887f]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(__call_tls_dtors+0x3f) [0x7eff7c2975ff]
[bt] (3) /lib/x86_64-linux-gnu/libc.so.6(+0x39f27) [0x7eff7c296f27]
[bt] (4) /lib/x86_64-linux-gnu/libc.so.6(+0x3a045) [0x7eff7c297045]
[bt] (5) /home/mike/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b64188) [0x7eff5ff51188]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7eff7c2924b0]
[bt] (7) /tmp/tmptxfa_dss/graphlib.o.so(+0x174bf) [0x7eff7019e4bf]
[bt] (8) /scratch/mike/tvm/build/libtvm.so(tvm::runtime::ThreadPool::Launch(int (*)(int, TVMParallelGroupEnv*, void*), void*, int, int)+0xfd1) [0x7eff31d8d621]
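A possible way to narrow down a crash like the one above is to have Python dump its own traceback when the process receives SIGSEGV, so the native frames can be matched to the Python call that triggered them. The sketch below uses only the standard-library faulthandler module; the helper name and log file name are hypothetical, and because frame 0 of the trace suggests MXNet installs its own signal handler, whether the call needs to come before or after "import mxnet" is an assumption to test rather than a known fact.

```python
import faulthandler


def enable_crash_traceback(log_path="crash_traceback.log"):
    """Ask Python to write the traceback of every thread to log_path on a fatal signal.

    The helper name and default file name are illustrative only.
    The returned file object must stay open for as long as the handler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical usage near the top of the tutorial script:
# crash_log = enable_crash_traceback()
```

After a crash, the log file should show which Python line was executing in each thread, which may help confirm whether the fault happens inside the module's run() call.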
What are you running this on? The Pynq FPGA?
I recently ran into a similar problem while going through the VTA MxNet tutorial. Is there any update on this problem? Crash report:
Process: Python [18397]
Path: /Library/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: 3.8.0 (3.8.0)
Code Type: X86-64 (Native)
Parent Process: zsh [11738]
Responsible: iTerm2 [1025]
User ID: 501
Date/Time: 2020-08-13 22:10:24.361 +0800
OS Version: Mac OS X 10.15.6 (19G2021)
Report Version: 12
Bridge OS Version: 4.6 (17P6610)
Anonymous UUID: 5157B874-0DFA-2106-FC9E-40E18F1CCB5E
Time Awake Since Boot: 27000 seconds
System Integrity Protection: enabled
Crashed Thread: 5
Exception Type: EXC_BAD_ACCESS (SIGABRT)
Exception Codes: EXC_I386_GPFLT
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
abort() called
Python(18397,0x700004953000) malloc: Incorrect checksum for freed object 0x7f92263fff70: probably modified after being freed.
Corrupt value: 0x6572756c69614620
Error log:
Reconfigured FPGA and RPC runtime in 2.95s!
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
Segmentation fault: 11
Segmentation fault: 11
Segmentation fault: 11
Segmentation fault: 11
Stack trace:
[bt] (0) 1 libmxnet.so 0x000000011a1c6120 mxnet::Storage::Get() + 11632
[bt] (1) 2 libsystem_platform.dylib 0x00007fff6ebc05fd _sigtramp + 29
[bt] (2) 3 ??? 0x0000000000003ffb 0x0 + 16379
[bt] (3) 4 ??? 0x00000001371b1494 0x0 + 5219488916
[bt] (4) 5 libtvm.dylib 0x0000000132184d4c tvm::relay::Interpreter::InvokePrimitiveOp(tvm::relay::Function const&, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&) + 3516
[bt] (5) 6 libtvm.dylib 0x000000013218305b tvm::relay::Interpreter::Invoke(tvm::relay::InterpreterClosure const&, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::relay::Var const&) + 171
[bt] (6) 7 libtvm.dylib 0x000000013217e291 tvm::relay::Interpreter::VisitExpr_(tvm::relay::CallNode const*) + 961
[bt] (7) 8 libtvm.dylib 0x0000000132181f08 tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>::InitVTable()::'lambda4'(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*)::__invoke(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*) + 24
[bt] (8) 9 libtvm.dylib 0x00000001321808bf tvm::NodeFunctor<tvm::runtime::ObjectRef (tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*)>::operator()(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::ObjectRef (tvm::RelayExpr const&)>*) const + 255
Stack trace:
[bt] (0) 1 libmxnet.so 0x000000011a1c6120 mxnet::Storage::Get() + 11632
[bt] (1) 2 libsystem_platform.dylib 0x00007fff6ebc05fd _sigtramp + 29
[bt] (2) 3 ??? 0x0000000000000000 0x0 + 0
[bt] (3) 4 libtvm.dylib 0x00000001322d6d0e void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::__1::function<void (int)>, bool)::'lambda'()> >(void*) + 62
[bt] (4) 5 libsystem_pthread.dylib 0x00007fff6ebcc109 _pthread_start + 148
[bt] (5) 6 libsystem_pthread.dylib 0x00007fff6ebc7b8b thread_start + 15
Stack trace:
[bt] (0) 1 libmxnet.so 0x000000011a1c6120 mxnet::Storage::Get() + 11632
[bt] (1) 2 libsystem_platform.dylib 0x00007fff6ebc05fd _sigtramp + 29
[bt] (2) 3 ??? 0x0000000000000000 0x0 + 0
[bt] (3) 4 libtvm.dylib 0x00000001322d6d0e void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::__1::function<void (int)>, bool)::'lambda'()> >(void*) + 62
[bt] (4) 5 libsystem_pthread.dylib 0x00007fff6ebcc109 _pthread_start + 148
[bt] (5) 6 libsystem_pthread.dylib 0x00007fff6ebc7b8b thread_start + 15
Stack trace:
[bt] (0) 1 libmxnet.so 0x000000011a1c6120 mxnet::Storage::Get() + 11632
[bt] (1) 2 libsystem_platform.dylib 0x00007fff6ebc05fd _sigtramp + 29
[bt] (2) 3 ??? 0x0000000000000000 0x0 + 0
[bt] (3) 4 libtvm.dylib 0x00000001322d6d0e void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, tvm::runtime::threading::ThreadGroup::Impl::Impl(int, std::__1::function<void (int)>, bool)::'lambda'()> >(void*) + 62
[bt] (4) 5 libsystem_pthread.dylib 0x00007fff6ebcc109 _pthread_start + 148
[bt] (5) 6 libsystem_pthread.dylib 0x00007fff6ebc7b8b thread_start + 15
Python(18397,0x700004953000) malloc: Incorrect checksum for freed object 0x7f92263fff70: probably modified after being freed.
Corrupt value: 0x6572756c69614620
Python(18397,0x700004953000) malloc: *** set a breakpoint in malloc_error_break to debug
[1] 18397 abort python3.8 ./deploy_classification.py
thierry
September 1, 2020, 4:40am
4
Thanks for reporting the issue, @Groupsun. I was able to reproduce it, and I’m investigating the bug.
thierry
September 1, 2020, 5:18am
5
I found that this PR introduced the faulty behavior: https://github.com/apache/incubator-tvm/pull/6195
I’ll need to investigate more tomorrow or Wednesday. In the meantime, if you need to get the example working, you’ll need to revert to d892881c4cc8c9a29bc03233aeac2b1532a9c6891
Julien
September 15, 2020, 6:42pm
6
Any update on this issue, @thierry?
thierry
September 15, 2020, 7:04pm
7
Yes, the bug was fixed, but I forgot to update the thread. See: https://github.com/apache/incubator-tvm/pull/6377
Julien
September 18, 2020, 8:38pm
8
Thank you, @thierry!
But when I try it (commit 693c0de, or latest commit), I run into another error when launching the RPC server (target side):
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/xilinx/tvm/vta/python/vta/__init__.py", line 27, in <module>
    from .bitstream import get_bitstream_path, download_bitstream
  File "/home/xilinx/tvm/vta/python/vta/bitstream.py", line 23, in <module>
    from tvm.contrib.download import download
  File "/home/xilinx/tvm/python/tvm/__init__.py", line 64, in <module>
    from . import hybrid
  File "/home/xilinx/tvm/python/tvm/hybrid/__init__.py", line 19, in <module>
    from .utils import create_module, ashybrid, script
  File "/home/xilinx/tvm/python/tvm/hybrid/utils.py", line 23, in <module>
    from .parser import from_source
  File "/home/xilinx/tvm/python/tvm/hybrid/parser.py", line 24, in <module>
    from typed_ast import ast3 as ast
ModuleNotFoundError: No module named 'typed_ast'
Do you have any idea?
thierry
September 18, 2020, 8:50pm
9
Ah yes, you’ll need to install typed_ast with pip install typed_ast
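On boards with more than one Python installed, pip may install into a different interpreter than the one that runs the RPC server. A minimal sketch for checking this, using only the standard library: run it with the same interpreter that launches the server. The helper names are hypothetical, and it only handles top-level module names.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_command(names):
    """Build a pip invocation tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *names]


if __name__ == "__main__":
    missing = missing_modules(["typed_ast"])
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try:", " ".join(pip_command(missing)))
    else:
        print("typed_ast is importable from", sys.executable)
```

Invoking pip as a module of a specific interpreter (python -m pip) avoids relying on whichever pip executable happens to be first on PATH.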
Julien
September 22, 2020, 9:42pm
10
Thank you, @thierry! But I am having a lot of difficulty installing typed_ast on the Ultra96 board, especially since Pynq uses Python 2.7.