[AutoTVM, Auto scheduler] Always use the VM compiler for task extraction

masahi · September 22, 2021, 12:41am

Hi, during auto scheduler task extraction, I got the following segfault from GraphPlanMemory when the model involves control flow and the input is large, such as (8, 3, 512, 512). A segfault cannot be caught by a Python try/except, so I had to force the use of the VM compiler for task extraction.

#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:262
#1  0x00007fffd6566d96 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) () from /home/masa/projects/dev/tvm/build/libtvm.so
#2  0x00007fffd769a144 in tvm::runtime::(anonymous namespace)::BacktraceSyminfoCallback(void*, unsigned long, char const*, unsigned long, unsigned long) ()
   from /home/masa/projects/dev/tvm/build/libtvm.so
#3  0x00007fffd77e8b55 in backtrace_syminfo () from /home/masa/projects/dev/tvm/build/libtvm.so
#4  0x00007fffd769b2ea in tvm::runtime::(anonymous namespace)::BacktraceFullCallback(void*, unsigned long, char const*, int, char const*) ()
   from /home/masa/projects/dev/tvm/build/libtvm.so
#5  0x00007fffd77f2f47 in dwarf_fileline () from /home/masa/projects/dev/tvm/build/libtvm.so
#6  0x00007fffd77e8df6 in unwind () from /home/masa/projects/dev/tvm/build/libtvm.so
#7  0x00007ffff3afc11c in _Unwind_Backtrace (trace=0x7fffd77e8d50 <unwind>, trace_argument=0x7fff2517cbd0)
    at /home/builder/ktietz/cos6/ci_cos6/ctng-compilers_1622658800915/work/.build/x86_64-conda-linux-gnu/src/gcc/libgcc/unwind.inc:307
#8  0x00007fffd77e8e7d in backtrace_full () from /home/masa/projects/dev/tvm/build/libtvm.so
#9  0x00007fffd769a7eb in tvm::runtime::Backtrace[abi:cxx11]() () from /home/masa/projects/dev/tvm/build/libtvm.so
#10 0x00007fffd62c329b in tvm::runtime::detail::LogFatal::Entry::Finalize() () from /home/masa/projects/dev/tvm/build/libtvm.so
#11 0x00007fffd746c58f in tvm::relay::StorageAllocaBaseVisitor::VisitExpr_(tvm::relay::IfNode const*) () from /home/masa/projects/dev/tvm/build/libtvm.so
#12 0x00007fffd7530238 in tvm::relay::ExprVisitor::VisitExpr(tvm::RelayExpr const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#13 0x00007fffd7470456 in tvm::relay::StorageAllocaInit::VisitExpr_(tvm::relay::CallNode const*) () from /home/masa/projects/dev/tvm/build/libtvm.so
#14 0x00007fffd7530238 in tvm::relay::ExprVisitor::VisitExpr(tvm::RelayExpr const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#15 0x00007fffd7470456 in tvm::relay::StorageAllocaInit::VisitExpr_(tvm::relay::CallNode const*) () from /home/masa/projects/dev/tvm/build/libtvm.so
#16 0x00007fffd7530238 in tvm::relay::ExprVisitor::VisitExpr(tvm::RelayExpr const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#17 0x00007fffd7472791 in tvm::relay::StorageAllocaBaseVisitor::VisitExpr_(tvm::relay::TupleNode const*) () from /home/masa/projects/dev/tvm/build/libtvm.so
#18 0x00007fffd7530238 in tvm::relay::ExprVisitor::VisitExpr(tvm::RelayExpr const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#19 0x00007fffd746c5d8 in tvm::relay::StorageAllocaBaseVisitor::GetToken(tvm::RelayExpr const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#20 0x00007fffd7472651 in tvm::relay::StorageAllocaBaseVisitor::VisitExpr_(tvm::relay::LetNode const*) () from /home/masa/projects/dev/tvm/build/libtvm.so
#21 0x00007fffd7530238 in tvm::relay::ExprVisitor::VisitExpr(tvm::RelayExpr const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#22 0x00007fffd746c5d8 in tvm::relay::StorageAllocaBaseVisitor::GetToken(tvm::RelayExpr const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#23 0x00007fffd7470954 in tvm::relay::StorageAllocator::Plan(tvm::relay::Function const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#24 0x00007fffd746b153 in tvm::relay::GraphPlanMemory(tvm::relay::Function const&) () from /home/masa/projects/dev/tvm/build/libtvm.so
#25 0x00007fffd746819e in tvm::relay::backend::GraphExecutorCodegen::Codegen(tvm::relay::Function, tvm::runtime::String) () from /home/masa/projects/dev/tvm/build/libtvm.so
#26 0x00007fffd746a3e5 in std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::GraphExecutorCodegenModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) () from /home/masa/projects/dev/tvm/build/libtvm.so
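
To illustrate why a crash like the one above cannot be handled with try/except: the signal kills the whole interpreter, so the only generic way to survive it is to run the risky step in a separate process and inspect its exit status. The following is a minimal sketch of that idea using only the standard library, not what TVM does. The names `CRASHING_CHILD`, `run_isolated` and `crashed_with_segfault` are hypothetical, the child simply sends itself SIGSEGV as a stand-in for a crash in native code, and the negative return code convention is POSIX-only.

```python
import signal
import subprocess
import sys

# Stand-in for a native crash: the child process sends SIGSEGV to itself.
# (Assumption for illustration; a real crash would come from native library code.)
CRASHING_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_isolated(code: str) -> int:
    """Run `code` in a fresh interpreter and return the child's return code."""
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode


def crashed_with_segfault(returncode: int) -> bool:
    # On POSIX, a return code of -N means the child was killed by signal N.
    return returncode == -signal.SIGSEGV


status = run_isolated(CRASHING_CHILD)
if crashed_with_segfault(status):
    # The parent process is still alive and can choose a fallback path.
    print("child segfaulted; the parent can fall back to another code path")
```

Process isolation costs an extra interpreter start-up, which is one reason avoiding the crashing code path entirely may be preferable.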

When tuning detection models such as MaskRCNN or YOLO, I already know that using the graph codegen for task extraction will fail, so the time spent partially compiling the model is wasted. Now that I have realized a segfault is possible, I really want to use only the VM compiler for task extraction, regardless of whether control flow or dynamic shapes are present.

I found a PR https://github.com/apache/tvm/pull/5019 that changed the default task extraction compiler to be the graph codegen. Can we revisit this decision? Does this “stack overflow problem” still stand today, and if so, is there a reproducible test case?

@comaniac @haichen

comaniac · September 22, 2021, 1:12am

Ah, that was a really long time ago… Unfortunately, I don’t remember which model we used. @haichen, do you remember?

Looking at the PR discussion, if the problem was caused by recursive graph traversal, then we should be fine now, since most passes have been refactored to be iterative.
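
As a rough illustration of the recursive versus iterative distinction, here is a small sketch in plain Python. It is not TVM code: `Node`, `build_chain`, `count_recursive` and `count_iterative` are hypothetical names, and the "graph" is just a long chain. Python raises `RecursionError` at its recursion limit, whereas a C++ pass would overflow the native stack and crash, but the cause is the same: one stack frame per visited node.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical expression node with at most one child."""
    child: Optional["Node"] = None


def build_chain(depth: int) -> Node:
    # Build a chain of `depth` nodes iteratively.
    node = Node()
    for _ in range(depth - 1):
        node = Node(child=node)
    return node


def count_recursive(node: Optional[Node]) -> int:
    # Uses one call-stack frame per node, so deep graphs exhaust the stack.
    if node is None:
        return 0
    return 1 + count_recursive(node.child)


def count_iterative(root: Optional[Node]) -> int:
    # Uses an explicit worklist on the heap instead of the call stack.
    count = 0
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        count += 1
        if node.child is not None:
            stack.append(node.child)
    return count


deep = build_chain(sys.getrecursionlimit() * 10)
print(count_iterative(deep))  # completes regardless of depth
try:
    count_recursive(deep)
except RecursionError:
    print("recursive traversal exceeded the stack limit")
```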

masahi · September 22, 2021, 1:31am

OK, great. I sent a PR: https://github.com/apache/tvm/pull/9069