Auto-tune finished, but Build error occurs for my own onnx model

When I auto-tune my own ONNX model, the tuning finishes successfully:

[Task 20/22]  Current/Best:    3.86/  14.62 GFLOPS | Progress: (5/5) | 4.90 s Done.
[Task 21/22]  Current/Best:    7.47/  12.78 GFLOPS | Progress: (5/5) | 2.42 s Done.
[Task 22/22]  Current/Best:    2.07/   2.07 GFLOPS | Progress: (5/5) | 2.55 s Done.
Auto-tvm on
Compile...

But a build error occurs after entering relay.build_config:

with autotvm.apply_history_best(log_file):
    print("Compile...")

    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(
            mod, target=target, params=params)
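For context, autotvm.apply_history_best scans the log and keeps the fastest valid record per task before compilation. Below is a rough standalone sketch of that selection logic; the JSON-lines layout assumed here ("input" holding [target, task_name, args, ...] and "result" holding [[costs...], error_no, ...]) is my assumption about the classic autotvm log format, not code taken from TVM:

```python
import json

def best_records_per_task(log_file):
    """Keep the fastest error-free measurement per task, roughly mimicking
    what autotvm.apply_history_best does with a tuning log."""
    best = {}
    with open(log_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            task_name = rec["input"][1]           # assumed field position
            costs, error_no = rec["result"][0], rec["result"][1]
            if error_no != 0:                     # skip failed measurements
                continue
            mean_cost = sum(costs) / len(costs)
            if task_name not in best or mean_cost < best[task_name][0]:
                best[task_name] = (mean_cost, rec)
    return best
```

Inspecting the result of such a scan can confirm whether every tuned task (e.g. all 22 above) actually has a usable record in log_file.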

With the error log:

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) 9   libtvm.dylib                        0x0000000110a576f0 tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const + 992
  [bt] (7) 8   libtvm.dylib                        0x0000000110a57b03 tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const + 179
  [bt] (6) 7   libtvm.dylib                        0x000000011123fee8 tvm::relay::transform::FunctionPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const + 1624
  [bt] (5) 6   libtvm.dylib                        0x0000000110a36e30 tvm::IRModuleNode::Add(tvm::GlobalVar const&, tvm::BaseFunc const&, bool) + 320
  [bt] (4) 5   libtvm.dylib                        0x0000000110a362c7 tvm::RunTypeCheck(tvm::IRModule const&, tvm::GlobalVar const&, tvm::relay::Function) + 1431
  [bt] (3) 4   libtvm.dylib                        0x0000000111189535 tvm::relay::InferType(tvm::relay::Function const&, tvm::IRModule const&, tvm::GlobalVar const&) + 565
  [bt] (2) 3   libtvm.dylib                        0x00000001111886d8 tvm::relay::TypeInferencer::Infer(tvm::RelayExpr) + 136
  [bt] (1) 2   libtvm.dylib                        0x0000000110a2a8db tvm::ErrorReporter::RenderErrors(tvm::IRModule const&, bool) + 5499
  [bt] (0) 1   libtvm.dylib                        0x000000011091e899 dmlc::LogMessageFatal::~LogMessageFatal() + 57
  [bt] (8) 9   libtvm.dylib                        0x0000000110a362c7 tvm::RunTypeCheck(tvm::IRModule const&, tvm::GlobalVar const&, tvm::relay::Function) + 1431
  [bt] (7) 8   libtvm.dylib                        0x0000000111189535 tvm::relay::InferType(tvm::relay::Function const&, tvm::IRModule const&, tvm::GlobalVar const&) + 565
  [bt] (6) 7   libtvm.dylib                        0x00000001111886bc tvm::relay::TypeInferencer::Infer(tvm::RelayExpr) + 108
  [bt] (5) 6   libtvm.dylib                        0x0000000111063548 tvm::relay::TypeSolver::Solve() + 1064
  [bt] (4) 5   libtvm.dylib                        0x0000000111063be5 tvm::TypedEnvFunc<bool (tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::operator()(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&) const + 325
  [bt] (3) 4   libtvm.dylib                        0x0000000110d7d83b std::__1::__function::__func<void tvm::runtime::TypedPackedFunc<bool (tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>(bool (*)(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&))::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*), std::__1::allocator<void tvm::runtime::TypedPackedFunc<bool (tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::AssignTypedLambda<bool (*)(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>(bool (*)(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&))::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 107
  [bt] (2) 3   libtvm.dylib                        0x0000000110d7d9c3 void tvm::runtime::detail::unpack_call_dispatcher<bool, 0, 4, bool (*)(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&)>::run<tvm::runtime::TVMMovableArgValue_, tvm::runtime::TVMMovableArgValue_, tvm::runtime::TVMMovableArgValue_, tvm::runtime::TVMMovableArgValue_>(bool (* const&)(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&), tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*, tvm::runtime::TVMMovableArgValue_&&, tvm::runtime::TVMMovableArgValue_&&, tvm::runtime::TVMMovableArgValue_&&, tvm::runtime::TVMMovableArgValue_&&) + 323
  [bt] (1) 2   libtvm.dylib                        0x0000000110de508c bool tvm::relay::Conv2DWinogradRel<tvm::relay::Conv2DWinogradAttrs>(tvm::Array<tvm::Type, void> const&, int, tvm::Attrs const&, tvm::TypeReporter const&) + 1756
  [bt] (0) 1   libtvm.dylib                        0x000000011091e899 dmlc::LogMessageFatal::~LogMessageFatal() + 57
  File "/Users/kindlehe/Project/tvm/tvm/src/ir/error.cc", line 133
TVMError: 
Error(s) have occurred. The program has been annotated with them:

In `main`: 
v0.0.4
fn (%input: Tensor[(1, 3, 112, 112), float32]) -> Tensor[(1, 2), float32] {
  %0 = reshape(%input, newshape=[-1, 3, 112, 112]);
  %1 = layout_transform(meta[relay.Constant][0], src_layout="OIHW", dst_layout="OIHW2o");
  %2 = nn.conv2d(%0, %1, strides=[2, 2], padding=[0, 0, 0, 0], kernel_size=[3, 3], kernel_layout="OIHW2o");
  %3 = expand_dims(meta[relay.Constant][1], axis=1, num_newaxis=2);
  %4 = expand_dims(%3, axis=0);
  %5 = add(%2, %4);
  %6 = nn.relu(%5);
  %7 = nn.max_pool2d(%6, pool_size=[3, 3], strides=[2, 2], padding=[0, 0, 0, 0], ceil_mode=True);
  %8 = layout_transform(meta[relay.Constant][2], src_layout="OIHW", dst_layout="OIHW2o");
  %9 = nn.conv2d(%7, %8, padding=[0, 0, 0, 0], kernel_size=[1, 1], kernel_layout="OIHW2o");
  %10 = expand_dims(meta[relay.Constant][3], axis=1, num_newaxis=2);
  %11 = expand_dims(%10, axis=0);
  %12 = add(%9, %11);
  %13 = nn.relu(%12);
  %14 = layout_transform(meta[relay.Constant][4], src_layout="OIHW", dst_layout="OIHW8o");
  %15 = nn.conv2d(%13, %14, padding=[0, 0, 0, 0], kernel_size=[1, 1], kernel_layout="OIHW8o");
  %16 = expand_dims(meta[relay.Constant][5], axis=1, num_newaxis=2);
  %17 = expand_dims(%16, axis=0);
  %18 = add(%15, %17);
  %19 = nn.relu(%18);
  %20 = nn.contrib_conv2d_winograd_weight_transform(meta[relay.Constant][6], tile_size=4);
  %21 = reshape(%20, newshape=[6, 6, 16, 4, 16]);
  %22 = transpose(%21, axes=[0, 1, 2, 4, 3]);
  %23 = nn.contrib_conv2d_winograd_without_weight_transform(%13, %22, tile_size=4, padding=[1, 1, 1, 1], kernel_size=[3, 3]) an internal invariant was violated while typechecking your program [14:45:38] /Users/user_name/Project/tvm/tvm/src/relay/op/nn/convolution.h:489: Check failed: param->kernel_size.defined() && param->channels.defined(): The kernel size and channels of a Conv must be set or inferred by previous pass
; ;
  %24 = expand_dims(meta[relay.Constant][7], axis=1, num_newaxis=2);

The main error line:

an internal invariant was violated while typechecking your program [14:45:38] /Users/user_name/Project/tvm/tvm/src/relay/op/nn/convolution.h:489: Check failed: param->kernel_size.defined() && param->channels.defined(): The kernel size and channels of a Conv must be set or inferred by previous pass
; ;

@anijain2305, could you please give me some help?

I guess the error may be caused by a difference in how the ONNX and MXNet models are loaded, focusing on relay.Function and tvm.IRModule.from_expr.

1. I am not sure what they are used for, or what I should write for an ONNX model:

def customed_network_from_onnx(model_path, input_shapes, dtype="float32"):
    import onnx
    onnx_model = onnx.load(model_path)
    mod, params = relay.frontend.from_onnx(onnx_model, input_shapes, dtype=dtype)
    return mod, params

def get_network(name, batch_size, input_name=None, input_size=None):
    ...
    elif name == 'mxnet':
        # an example for an mxnet model
        from mxnet.gluon.model_zoo.vision import get_model
        block = get_model('resnet18_v1', pretrained=True)
        mod, params = relay.frontend.from_mxnet(block, shape={'data': input_shape}, dtype=dtype)
        net = mod["main"]
        net = relay.Function(net.params, relay.nn.softmax(net.body), None, net.type_params, net.attrs)
        mod = tvm.IRModule.from_expr(net)
    elif name.split('.')[-1] == 'onnx':
        model_path = '../data/' + name
        input_shape = (batch_size, 3, input_size, input_size)
        input_shape_dict = {input_name: input_shape}
        mod, params = customed_network_from_onnx(model_path, input_shape_dict)
        output_shape = (batch_size, 2)
    ...

2. With auto-tuning enabled, execution reaches tophub_context = autotvm.util.EmptyContext(), while with auto-tuning disabled the program goes into tophub_context = autotvm.tophub.context(list(target.values())).

In relay.build(mod, target=target, params=params):

    # If current dispatch context is fallback context (the default root context),
    # then load pre-tuned parameters from TopHub
    if isinstance(autotvm.DispatchContext.current, autotvm.FallbackContext):
        tophub_context = autotvm.tophub.context(list(target.values()))
    else:
        tophub_context = autotvm.util.EmptyContext()

Based on the analysis in 1 and 2, I guess the way the ONNX model is loaded caused the problem.

Can anyone give me some advice on auto-tuning my own ONNX model?

Try with opt_level=2; in my case this works. I don't know which optimization pass is at fault.
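The opt_level=2 workaround can be wrapped in a small retry helper. This is only a generic sketch (build_with_fallback and the build_fn callback are hypothetical names I made up); in practice build_fn would run relay.build inside relay.build_config(opt_level=lvl):

```python
def build_with_fallback(build_fn, opt_levels=(3, 2)):
    """Try building at each opt_level in order; return the first success.

    build_fn is a user-supplied callable taking an opt_level, e.g. one
    that calls relay.build under relay.build_config(opt_level=lvl).
    """
    last_err = None
    for lvl in opt_levels:
        try:
            return lvl, build_fn(lvl)
        except Exception as err:   # e.g. tvm._ffi.base.TVMError
            last_err = err
    raise last_err
```

This keeps opt_level=3 for models where it works and only falls back to 2 when a pass (here apparently the Winograd weight transform) breaks type checking.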