[SOLVED] Compiling ResNeXt WSL error

Hey guys,

I’m still new to TVM development. While compiling the ResNeXt WSL network, the error below occurred. Could you help me figure out whether it’s a TVM bug or a usage issue on my end? I really appreciate your help!

1. Error details below:

  File "/home/arthur/Documents/tvm/python/tvm/relay/frontend/pytorch.py", line 1172, in from_pytorch
    output_index_map, ret_name)

  File "/home/arthur/Documents/tvm/python/tvm/relay/frontend/pytorch.py", line 1099, in parse_operators
    outputs.append(relay_op(inputs, _get_input_types(op_node)))

  File "/home/arthur/Documents/tvm/python/tvm/relay/frontend/pytorch.py", line 307, in _impl
    channels = _infer_shape(data)

  File "/home/arthur/Documents/tvm/python/tvm/relay/frontend/common.py", line 466, in infer_shape
    out_type = infer_type(inputs, mod=mod)

  File "/home/arthur/Documents/tvm/python/tvm/relay/frontend/common.py", line 457, in infer_type
    new_mod = IRModule.from_expr(node)

  File "/home/arthur/Documents/tvm/python/tvm/ir/module.py", line 223, in from_expr
    return _ffi_api.Module_FromExpr(expr, funcs, defs)

  File "tvm/_ffi/_cython/./packed_func.pxi", line 308, in tvm._ffi._cy3.core.PackedFuncBase.__call__

  File "tvm/_ffi/_cython/./packed_func.pxi", line 243, in tvm._ffi._cy3.core.FuncCall

  File "tvm/_ffi/_cython/./packed_func.pxi", line 232, in tvm._ffi._cy3.core.FuncCall3

  File "tvm/_ffi/_cython/./base.pxi", line 159, in tvm._ffi._cy3.core.CALL

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/arthur/Documents/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7fdee6091611]
  [bt] (7) /home/arthur/Documents/tvm/build/libtvm.so(+0xa0fa5c) [0x7fdee5969a5c]
  [bt] (6) /home/arthur/Documents/tvm/build/libtvm.so(tvm::IRModule::FromExpr(tvm::RelayExpr const&, tvm::Map<tvm::GlobalVar, tvm::BaseFunc, void, void> const&, tvm::Map<tvm::GlobalTypeVar, tvm::TypeData, void, void> const&)+0x189) [0x7fdee5967049]
  [bt] (5) /home/arthur/Documents/tvm/build/libtvm.so(tvm::IRModuleNode::Add(tvm::GlobalVar const&, tvm::BaseFunc const&, bool)+0xdc) [0x7fdee59669cc]
  [bt] (4) /home/arthur/Documents/tvm/build/libtvm.so(tvm::RunTypeCheck(tvm::IRModule const&, tvm::GlobalVar const&, tvm::relay::Function)+0x277) [0x7fdee5965c57]
  [bt] (3) /home/arthur/Documents/tvm/build/libtvm.so(tvm::relay::InferType(tvm::relay::Function const&, tvm::IRModule const&, tvm::GlobalVar const&)+0x1cf) [0x7fdee5f0473f]
  [bt] (2) /home/arthur/Documents/tvm/build/libtvm.so(tvm::relay::TypeInferencer::Infer(tvm::RelayExpr)+0x86) [0x7fdee5f03f66]
  [bt] (1) /home/arthur/Documents/tvm/build/libtvm.so(tvm::ErrorReporter::RenderErrors(tvm::IRModule const&, bool)+0x2816) [0x7fdee59596b6]
  [bt] (0) /home/arthur/Documents/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7fdee585d022]
  File "/home/arthur/Documents/tvm/src/ir/error.cc", line 133
TVMError:
Error(s) have occurred. The program has been annotated with them:

In `main`:
v0.0.4
fn (%input.1: Tensor[(1, 3, 224, 224), float32], %weight.1: Tensor[(64, 3, 7, 7), float32], %weight.2: Tensor[(64), float32], %bias.1: Tensor[(64), float32], %running_mean.1: Tensor[(64), float32], %running_var.1: Tensor[(64), float32], %weight.3: Tensor[(256, 64, 1, 1), float32], %weight.4: Tensor[(256), float32], %bias.2: Tensor[(256), float32], %running_mean.2: Tensor[(256), float32], %running_var.2: Tensor[(256), float32], %weight.5: Tensor[(256, 8, 3, 3), float32]) {
  %0 = nn.conv2d(%input.1, %weight.1, strides=[2, 2], padding=[3, 3, 3, 3], channels=64, kernel_size=[7, 7]);
  %1 = nn.batch_norm(%0, %weight.2, %bias.1, %running_mean.1, %running_var.1);
  %2 = %1.0;
  %3 = nn.relu(%2);
  %4 = nn.max_pool2d(%3, pool_size=[3, 3], strides=[2, 2], padding=[1, 1]);
  %5 = nn.conv2d(%4, %weight.3, padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1]);
  %6 = nn.batch_norm(%5, %weight.4, %bias.2, %running_mean.2, %running_var.2);
  %7 = %6.0;
  %8 = nn.relu(%7);
  %9 = reshape(%weight.5, newshape=[32, 8, 3, 3]);
  nn.conv2d(%8, %9, padding=[1, 1, 1, 1], groups=32, channels=256, kernel_size=[3, 3]) in particular dimension 0 conflicts 256 does not match 32; unable to unify: `Tensor[(256, 8, 3, 3), float32]` and `Tensor[(32, 8, 3, 3), float32]`;
}
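For context on the error message: a grouped conv2d expects its weight tensor to have shape (out_channels, in_channels // groups, kh, kw), so for this layer the weight should remain (256, 8, 3, 3); the frontend's reshape to (32, 8, 3, 3) puts 32 where the type checker expects channels=256, which is the unification failure shown above. A minimal sketch of the shape arithmetic (plain Python, no TVM required; the helper name is just for illustration):

```python
# Shape check mirroring the failing Relay op:
# nn.conv2d(..., groups=32, channels=256, kernel_size=[3, 3]).
def grouped_conv_weight_shape(out_channels, in_channels, groups, kh, kw):
    """Expected weight shape for a grouped 2D convolution."""
    assert in_channels % groups == 0 and out_channels % groups == 0
    return (out_channels, in_channels // groups, kh, kw)

# The layer in the error: 256 input channels, 256 output channels, 32 groups.
expected = grouped_conv_weight_shape(256, 256, 32, 3, 3)
print(expected)  # (256, 8, 3, 3) -- matches %weight.5 before the reshape

# The frontend reshaped the weight to (32, 8, 3, 3); its first dimension (32)
# conflicts with channels=256, hence "256 does not match 32" above.
```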

2. Source code details below:

import torch
import tvm
from tvm import relay
from tvm.relay.frontend.pytorch import get_graph_input_names

model = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")
model.eval()

input_shape = (1, 3, 224, 224)
input_data = torch.randn(input_shape)
scripted_model = torch.jit.trace(model, input_data).eval()

input_name = get_graph_input_names(scripted_model)[0]  # only one input
shape_dict = {input_name: (1, 3, 224, 224)}
mod, params = relay.frontend.from_pytorch(scripted_model,
                                          shape_dict)

target = tvm.target.cuda("-model=tx2")
target_host = 'llvm -target=aarch64-linux-gnu'
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod,
                                     target=target,
                                     target_host=target_host,
                                     params=params)

Thanks for reporting. The error is likely due to a group convolution conversion bug that was fixed in https://github.com/apache/incubator-tvm/pull/5132

If you upgrade your TVM install to the latest one, it should work. Note that we changed the API slightly recently; I confirmed that the following works with the latest master.

import torch
from tvm import relay

model = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")
model.eval()

input_shape = (1, 3, 224, 224)
input_data = torch.randn(input_shape)
scripted_model = torch.jit.trace(model, input_data).eval()

shape_dict = [("input", (1, 3, 224, 224))]
mod, params = relay.frontend.from_pytorch(scripted_model, shape_dict)

Cool, thanks a lot for your quick response. I’ll give it a shot right away and update here once finished, for future reference.

Just confirmed it’s fixed. Really appreciate your help again! :+1: :+1: :+1: