[Frontend][PyTorch] Compile YOLACT for Xilinx ZCU104 hardware

Trying to compile this instance segmentation model: https://github.com/dbolya/yolact

I was able to eliminate the CUDA dependency (a couple of lines commented out in yolact.py) and JIT-trace the model. But the frontend complains: NotImplementedError: The following operators are not implemented: ['aten::_set_item', 'aten::conv2d', 'aten::append']

These ops come from scripting due to https://github.com/dbolya/yolact/blob/57b8f2d95e62e2e649b382f516ab41f949b57239/yolact.py#L29-L30.

You need to rewrite this model so that it works as a normal nn.Module. Otherwise it is not supported.
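The suggestion above can be illustrated with a minimal sketch (TinyHead is a hypothetical stand-in for yolact's scripted submodules, not actual yolact code): keeping everything as a plain nn.Module and using torch.jit.trace avoids the script-only operators that the TVM PyTorch frontend rejects.

```python
import torch
import torch.nn as nn

class TinyHead(nn.Module):
    """Hypothetical stand-in for a yolact submodule, written as a
    plain nn.Module instead of a torch.jit.script-wrapped one."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = TinyHead().eval()
example = torch.rand(1, 3, 32, 32)

# Tracing a plain module records only traceable aten ops; scripted
# modules can emit script-only ops such as aten::append, which
# relay.frontend.from_pytorch cannot convert.
traced = torch.jit.trace(model, example)
print(tuple(traced(example).shape))  # → (1, 8, 32, 32)
```

The traced module can then be passed to relay.frontend.from_pytorch as usual. Printing traced.graph is a quick way to check whether any script-only operators remain.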

Thanks @masahi for pointing that out. I was able to get past this issue.

But now it prints the warning below, which you mentioned in some other discussions is nothing to worry about: WARNING:root:Untyped Tensor found, assume it is float32. However, it also throws: Exception: warning unhandled case: <class 'float'>.

It seems you recently fixed a similar issue for yolov5, but with 'NoneType'; I'm not sure whether that fix covers this case as well. I will pull recent TVM and let you know if it persists.

@masahi Updating to recent TVM didn't fix the above-mentioned issue. I am still facing the same exception:

Exception: warning unhandled case: <class 'float'>

Hi @masahi ,

https://github.com/Ma-Dan/yolact/tree/onnx - the Ma-Dan version of yolact is ONNX-exportable (hopefully it is also fine to JIT trace). I tried it with the PyTorch frontend, but I get a segmentation fault at partition_for_vitis_ai(mod, params, dpu=target), as below (debug logs enabled; I can post the full log as a file if needed):

DEBUG:pyxir:-- -- Sweep transpose: bX: ['moved_moved_nn_conv2d_NHWC-NCHW-94241211375472'], X: nn.relu-94240959460784, tX: ['nn_conv2d-94241211375472']
DEBUG:pyxir:-- Visit: moved_moved_moved_moved_transpose-94241201609008
-- -- for opt: SweepTransposesFlowDirection
DEBUG:pyxir:-- Visit: nn_bias_add-94241211132064
-- -- for opt: SweepTransposesFlowDirection
DEBUG:pyxir:-- -- Sweep transpose: bX: ['moved_moved_moved_moved_transpose-94241201609008'], X: nn_bias_add-94241211132064, tX: ['nn.relu-94241201608720']
INFO:pyxir:Writing graph visualization to after_partitioning_sweep_transposes.png
Fatal Python error: Segmentation fault

Current thread 0x00007f8944951740 (most recent call first):
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tvm-0.8.dev1859+g627e92e7c-py3.6-linux-x86_64.egg/tvm/ir/transform.py", line 161 in __call__
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tvm-0.8.dev1859+g627e92e7c-py3.6-linux-x86_64.egg/tvm/relay/op/contrib/vitis_ai.py", line 191 in partition_for_vitis_ai
  File "compile_yoloact_vitisai.py", line 226 in <module>
Segmentation fault (core dumped)

I also exported the model and tried the ONNX frontend (with and without providing shape_dict), but it results in the error below:

  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/pyxir-0.3.2-py3.6-linux-x86_64.egg/pyxir/graph/layer/xlayer_factory.py", line 43, in factory_func
    d = register_func(attrs, in_xlayers)
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/pyxir-0.3.2-py3.6-linux-x86_64.egg/pyxir/graph/ops/l1_basic_nn.py", line 630, in sub
    shape = TensorShape(get_numpy_broadcasted_shape(lX.shapes[:], rX.shapes[:]))
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/pyxir-0.3.2-py3.6-linux-x86_64.egg/pyxir/shapes/tools.py", line 39, in get_numpy_broadcasted_shape
    " {} and {}".format(shape_a, shape_b))
ValueError: Invalid shapes for broadcasted additions: [-1, 18225, 81] and [1, 19248, 1]
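The ValueError follows directly from NumPy-style broadcasting rules: each dimension pair must be equal or 1, and 18225 vs 19248 is neither. The mismatched anchor counts in the two branches may indicate that the exported graph and the provided shape_dict assume different input resolutions, though that is only a guess. A minimal reproduction of the failing shape check (with a concrete batch size of 1 standing in for the symbolic -1):

```python
import numpy as np

# The two operand shapes from the pyxir error.
a = np.zeros((1, 18225, 81))  # e.g. per-anchor class scores
b = np.zeros((1, 19248, 1))   # e.g. a per-anchor offset

# Broadcasting requires each dimension pair to be equal or 1.
# 18225 != 19248 and neither is 1, so the elementwise op is invalid.
try:
    a - b
    print("broadcast ok")
except ValueError as e:
    print("broadcast failed:", e)
```

Checking where the 18225-anchor tensor originates in the exported graph (e.g. with Netron) would show which branch disagrees with the 19248 anchors expected for the configured input size.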

Any suggestions to proceed further?

The segfault could be due to the issue in https://github.com/apache/tvm/issues/9362

Try the solution in https://github.com/apache/tvm/issues/9362#issuecomment-955263494

Thanks for the reply @masahi. I tried recompiling TVM with the set(USE_LLVM "/usr/bin/llvm-config-9 --link-static") and set(HIDE_PRIVATE_SYMBOLS ON) flags enabled, but that didn't help get rid of the issue.

FYI: I faced a segfault earlier (raised in a very early part of compilation) that was fixed just by swapping the import order of tvm and torch. But this one occurs at a particular transform pass (repeatedly, with the same backtrace).

@masahi In mod['main'], I notice some "warning: no trace info" tags (a few highlighted below). Could this be the reason for the segmentation fault during partitioning?

  %349 = add(%348, meta[relay.Constant][316] /* ty=Tensor[(1, 1, 1, 32), float32] */) /* ty=Tensor[(1, 138, 138, 32), float32] */;
  %350 = nn.relu(%349) /* C.graph: aten::relu_, warning: no trace info 0 */ /* ty=Tensor[(1, 138, 138, 32), float32] */;
  %351 = concatenate(%285, axis=-2) /* C.graph: aten::cat, warning: no trace info 3 */ /* ty=Tensor[(1, 19248, 4), float32] */;
  %352 = nn.softmax(%307) /* C.graph: aten::softmax, warning: no trace info 7 */ /* ty=Tensor[(1, 19248, 81), float32] */;
  %353 = concatenate(%333, axis=-2) /* C.graph: aten::cat, warning: no trace info 5 */ /* ty=Tensor[(1, 19248, 32), float32] */;
  %354 = transpose(%350, axes=[0, 1, 2, 3]) /* C.graph: aten::permute, warning: no trace info 1 */ /* ty=Tensor[(1, 138, 138, 32), float32] */;
  (%351, %352, %353, meta[relay.Constant][306] /* ty=Tensor[(19248, 4), float32] */, %354)

Note: the segmentation fault happens at transform.PartitionGraph().