Bus error (core dumped) on A64FX CPU

Hi,

I’m trying out the from_tensorflow example on an A64FX CPU. I’m using Fujitsu Tensorflow.

When I run this example, the step that imports the Tensorflow graph into the Relay frontend,

mod, params = relay.frontend.from_tensorflow(graph_def, layout=layout, shape=shape_dict)

fails with a bus error.

Backtracing using ptrace/strace/pdb is not of much help. Debugger output:

--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0x3b} ---

+++ killed by SIGBUS (core dumped) +++

Can someone help me here, please?
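A possible way to get more context than the bare signal line, offered as a sketch rather than something that was tried in this thread: Python's standard faulthandler module can print the Python-level traceback when the process receives SIGBUS. The si_code BUS_ADRALN means an invalid address alignment, and the reported si_addr=0x3b is an odd address, so it is not aligned to 2, 4, or 8 bytes. The helper names enable_crash_traceback and is_aligned are hypothetical.

```python
import faulthandler
import sys


def enable_crash_traceback() -> None:
    """Dump the Python traceback of all threads to stderr on a fatal signal.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
    and SIGILL.
    """
    faulthandler.enable(file=sys.stderr, all_threads=True)


def is_aligned(addr: int, width: int) -> bool:
    """Return True if addr is a multiple of width (width is a power of two)."""
    return addr & (width - 1) == 0


if __name__ == "__main__":
    enable_crash_traceback()

    # Faulting address taken from the SIGBUS line in the output above.
    si_addr = 0x3B
    for width in (2, 4, 8):
        print(f"0x{si_addr:x} aligned to {width} bytes: {is_aligned(si_addr, width)}")

    # The rest of the script would follow here, e.g. the call to
    # relay.frontend.from_tensorflow(...) that triggers the bus error.
```

The same handler can be enabled without editing the script by running it as python3 -X faulthandler ./from_tensorflow.py. The traceback only shows which Python call was active when the signal arrived; the fault itself is in native code.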

I think it’s better to import and compile your model on an x86 host, and run it on A64FX via RPC. I don’t know why such errors arise.
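A rough sketch of what that flow could look like, assuming a TVM RPC server is already running on the A64FX machine and an aarch64 cross compiler is installed on the x86 host. The function names, the library file name, the compiler name aarch64-linux-gnu-g++, and the target flags are assumptions for illustration, not details from this thread.

```python
def aarch64_target(mattr: str = "+neon") -> str:
    """Build an LLVM target string for cross-compiling to 64-bit Arm Linux."""
    return f"llvm -mtriple=aarch64-linux-gnu -mattr={mattr}"


def build_and_run_remote(mod, params, host, port, input_name, input_data,
                         cross_cc="aarch64-linux-gnu-g++"):
    """Compile on the local x86 host and execute on a remote device via RPC."""
    # TVM is imported lazily so the helper above is usable without it.
    import tvm
    from tvm import relay, rpc
    from tvm.contrib import graph_executor

    # Cross-compile the Relay module for the remote CPU.
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=aarch64_target(), params=params)

    # Export a shared library using the (assumed) cross compiler.
    lib_name = "net_aarch64.so"
    lib.export_library(lib_name, cc=cross_cc)

    # Connect to an RPC server started on the remote machine, e.g. with:
    #   python3 -m tvm.exec.rpc_server --host 0.0.0.0 --port 9090
    remote = rpc.connect(host, port)
    remote.upload(lib_name)
    rlib = remote.load_module(lib_name)

    # Run the graph on the remote CPU and fetch the first output.
    dev = remote.cpu(0)
    module = graph_executor.GraphModule(rlib["default"](dev))
    module.set_input(input_name, input_data)
    module.run()
    return module.get_output(0).numpy()
```

With this split, Tensorflow only needs to be installed on the x86 host, where mod and params come from relay.frontend.from_tensorflow; the A64FX side only needs the TVM runtime.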

I’m interested to hear what TVM’s performance would be like on A64FX. Note that there is a proposal by Arm to bring SVE support to TVM: https://github.com/apache/tvm-rfcs/pull/18

Sure, I will try this approach. I wanted to try this out for the same reason: to analyze the performance. Thanks for the link to the RFC.

Hi @xintin! I’ve been running models on an FX700 as well. Have you tried the Tensorflow from Google? (Well, from Linaro actually; “we” build and release it.) Any chance I could get a copy of the model to try to reproduce this? (The link you have above is kind of broken. Is that one of the examples? It looks like it.)

@xintin Hi. I’ve learned from some Arm folks that everything important in that Fujitsu Tensorflow fork is probably already upstream (integrated in the Tensorflow from Google). The debug info you provided unfortunately doesn’t tell much about the sigbus, but it seems to be raised when trying to import the Tensorflow model, so I think it’s related not to TVM but to the Tensorflow build you used.

As @tgall_foo said, it happens that Linaro is building and hosting Tensorflow from Google for aarch64 here.

Thus I’ve just tried to run the from_tensorflow example on FX700 using the Tensorflow from Google built by Linaro and I was able to complete the run – no sigbus. I’ll paste below the details about it.

Since I’m using Python 3.6.8 I installed tensorflow with:

$ python3 -m pip install --extra-index-url https://snapshots.linaro.org/ldcg/python-cache/ http://releases.linaro.org/components/ldcg/tensorflow-aarch64/r2.7.0-rc0/tensorflow-aarch64/tensorflow_aarch64-2.7.0rc0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Some additional info on my FX700 environment:

(venv) [gromero@fx700-vm-3 compile_models]$ python3 --version 
Python 3.6.8

(venv) [gromero@fx700-vm-3 compile_models]$ uname -a 
Linux fx700-vm-3.novalocal 4.18.0-301.1.el8.aarch64 #1 SMP Tue Apr 13 15:40:54 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

(venv) [gromero@fx700-vm-3 compile_models]$ cat /etc/redhat-release 
CentOS Stream release 8

Full run log:

(venv) [gromero@fx700-vm-3 compile_models]$ python3 ./from_tensorflow.py
2021-10-14 19:12:33.664436: W tensorflow/core/framework/op_def_util.cc:371] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
WARNING:tensorflow:From /home/gromero/git/tvm/python/tvm/relay/testing/tf.py:139: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /home/gromero/git/tvm/gallery/how_to/compile_models/venv/lib64/python3.6/site-packages/tensorflow/python/framework/convert_to_constants.py:929: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
/home/gromero/git/tvm/python/tvm/relay/frontend/tensorflow.py:535: UserWarning: Ignore the passed shape. Shape in graphdef will be used for operator DecodeJpeg/contents.
  "will be used for operator %s." % node.name
/home/gromero/git/tvm/python/tvm/relay/frontend/tensorflow_ops.py:1006: UserWarning: DecodeJpeg: It's a pass through, please handle preprocessing before input
  warnings.warn("DecodeJpeg: It's a pass through, please handle preprocessing before input")
Tensorflow protobuf imported to relay frontend.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
African elephant, Loxodonta africana (score = 0.58335)
tusker (score = 0.33901)
Indian elephant, Elephas maximus (score = 0.02391)
banana (score = 0.00025)
vault (score = 0.00021)
===== TENSORFLOW RESULTS =======
African elephant, Loxodonta africana (score = 0.58394)
tusker (score = 0.33909)
Indian elephant, Elephas maximus (score = 0.03186)
banana (score = 0.00022)
desk (score = 0.00019)
(venv) [gromero@fx700-vm-3 compile_models]$

TVM head:

(venv) [gromero@fx700-vm-3 tvm]$ git describe
v0.4-5337-g8a3fcc40a
(venv) [gromero@fx700-vm-3 tvm]$ git log --oneline -1 
8a3fcc40a (HEAD) [TVMC] Compose target options from target registry (#9218)

HTH.


@tgall_foo, as @gromero suggested, from_tensorflow is the example I am trying. Sorry for the broken link.

Also, sadly, I could not get much information about the cause of the sigbus from gdb or ptrace myself. I suspect the issue is with the build process provided in the GitHub wiki.

@gromero, I could successfully reproduce the results using the Linaro-provided builds. Thanks for that. I had one question, though. Can I assume that these wheels are built with optimization flags for the A64FX?

@xintin Hi. Thanks for the test.

Nope, they are not built with optimization flags for A64FX. On the other hand, Tensorflow is used in the example mostly for importing the model, and then only for comparison against TVM, so I think it can’t affect TVM performance. Getting the operators tuned on TVM (i.e. getting rid of the “conv2d NHWC layout is not optimized for x86 with autotvm.” messages) would be more interesting for TVM performance on A64FX. Of course, one can still be interested in building an optimized Tensorflow for A64FX for comparison purposes, and that’s fair. But to me the experiment shows that there is no sigbus in TVM when running on A64FX.
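As a starting point for that tuning, here is a hedged sketch. The “not optimized for x86” messages most likely appear because the example builds for the plain llvm target, which selects TVM's generic CPU schedules; a target string with -device=arm_cpu asks for the Arm schedules instead. The function names and the exact flags are assumptions, and whether a given LLVM build accepts -mcpu=a64fx should be checked locally.

```python
def arm_cpu_target(mcpu=None):
    """Build a target string that selects TVM's arm_cpu schedules on aarch64."""
    parts = ["llvm", "-device=arm_cpu", "-mtriple=aarch64-linux-gnu", "-mattr=+neon"]
    if mcpu:
        # Optional CPU name passed through to LLVM (assumed to be supported).
        parts.append(f"-mcpu={mcpu}")
    return " ".join(parts)


def list_tuning_tasks(mod, params, target_str):
    """Return (name, args) for each AutoTVM task found in a Relay module."""
    # TVM is imported lazily so the helper above is usable without it.
    import tvm
    from tvm import autotvm

    tasks = autotvm.task.extract_from_program(
        mod["main"], target=tvm.target.Target(target_str), params=params
    )
    return [(task.name, task.args) for task in tasks]
```

Calling list_tuning_tasks(mod, params, arm_cpu_target()) on the module returned by relay.frontend.from_tensorflow would list the operators that AutoTVM can tune; the tuning itself then follows the usual AutoTVM flow with a tuner and a measurement runner.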

@gromero, I get it now. It will be interesting to look into the items mentioned above. Thank you.