OSError: ..... /tvm/build/libtvm.so: undefined symbol: __extendhftf2

I have a fresh pull from master as of late Friday afternoon, and this missing symbol pops up at runtime.

This is on arm64 with LLVM 10. I wasn’t seeing this with my master build from a few days ago, using the same make config.

Anyone else seen this?

Does it compile without LLVM?

Compiling isn’t the issue. This shows up at runtime when libtvm.so is loaded.

Given that my use case is on CPU, turning LLVM off isn’t viable in the long run. Still, to try to narrow it down, I’ve been looking at two paths this morning:

  1. Build all of TVM with clang. Interestingly, this triggers an internal compiler error (ICE) in clang 10.0.1. (There is also one compile-time fix required.)

  2. Turn off LLVM: no change, libtvm.so still fails to load:

Traceback (most recent call last):
  File "./mobilenet-v1.0.5-arm_cpu-arm64-quant.py", line 3, in <module>
    import tvm
  File "/home/tgall/tvm/python/tvm/__init__.py", line 25, in <module>
    from ._ffi.base import TVMError, __version__
  File "/home/tgall/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
    from .base import register_error
  File "/home/tgall/tvm/python/tvm/_ffi/base.py", line 62, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/home/tgall/tvm/python/tvm/_ffi/base.py", line 50, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/tgall/tvm/build/libtvm.so: undefined symbol: __extendhftf2
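To reproduce the failure without going through import tvm, the same ctypes call that tvm/_ffi/base.py makes (visible in the traceback) can be issued directly. This is a minimal sketch: try_load and missing_symbol are hypothetical helper names, and the regex assumes the glibc loader wording shown in the traceback.

```python
import ctypes
import re
import sys
from typing import Optional

# Matches glibc's dynamic-loader wording seen in the traceback, e.g.
# "/path/libtvm.so: undefined symbol: __extendhftf2"
_UNDEFINED_RE = re.compile(r"undefined symbol: (\S+)")


def missing_symbol(message: str) -> Optional[str]:
    """Hypothetical helper: return the symbol named in an 'undefined symbol' error."""
    match = _UNDEFINED_RE.search(message)
    return match.group(1) if match else None


def try_load(path: str) -> Optional[str]:
    """Hypothetical helper: dlopen `path` the way tvm/_ffi/base.py does.

    Returns None on success, otherwise the loader's error message.
    """
    try:
        ctypes.CDLL(path, ctypes.RTLD_GLOBAL)
    except OSError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Example path (an assumption); pass the real libtvm.so location instead.
    target = sys.argv[1] if len(sys.argv) > 1 else "build/libtvm.so"
    error = try_load(target)
    if error is None:
        print("loaded OK:", target)
    else:
        print(missing_symbol(error) or error)
```

Run against the failing build, this should print just the offending symbol name, which makes it easy to script comparisons between builds.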

The gcc in this case is pretty boring:

gcc --version
gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It is interesting. I did a Google search and saw that __extendhftf2 is part of glibc, so I am not sure what is happening.

Could you print the result of ldd libtvm.so?

linux-vdso.so.1 (0x0000ffffae536000)
libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1 (0x0000ffffa9a9f000)
librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000ffffa9a87000)
libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000ffffa9a73000)
libtinfo.so.6 => /lib/aarch64-linux-gnu/libtinfo.so.6 (0x0000ffffa9a35000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffffa9a06000)
libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffffa987b000)
libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffa97be000)
libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffffa979a000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffa9628000)
/lib/ld-linux-aarch64.so.1 (0x0000ffffae508000)

IIRC, __extendhftf2 is used for conversion from half precision to quad precision, but quad precision is never used in TVM. Therefore I suspect the issue comes from one of those dependencies.
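One way to follow up on the suspicion about the dependencies is to ask each library from the ldd output whether it can resolve the symbol. Note that __extendhftf2 is a GCC soft-float runtime helper that lives in libgcc, so libgcc_s.so.1 is the first candidate. This is a minimal sketch; exports_symbol is a hypothetical helper built only on ctypes.

```python
import ctypes
import ctypes.util


def exports_symbol(lib_name: str, symbol: str) -> bool:
    """Hypothetical helper: can `symbol` be resolved through library `lib_name`?

    dlsym() on a dlopen() handle also searches that library's dependencies,
    so True means "reachable from this library", not necessarily "defined
    in this exact file".
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return False  # the library itself could not be loaded
    try:
        lib[symbol]  # indexing performs the dlsym() lookup
    except AttributeError:
        return False  # ctypes reports unresolved symbols as AttributeError
    return True


if __name__ == "__main__":
    # Library names copied from the ldd output earlier in the thread.
    for name in ("libgcc_s.so.1", "libstdc++.so.6", "libm.so.6", "libc.so.6"):
        print(name, exports_symbol(name, "__extendhftf2"))
```

Indexing (lib[symbol]) is used instead of attribute access so that the lookup goes straight to dlsym() regardless of the symbol's leading underscores.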

I do not have a clue about this issue with ARM gcc (@comaniac and @haichen have more experience with ARM than I do). To rule out latent factors, if you are interested, could we try the following options with LLVM off (just to rule out the possibility that LLVM causes the issue)?

  • Use the -static-libgcc flag to link libgcc statically instead of against libgcc_s.so; or
  • Bisect the TVM commits to see which commit causes this issue.
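The bisection option can be automated with git bisect run, which treats exit code 0 as good, 125 as skip, and other codes from 1 to 127 as bad. Below is a minimal sketch of a check; the function names are hypothetical, and it assumes libtvm.so has already been rebuilt for the checked-out commit and that PYTHONPATH points at the checkout's python/ directory.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(returncode: int, stderr: str) -> int:
    """Map the outcome of `python -c "import tvm"` to a git-bisect verdict."""
    if returncode == 0:
        return GOOD
    if "undefined symbol: __extendhftf2" in stderr:
        return BAD
    return SKIP  # an unrelated failure: ask bisect to skip this commit


def main() -> int:
    """Hypothetical bisect check for the commit currently checked out.

    Assumes libtvm.so has already been rebuilt for this commit and that
    PYTHONPATH points at the checkout's python/ directory.
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import tvm"],
        capture_output=True,
        text=True,
    )
    return classify(proc.returncode, proc.stderr)
```

Saved as a hypothetical check_tvm_import.py in the repository root (left untracked so checkouts do not remove it), and assuming a CMake-generated Makefile in build/, it could drive the search with something like:

git bisect run sh -c 'make -C build -j4 || exit 125; python3 -c "import sys, check_tvm_import as c; sys.exit(c.main())"'

Mapping build failures to 125 keeps commits that simply do not compile from being marked bad.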

Of interest, this doesn’t appear on arm32; it’s just on aarch64. I half wonder if there is a bug in Debian’s gcc 8.3. FYI @ramana-arm

I’ll look into picking up a later gcc, building gcc 10.2 from source, or even just linking libgcc into libtvm.so.

Hmm, that indicates to me this has something to do with FP16 support. A call to __extendhftf2 indicates that for some reason we have an extension from FP16 to long double. long double and double are the same on AArch32, so this would be fine there. On AArch64, however, long double is the 128-bit IEEE quad-precision type, which is why a helper call shows up, and there’s something else that needs to be looked at here.

Did something change recently on master so that we are producing some _Float16 (FP16) to _Float128 (TF/long double?) conversions?

Regards,
Ramana

Linking libgcc.a into libtvm.so fixes it for aarch64. Do you think that’s a reasonable fix, @ramana-arm?

I am glad that using the -static-libgcc flag fixed this issue. It indicates something is wrong with the toolchain. My guess is that there might be conflicting libgcc versions between Python and libtvm.so.
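To test the guess about conflicting libgcc versions, one could check which libgcc files are actually mapped into the Python process. This is a Linux-only sketch; mapped_libraries is a hypothetical helper that parses /proc/self/maps, not a TVM or Python API.

```python
import os
from typing import List


def mapped_libraries(maps_text: str, fragment: str) -> List[str]:
    """Hypothetical helper: distinct mapped file paths containing `fragment`.

    Each /proc/<pid>/maps line is "address perms offset dev inode [path]";
    anonymous mappings have no path field and are ignored.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and fragment in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


if __name__ == "__main__":
    maps_file = "/proc/self/maps"  # Linux-only
    if os.path.exists(maps_file):
        with open(maps_file) as f:
            print(mapped_libraries(f.read(), "libgcc"))
```

More than one distinct libgcc path in the result would support the conflict theory; a single path points back toward the toolchain itself.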

That should just be part of the normal link process. This sounds more like a gcc bug to me, and we need to extract a small test case to pass back via Debian to upstream.

Ramana


FWIW, I’ve reproduced this with both gcc 10.2 and gcc 8.3 (the Debian package shipped with buster).