Cast from float64 to float16 causes a segmentation fault

This fails on Ubuntu 18.04 with both LLVM 8 and LLVM 10. However, it works fine on my MacBook. In addition, casting from float32 to float16 works.

To reproduce:

import tvm
import topi
from tvm import te
import numpy as np

itype = "float64"
otype = "float16"

x = te.placeholder((2, 2), name='x', dtype=itype)
y = topi.cast(x, otype)
s = te.create_schedule(y.op)
f = tvm.build(s, [x, y], "llvm")
nx = tvm.nd.array(np.random.normal(size=(2, 2)).astype(itype))
ny = tvm.nd.array(np.zeros((2, 2), otype))
f(nx, ny)

Output:

Segmentation fault (core dumped)
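Because the crash takes down the whole interpreter, it can help to run the reproduction in a child process when comparing LLVM versions. The following is only a sketch: `run_isolated` and `died_from_segfault` are hypothetical helper names, not part of TVM, and the signal-based return code convention applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_isolated(source):
    """Run Python source in a child interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", source])
    return result.returncode


def died_from_segfault(returncode):
    """Return True if the child was killed by SIGSEGV.

    On POSIX, subprocess reports death by signal N as a return code of -N.
    """
    return returncode == -signal.SIGSEGV


# Hypothetical usage, assuming the script above is saved as repro.py:
#   rc = run_isolated(open("repro.py").read())
#   print("segfault" if died_from_segfault(rc) else "exit status %d" % rc)
```

This keeps the parent process alive, so the same driver can report the result instead of dying with the build under test.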

This crashes because the functions that perform the float16 conversions are not present. They are not resolved at runtime in the generated code, so their addresses remain null.

The problem here is in finding out where to get these functions from. They are present in clang’s compiler-rt, and (afaik) in gcc’s libgcc (possibly with different names), but in TVM we don’t have any indication as to where they are on any particular system.
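One rough way to see whether such helpers are visible on a given system is to look them up in the running process with `ctypes`. This is a sketch under assumptions: the symbol names below follow compiler-rt's naming for half-precision conversions and may differ in libgcc, `missing_symbols` is a hypothetical helper, and `ctypes.CDLL(None)` only works on POSIX systems. It also only inspects the Python process, which is not necessarily how the generated code resolves its symbols.

```python
import ctypes

# Assumed names, following compiler-rt's conventions; libgcc may use others.
CANDIDATE_SYMBOLS = [
    "__truncdfhf2",    # float64 -> float16
    "__truncsfhf2",    # float32 -> float16
    "__extendhfsf2",   # float16 -> float32
    "__gnu_f2h_ieee",  # float32 -> float16 (GNU-style name)
    "__gnu_h2f_ieee",  # float16 -> float32 (GNU-style name)
]


def missing_symbols(names):
    """Return the names that cannot be resolved in the current process."""
    process = ctypes.CDLL(None)  # dlopen(NULL): the process's global symbols
    missing = []
    for name in names:
        try:
            process[name]  # raises AttributeError if the symbol is not found
        except AttributeError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("unresolved:", missing_symbols(CANDIDATE_SYMBOLS))
```

A name showing up as unresolved here does not prove it is the one the generated code needs, but it gives a starting point for comparing a working machine against a crashing one.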

There is a different thread where I ran into something similar on arm64, so I wonder whether they are the same issue.

In my case clang was ICEing when building TVM. The culprit is:

src/relay/transforms/pattern_util.h

#if (__ARM_FP16_FORMAT_IEEE == 1)
    if (array->dtype.bits == 16) {
      return reinterpret_cast<__fp16*>(array->data)[i];
    }
#endif

According to some coworkers at Linaro who work on the LLVM toolchain, this is because LLVM is missing support. They were going to add it to their to-do list.

Also, if I build TVM natively on arm64 with gcc, libtvm.so errors out with: OSError: /home/debian/tvm/build/libtvm.so: undefined symbol: __extendhftf2

Adding -static-libgcc in the top-level CMakeLists.txt fixes it.

That is: target_link_libraries(tvm -static-libgcc ${TVM_LINKER_LIBS} ${TVM_RUNTIME_LINKER_LIBS})


This is actually a good idea. I was thinking about loading the library into the execution engine, but if the symbols are defined in the current process, we don’t need to do that. We need to make sure they don’t get garbage collected by the linker though.

LLVM does support float16. These functions are implemented in libclang_rt.builtins-<arch>.a, e.g. libclang_rt.builtins-x86_64.a on x86. Usually clang uses libgcc by default, but you can also use compiler-rt with the -rtlib=compiler-rt flag.

Here is the LLVM patch that fixes the ICE I mentioned when building TVM with clang:

https://reviews.llvm.org/D86453

Based on what @kparzysz indicates, I suspect this only helps arm64.

Did you solve this problem? I also encountered the same problem.

Not solved yet 🙁 Have you tested it with a newer version of LLVM?

My LLVM version is 10.0.0.

This does not crash when I run it. I’m on an M1 MacBook with LLVM 11.1.0.

Try upgrading?