FP16 segmentation fault in C++, but works in Python

Hi. I have a TVM module that uses some kernels and consists of only 2 FC layers and an activation.

I looked up the function by name in Python and tested the exported fp16 module with a NumPy fp16 array. The forward pass runs and prints values.

But in C++ I cannot get any values: the function call does not complete and just ends in a segmentation fault.

I can confirm that the Python side works with inputs like this:

import numpy as np
import tvm

# Input activations, kept as float32.
a = tvm.nd.array(np.random.rand(batch, K1).astype(np.float32), dev)
# Weights, kept as float32.
w0 = tvm.nd.array(
    np.random.uniform(size=[K1 // kc1, hdims[0] // stride, kc1, W, stride]).astype(
        np.float32
        # dtype
    ),
    dev,
)
# coff is the only fp16 tensor (dtype is the fp16 type here).
coff = tvm.nd.array(np.random.uniform(size=[batch, 4]).astype(dtype), dev)
# coff = tvm.nd.array(np.random.uniform(size=[batch, 4]).astype(np.float32), dev)

# Output buffer, kept as float32.
o1 = tvm.nd.array(np.zeros([batch, hdims[0]]).astype(np.float32), dev)
fc1(a, w0, coff, o1)

The forward pass works if I set only coff to fp16 and keep the others as float32.

I did exactly the same in C++, but it ends in a segmentation fault. Could anyone help me figure out what might be causing the problem?
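
One thing that may be worth checking on the C++ side (this is only a guess, not a confirmed cause): C++ has no built-in fp16 type, so a buffer declared as float16 holds 2 bytes per element, and writing 4-byte float values into it through the raw data pointer would run past the end of the allocation. Below is a minimal, TVM-independent sketch of packing float values into 16-bit storage before copying them into the fp16 tensor. The helper names BufferBytes, FloatToHalfBits and PackAsHalf are hypothetical, not part of TVM, and the conversion truncates instead of rounding.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes needed for `count` elements that are `bits` wide
// (16 for fp16, 32 for float32).
inline std::size_t BufferBytes(std::size_t count, int bits) {
  return count * static_cast<std::size_t>(bits) / 8;
}

// Convert one float32 to IEEE 754 binary16 bits.
// Simplified: the mantissa is truncated (no rounding), values too small
// for a normal half become signed zero, and overflow becomes infinity.
inline std::uint16_t FloatToHalfBits(float value) {
  std::uint32_t f = 0;
  std::memcpy(&f, &value, sizeof(f));
  const std::uint32_t sign = (f >> 16) & 0x8000u;
  const std::uint32_t exp = (f >> 23) & 0xFFu;
  const std::uint32_t mant = f & 0x7FFFFFu;

  if (exp == 0xFFu) {  // infinity or NaN
    return static_cast<std::uint16_t>(sign | 0x7C00u | (mant ? 0x0200u : 0u));
  }
  const int e = static_cast<int>(exp) - 127 + 15;  // re-bias the exponent
  if (e >= 31) {  // too large for fp16: infinity
    return static_cast<std::uint16_t>(sign | 0x7C00u);
  }
  if (e <= 0) {  // too small for a normal fp16: signed zero
    return static_cast<std::uint16_t>(sign);
  }
  return static_cast<std::uint16_t>(
      sign | (static_cast<std::uint32_t>(e) << 10) | (mant >> 13));
}

// Pack a float32 host array into 16-bit storage, one half per element.
inline std::vector<std::uint16_t> PackAsHalf(const std::vector<float>& src) {
  std::vector<std::uint16_t> dst(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    dst[i] = FloatToHalfBits(src[i]);
  }
  return dst;
}
```

With this, the host data for coff would be a std::vector<std::uint16_t> of batch * 4 elements, and the number of bytes copied into the fp16 tensor would be BufferBytes(batch * 4, 16), which is half of what the same shape needs in float32.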