ARM FP16 intrinsic support in M1 chip

I want to know whether FP16 intrinsics are supported. I could not find any VADD.I16 q0, q1, q2 NEON instructions in the .s file. The target value I set is:

target = "llvm --device=arm_cpu -mattr=+neon --mtriple=aarch64-arm-none-eabi -mcpu=armv8.2-a+fp16"

The LLVM version I used is 13.0.1; the default target is arm64-apple-darwin21.2.0.

Any help is welcome.


The model can be converted to FP16 automatically, whether you use te.compute or Relay.

Then just use float16 data for inference.

On my M1, the following script works as expected:

import numpy as np

import tvm
from tvm import te

idtype = "float16"

# FP16 placeholders for a small matmul: out[i, j] = sum_k x[i, k] * w[j, k]
x = te.placeholder(name="x", dtype=idtype, shape=[15, 10])
w = te.placeholder(name="w", dtype=idtype, shape=[20, 10])

k = te.reduce_axis((0, 10), name="k")
out = te.compute(
    (15, 20),
    lambda i, j: te.sum(x[i, k] * w[j, k], axis=k),
)

# Default schedule; on an M1 host the default llvm target generates AArch64 code.
sch = te.create_schedule(out.op)
tlinear = tvm.build(sch, [x, w, out])

# Random FP16 inputs and an FP16 output buffer on the CPU.
dev = tvm.cpu()
a = tvm.nd.array(np.random.uniform(size=(15, 10)).astype(idtype), dev)
b = tvm.nd.array(np.random.uniform(size=(20, 10)).astype(idtype), dev)
c = tvm.nd.array(np.random.uniform(size=(15, 20)).astype(idtype), dev)

tlinear(a, b, c)
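
As a sanity check, here is a small sketch that compares the result against a NumPy reference and dumps the generated assembly, assuming the module was built with the LLVM backend (which supports get_source("asm")). Half-precision code shows up as h registers or .8h/.4h vector operands; whether the vector forms appear depends on the schedule and LLVM's auto-vectorization.

# Compare against a float32 NumPy reference; FP16 needs loose tolerances.
ref = a.numpy().astype("float32") @ b.numpy().T.astype("float32")
np.testing.assert_allclose(c.numpy().astype("float32"), ref, rtol=1e-2, atol=1e-2)

# Dump the generated AArch64 assembly and look at the multiply/accumulate
# instructions; FP16 appears with h registers or .8h/.4h operands.
asm = tlinear.get_source("asm")
print("\n".join(line for line in asm.splitlines()
                if "fmla" in line or "fmadd" in line or "fmul" in line))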

I just wanted to note that you should be able to see sizable performance improvements from running FP16 on Apple M1. Check out the relay.transform.ToMixedPrecision pass to easily convert your model to FP16. I’ve also personally found that adding -mattr=+fullfp16 to the target string makes a big difference.
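
For reference, a minimal sketch of that flow; the tiny dense network and the exact target string below are illustrative assumptions rather than a prescribed setup.

import tvm
from tvm import relay

# A tiny FP32 Relay function, just to demonstrate the conversion pass.
x = relay.var("x", shape=(1, 16), dtype="float32")
w = relay.var("w", shape=(8, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))

# Run type inference, then rewrite eligible ops to FP16.
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision("float16")(mod)

# Assumed target string for an M1 host; +fullfp16 enables native FP16 arithmetic.
target = "llvm -mtriple=arm64-apple-darwin -mattr=+neon,+fullfp16"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)

By default the pass keeps accumulation in FP32 for ops like dense and conv2d while storing activations in FP16, which is usually what you want numerically.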


Hi. I got this when applying fullfp16 on a decent Intel CPU (which should support FP16):

'+fullfp16' is not a recognized feature for this target (ignoring feature)

Unfortunately, Intel doesn’t have very good FP16 support as far as I know.

@jwfromm does the M1 Mac have FP16 support?

Yes, all the ARM-based Macs should get great FP16 performance.