ARM FP16 intrinsic support in M1 chip

I want to know whether FP16 intrinsics are supported. I could not find any VADD.I16 q0, q1, q2 NEON instructions in the .s file. The target value I set is:

target = "llvm --device=arm_cpu -mattr=+neon --mtriple=aarch64-arm-none-eabi -mcpu=armv8.2-a+fp16"

The LLVM version I used is 13.0.1; the default target is arm64-apple-darwin21.2.0.

Any help is welcome.


The model can be converted to FP16 automatically, whether you use te.compute or Relay.

Then just use float16 data for inference.

On my M1, the following script works as expected:

import numpy as np

import tvm
from tvm import te

idtype = "float16"

# FP16 placeholders for a small matmul: out[i, j] = sum_k x[i, k] * w[j, k]
x = te.placeholder(name="x", dtype=idtype, shape=[15, 10])
w = te.placeholder(name="w", dtype=idtype, shape=[20, 10])

k = te.reduce_axis((0, 10), name="k")
out = te.compute(
    (15, 20),
    lambda i, j: te.sum(x[i, k] * w[j, k], axis=k),
)

# Default schedule; on an M1 host the default llvm target generates AArch64 code.
sch = te.create_schedule(out.op)
tlinear = tvm.build(sch, [x, w, out])

# Random FP16 inputs and an FP16 output buffer on the CPU.
dev = tvm.cpu()
a = tvm.nd.array(np.random.uniform(size=(15, 10)).astype(idtype), dev)
b = tvm.nd.array(np.random.uniform(size=(20, 10)).astype(idtype), dev)
c = tvm.nd.array(np.random.uniform(size=(15, 20)).astype(idtype), dev)

tlinear(a, b, c)
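
As a sanity check, here is a small sketch that compares the result against a NumPy reference and dumps the generated assembly, assuming the module was built with the LLVM backend (which supports get_source("asm")). Half-precision code shows up as h registers or .8h/.4h vector operands; whether the vector forms appear depends on the schedule and LLVM's auto-vectorization.

# Compare against a float32 NumPy reference; FP16 needs loose tolerances.
ref = a.numpy().astype("float32") @ b.numpy().T.astype("float32")
np.testing.assert_allclose(c.numpy().astype("float32"), ref, rtol=1e-2, atol=1e-2)

# Dump the generated AArch64 assembly and look at the multiply/accumulate
# instructions; FP16 appears with h registers or .8h/.4h operands.
asm = tlinear.get_source("asm")
print("\n".join(line for line in asm.splitlines()
                if "fmla" in line or "fmadd" in line or "fmul" in line))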

I just wanted to note that you should be able to see sizable performance improvements from running FP16 on Apple M1. Check out the relay.transform.ToMixedPrecision pass to easily convert your model to FP16. I’ve also personally found that adding -mattr=+fullfp16 to the target string makes a big difference.
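
For reference, a minimal sketch of that flow; the tiny dense network and the exact target string below are illustrative assumptions rather than a prescribed setup.

import tvm
from tvm import relay

# A tiny FP32 Relay function, just to demonstrate the conversion pass.
x = relay.var("x", shape=(1, 16), dtype="float32")
w = relay.var("w", shape=(8, 16), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x, w], relay.nn.dense(x, w)))

# Run type inference, then rewrite eligible ops to FP16.
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision("float16")(mod)

# Assumed target string for an M1 host; +fullfp16 enables native FP16 arithmetic.
target = "llvm -mtriple=arm64-apple-darwin -mattr=+neon,+fullfp16"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)

By default the pass keeps accumulation in FP32 for ops like dense and conv2d while storing activations in FP16, which is usually what you want numerically.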


Hi. I got this when applying fullfp16 on a decent Intel CPU (which should support FP16):

'+fullfp16' is not a recognized feature for this target (ignoring feature)

Unfortunately, Intel doesn’t have very good FP16 support as far as I know.

@jwfromm does the M1 Mac have FP16 support?

Yes, all the ARM-based Macs should get great FP16 performance.