I want to know whether FP16 intrinsics are supported. I could not find any VADD.I16 q0, q1, q2 NEON instructions in the .s file. The target value I set is:
target = "llvm --device=arm_cpu -mattr=+neon --mtriple=aarch64-arm-none-eabi -mcpu=armv8.2-a+fp16"
The LLVM version I am using is 13.0.1, and the default target is arm64-apple-darwin21.2.0.
Any help is welcome.
The model can be converted to FP16 automatically, whether you use te.compute or Relay. Then just feed float16 data at inference time.
On my M1, the following script works as expected:
import numpy as np
import tvm
from tvm import te

idtype = "float16"

# Declare a small dense/linear op entirely in float16.
x = te.placeholder(name="x", dtype=idtype, shape=[15, 10])
w = te.placeholder(name="w", dtype=idtype, shape=[20, 10])
k = te.reduce_axis((0, 10), name="k")
out = te.compute(
    (15, 20),
    lambda i, j: te.sum(x[i, k] * w[j, k], axis=k),
)

# Build for the host CPU and run with float16 inputs/outputs.
sch = te.create_schedule(out.op)
tlinear = tvm.build(sch, [x, w, out])

dev = tvm.cpu()
a = tvm.nd.array(np.random.uniform(size=(15, 10)).astype(idtype), dev)
b = tvm.nd.array(np.random.uniform(size=(20, 10)).astype(idtype), dev)
c = tvm.nd.array(np.random.uniform(size=(15, 20)).astype(idtype), dev)
tlinear(a, b, c)
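
To check whether FP16 NEON instructions are actually emitted (the original question), you can also build the same schedule for an aarch64 cross target and dump the generated assembly. A minimal sketch continuing from the script above; the target string is an assumption, so adjust the triple and attributes for your device:

# Continuing from the script above (reuses sch, x, w, out).
# The aarch64 target string below is an assumption; adjust it for your board.
aarch64_target = "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+neon,+fullfp16"
lib = tvm.build(sch, [x, w, out], target=aarch64_target)

# On aarch64, FP16 vector math appears as fadd/fmla on .8h/.4h registers
# (whether they show up depends on how LLVM vectorizes the loop).
asm = lib.get_source("asm")
print("\n".join(line for line in asm.splitlines() if ".8h" in line or ".4h" in line))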
I just wanted to note that you should be able to see sizable performance improvements from running FP16 on Apple M1. Check out the relay.transform.ToMixedPrecision pass to easily convert your model to FP16. I’ve also personally found that adding -mattr=+fullfp16 to the target string makes a big difference.
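
For reference, a minimal sketch of that flow. The tiny dense model and the target string below are assumptions for illustration only; swap in your own Relay module and device triple:

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Build a tiny float32 Relay model (hypothetical example): y = dense(x, w)
x = relay.var("x", shape=(15, 10), dtype="float32")
w = relay.var("w", shape=(20, 10), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, w))

# Rewrite eligible ops to float16 before compiling.
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision(mixed_precision_type="float16")(mod)

# Target string is an assumption for an Apple Silicon host.
target = "llvm -mtriple=arm64-apple-darwin -mattr=+fullfp16"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)

dev = tvm.cpu()
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("x", np.random.uniform(size=(15, 10)).astype("float32"))
rt.set_input("w", np.random.uniform(size=(20, 10)).astype("float32"))
rt.run()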
Hi. I got this when applying fullfp16 on a decent Intel CPU (which should support FP16):
'+fullfp16' is not a recognized feature for this target (ignoring feature)
Unfortunately, Intel doesn’t have very good FP16 support as far as I know.
@jwfromm does the M1 Mac have FP16 support?
Yes, all the ARM based Macs should get great FP16 performance.