I'm trying to use LLVM intrinsics on a Raspberry Pi 4, but when I try to build I get the error LLVM ERROR: Cannot select: intrinsic %llvm.arm.neon.vpadals. Here is a test file:
import tvm
import tvm.auto_scheduler
from tvm.tir.ir_builder import create
import numpy as np

def intrin():
    def f(out):
        ib = create()
        r = tvm.tir.call_llvm_pure_intrin("int32x4", "llvm.arm.neon.vpadals.v4i32.v8i16", tvm.tir.const(2, "uint32"), tvm.tir.const(1, "int32x4"), tvm.tir.const(0, "int16x8"))
        ib.emit(out.vstore(0, r))
        return ib.get()
    out = tvm.tir.decl_buffer((4,), name="out", dtype="int32")
    return tvm.te.extern([out.shape], [], lambda ins, outs: f(outs[0]), out_buffers=[out], name="intrin", dtype="int32")

if __name__ == "__main__":
    target = tvm.target.Target("llvm -device=arm_cpu -model=bcm2711 -mtriple=aarch64-linux-gnu -mattr=+neon -mcpu=cortex-a72")
    out = intrin()
    s = tvm.te.create_schedule(out.op)
    f = tvm.build(s, [out], target=target)
    f(tvm.nd.array(np.zeros((4,), dtype="int16")))
I believe the vpadals instruction is available because I can compile and run a C file that uses the corresponding NEON intrinsic:
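#include <arm_neon.h>
#include <stdio.h>

int main() {
    int16x4_t a = {0, 1, 2, 3};
    int8x8_t b = {4, 5, 6, 7};
    int16x4_t r = vpadal_s8(a, b);
    // printf cannot format a NEON vector directly, so print each lane.
    printf("%d %d %d %d\n", vget_lane_s16(r, 0), vget_lane_s16(r, 1), vget_lane_s16(r, 2), vget_lane_s16(r, 3));
    return 0;
}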
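For reference, this is the behaviour I expect from the intrinsic: add adjacent pairs of lanes, widen the sums, and accumulate them into the wider vector. Below is a minimal NumPy sketch of that, based only on my understanding of the semantics. The helper name vpadal_reference is hypothetical and is not part of TVM, LLVM, or arm_neon.h.

```python
import numpy as np

def vpadal_reference(acc, vec):
    """Hypothetical reference: pairwise add adjacent lanes of `vec`,
    widened to the accumulator's dtype, then add the sums to `acc`."""
    acc = np.asarray(acc)
    vec = np.asarray(vec)
    # Widen first so the pairwise sums cannot overflow the narrow lane type.
    wide = vec.astype(acc.dtype)
    pair_sums = wide[0::2] + wide[1::2]
    return acc + pair_sums

# Same operands as the TVM call above: an int32x4 of ones and an int16x8 of zeros.
acc = np.ones(4, dtype=np.int32)
vec = np.zeros(8, dtype=np.int16)
print(vpadal_reference(acc, vec))
```

With those operands the accumulator should come back unchanged, so a successful build of the TVM test would be expected to produce four ones.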
Any ideas why this error occurs?
@kparzysz it seems like you have experience with ARM and LLVM, maybe you can provide some insight?