Error: Cannot select intrinsic

tkonolige · January 31, 2022, 11:45pm

I’m trying to use LLVM intrinsics on a Raspberry Pi 4, but I get the error "LLVM ERROR: Cannot select: intrinsic %llvm.arm.neon.vpadals" when trying to build. Here is a test file:

import tvm
from tvm.tir.ir_builder import create
import numpy as np

def intrin():
    def f(out):
        ib = create()
        # Pairwise add-long-and-accumulate: a uint32 constant (2),
        # an int32x4 accumulator and an int16x8 input.
        r = tvm.tir.call_llvm_pure_intrin(
            "int32x4",
            "llvm.arm.neon.vpadals.v4i32.v8i16",
            tvm.tir.const(2, "uint32"),
            tvm.tir.const(1, "int32x4"),
            tvm.tir.const(0, "int16x8"),
        )
        # Store the 4-lane result into the output buffer.
        ib.emit(out.vstore(0, r))
        return ib.get()
    out = tvm.tir.decl_buffer((4,), name="out", dtype="int32")
    return tvm.te.extern([out.shape], [], lambda ins, outs: f(outs[0]), out_buffers=[out], name="intrin", dtype="int32")

if __name__ == "__main__":
    target = tvm.target.Target("llvm -device=arm_cpu -model=bcm2711 -mtriple=aarch64-linux-gnu -mattr=+neon -mcpu=cortex-a72")
    out = intrin()
    s = tvm.te.create_schedule(out.op)
    f = tvm.build(s, [out], target=target)
    # The output buffer is int32, so pass an int32 array.
    f(tvm.nd.array(np.zeros((4,), dtype="int32")))

I believe the vpadals instruction is available because I can compile and run a C program that uses it:

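#include <arm_neon.h>
#include <stdio.h>
int main() {
	int16x4_t a = {0, 1, 2, 3};
	int8x8_t b = {4, 5, 6, 7};  // lanes 4-7 are zero-initialized
	// Add adjacent int8 pairs of b, widen to int16, and accumulate into a.
	int16x4_t r = vpadal_s8(a, b);
	// A vector cannot be passed to printf; store it and print each lane.
	int16_t lanes[4];
	vst1_s16(lanes, r);
	for (int i = 0; i < 4; i++)
		printf("%d\n", lanes[i]);  // expected: 9, 14, 2, 3 (one per line)
	return 0;
}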

Any ideas why this error occurs?

@kparzysz, it seems like you have experience with ARM and LLVM; maybe you can provide some insight?

ramana-arm · February 2, 2022, 10:05pm

If this really is AArch64 on the Raspberry Pi 4 (i.e. uname -m reports aarch64), see if using llvm.aarch64.neon.saddlp.v4i16.v8i8 helps. I'm not an LLVM expert, but I think the intrinsic you are using is only suitable for AArch32: LLVM has different backends for the AArch32 ISA and the AArch64 ISA.

Further, I think the return type of vpadal_s8 would be int16x4_t, so mapping it to int32x4 seems a bit off on first reading.

Ramana
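The return-type point can be made concrete. Pairwise add-long operations sum each pair of adjacent lanes into one lane of twice the width, so the lane count halves and the lane width doubles. The helper below is hypothetical (not a TVM or LLVM API) and assumes the TVM-style dtype strings used in this thread, such as int16x8 (8 lanes of 16 bits).

```python
import re


def pairwise_long_result_dtype(dtype: str) -> str:
    """Result dtype of a pairwise add-long (vpadal/saddlp style) on `dtype`.

    Hypothetical helper: adjacent lanes are summed into one lane of twice
    the width, so the lane count halves and the bit width doubles.
    """
    m = re.fullmatch(r"(u?int)(\d+)x(\d+)", dtype)
    if m is None:
        raise ValueError(f"unsupported dtype string: {dtype!r}")
    kind, bits, lanes = m.group(1), int(m.group(2)), int(m.group(3))
    if lanes % 2 != 0 or bits >= 64:
        raise ValueError(f"cannot widen pairwise: {dtype!r}")
    return f"{kind}{bits * 2}x{lanes // 2}"


# vpadal_s8 from the C example: int8x8 input -> int16x4 result.
print(pairwise_long_result_dtype("int8x8"))   # int16x4
# The .v4i32.v8i16 form used in the TVM script: int16x8 -> int32x4.
print(pairwise_long_result_dtype("int16x8"))  # int32x4
```

Under that rule vpadal_s8 yields int16x4, while an int32x4 result corresponds to the 16-bit, 128-bit-register variant (vpadalq_s16 in the Arm C intrinsics naming), which is what the .v4i32.v8i16 suffix describes.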

tkonolige · February 3, 2022, 7:16pm

saddlp works, thanks Ramana! It seems saddlp is a slightly different instruction from vpadals (it does not accumulate). I wonder why vpadals is not available even though Arm’s documentation lists it as available for A64.

kparzysz · February 5, 2022, 12:28am

vpadals is not an AArch64 intrinsic (at least not in LLVM). For AArch64, LLVM has <4 x i32> @llvm.aarch64.neon.saddlp.v4i32.v8i16(<8 x i16>).
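The signature above takes only the vector being reduced; unlike vpadals there is no accumulator operand. The pure-Python reference model below illustrates the lane arithmetic only (it is not TVM code): adding the accumulator separately reproduces the accumulating behaviour. On AArch64 the accumulating instruction is SADALP, and clang appears to lower vpadal_s8 there to saddlp followed by a vector add, which may be why the C example compiles while the AArch32 intrinsic name does not.

```python
def saddlp(x):
    """Model of saddlp (signed add long pairwise), e.g. v8i16 -> v4i32.

    Each output lane is x[2*i] + x[2*i + 1]. The sum of two int16 values
    always fits in int32, so Python's unbounded ints model this exactly.
    """
    if len(x) % 2 != 0:
        raise ValueError("saddlp needs an even number of lanes")
    return [x[2 * i] + x[2 * i + 1] for i in range(len(x) // 2)]


def sadalp(acc, x):
    """Model of the accumulating form (vpadals on AArch32, SADALP on AArch64).

    Note: this ignores wraparound of the accumulator lanes on overflow.
    """
    pairs = saddlp(x)
    if len(acc) != len(pairs):
        raise ValueError("accumulator must have half as many lanes as x")
    return [a + p for a, p in zip(acc, pairs)]


x = [1, 2, 3, 4, -5, -6, 7, 8]   # an int16x8 input
acc = [10, 20, 30, 40]           # an int32x4 accumulator
print(saddlp(x))       # [3, 7, -11, 15]
print(sadalp(acc, x))  # [13, 27, 19, 55]
```

Read this way, one possible route on AArch64 is to call saddlp and then add the accumulator with an ordinary vector addition; this is a suggestion based on the model, not something tested in this thread.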

OoJJBoO · November 22, 2022, 2:17pm

Sorry for bumping this topic, but I ran into the same issue described in the original post and I don't quite understand how to implement what @ramana-arm and @kparzysz suggested earlier.

As the previous answers suggested, and since @tkonolige reported that it worked for them, I tested the code from the original post with the llvm.aarch64.neon.saddlp.v4i32.v8i16 intrinsic in place of the original one:

import tvm
from tvm.tir.ir_builder import create
import numpy as np

def intrin():
    def f(out):
        ib = create()
        # Same argument list as the original post: a uint32 constant (2),
        # an int32x4 constant and an int16x8 constant.
        r = tvm.tir.call_llvm_pure_intrin(
            "int32x4",
            "llvm.aarch64.neon.saddlp.v4i32.v8i16",
            tvm.tir.const(2, "uint32"),
            tvm.tir.const(1, "int32x4"),
            tvm.tir.const(0, "int16x8"),
        )
        # Store the 4-lane result into the output buffer.
        ib.emit(out.vstore(0, r))
        return ib.get()
    out = tvm.tir.decl_buffer((4,), name="out", dtype="int32")
    return tvm.te.extern([out.shape], [], lambda ins, outs: f(outs[0]), out_buffers=[out], name="intrin", dtype="int32")

if __name__ == "__main__":
    target = tvm.target.arm_cpu("rasp4b64")
    out = intrin()
    s = tvm.te.create_schedule(out.op)
    f = tvm.build(s, [out], target=target)
    # The output buffer is int32, so pass an int32 array.
    f(tvm.nd.array(np.zeros((4,), dtype="int32")))

But when I run this script, the tvm.build call now fails with the following error:

Check failed: (f) is false: Cannot find intrinsic declaration, possible type mismatch: llvm.aarch64.neon.saddlp

I am still pretty new to TVM, and especially to using intrinsics, so it would be great if someone could help me out and briefly explain what I am doing wrong!
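The "possible type mismatch" message is consistent with the call passing two operands (an int32x4 and an int16x8) to an intrinsic that, per the signature kparzysz quoted, takes a single <8 x i16>. Overloaded LLVM intrinsic names carry the overloaded types as dot-separated suffixes; for saddlp these are the return type and its one operand. The hypothetical helper below decodes such suffixes into TVM-style dtypes, assuming the vNiM form seen in this thread (N lanes of M-bit integers) and that signedness comes from the intrinsic name (saddlp vs uaddlp) rather than the suffix.

```python
import re


def decode_overload_suffixes(intrinsic_name: str, base: str, signed: bool = True):
    """Decode the vNiM type suffixes after `base` into TVM-style dtypes.

    Hypothetical helper: "llvm.aarch64.neon.saddlp.v4i32.v8i16" with base
    "llvm.aarch64.neon.saddlp" gives ["int32x4", "int16x8"]. LLVM's iM
    types carry no signedness, so it is passed in explicitly.
    """
    if not intrinsic_name.startswith(base + "."):
        raise ValueError(f"{intrinsic_name!r} does not start with {base!r}")
    kind = "int" if signed else "uint"
    dtypes = []
    for suffix in intrinsic_name[len(base) + 1:].split("."):
        m = re.fullmatch(r"v(\d+)i(\d+)", suffix)
        if m is None:
            raise ValueError(f"unsupported type suffix: {suffix!r}")
        lanes, bits = int(m.group(1)), int(m.group(2))
        dtypes.append(f"{kind}{bits}x{lanes}")
    return dtypes


# For saddlp the suffixes are the return type, then its single operand.
ret, operand = decode_overload_suffixes(
    "llvm.aarch64.neon.saddlp.v4i32.v8i16", "llvm.aarch64.neon.saddlp")
print(ret, operand)  # int32x4 int16x8
```

Under that reading, an untested adjustment would be to return int32x4 and pass only one int16x8 operand, with the leading uint32 constant set to 1 instead of 2; that constant appears to tell TVM how many of the following operands determine the overloaded signature.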

Hiba · July 28, 2023, 2:54pm

I got a similar error while compiling a TFLite model with TVM: "LLVM ERROR: Cannot select: intrinsic %llvm.aarch64.neon.uaddlp". Does anyone have an idea what it could be?

OoJJBoO · August 1, 2023, 7:49am

Did you manage to resolve the issue, @Hiba? It sounds like you are trying to compile or run an operator that uses 64-bit ARM (AArch64) instructions on either a 32-bit or a non-ARM system.
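Since the "Cannot select" error is raised by LLVM during compilation, what seems to matter first is the architecture TVM asks LLVM to compile for: the -mtriple in the target string, as in the original post's target, or, when that is absent, presumably LLVM's default triple, which is normally that of the machine doing the build. The hypothetical helper below makes that check explicit for plain target strings; it is a heuristic, not a TVM API, and it assumes Linux reports 64-bit ARM as "aarch64" and macOS as "arm64".

```python
import platform
import shlex

# Assumption: Linux reports "aarch64" and macOS reports "arm64" for 64-bit ARM.
AARCH64_NAMES = ("aarch64", "arm64")


def mtriple_of(target_str):
    """Return the -mtriple value in a TVM-style target string, or None."""
    for token in shlex.split(target_str):
        if token.startswith("-mtriple="):
            return token.split("=", 1)[1]
    return None


def targets_aarch64(target_str, host_machine=None):
    """Guess whether llvm.aarch64.* intrinsics can be selected for this target.

    Uses the explicit -mtriple if present; otherwise assumes the build falls
    back to the host architecture (platform.machine() unless given).
    """
    triple = mtriple_of(target_str)
    if triple:
        arch = triple.split("-", 1)[0]
    else:
        arch = host_machine or platform.machine()
    return arch.lower().startswith(AARCH64_NAMES)


print(targets_aarch64("llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+neon"))  # True
print(targets_aarch64("llvm", host_machine="armv7l"))  # False
```

A False result here, for example a 32-bit Raspberry Pi OS host reporting armv7l, would match the failure mode described above.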

Hiba · August 1, 2023, 8:05am

That’s exactly the problem, @OoJJBoO. Here is a detailed description of the issue; I am still stuck on it.