The bitserial conv2D schedule definition (in tvm/python/tvm/topi/arm_cpu/bitserial_conv2d.py) uses the following LLVM intrinsics for 32-bit ARM architectures:
What would be the corresponding VPADD and VPADALU intrinsics for 64-bit ARM (AArch64) architectures? I tried several variants of the llvm.aarch64.neon.suqadd and llvm.aarch64.neon.saddlp intrinsics, but I keep hitting what looks like a type mismatch error during compilation.
How can the existing bitserial conv2D implementation be modified to run on 64-bit ARM architectures?
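In case it helps anyone answering, here is a plain-Python sketch of what I understand the two operations to compute, going by the ARM descriptions of VPADD (pairwise add) and VPADAL (pairwise add and accumulate long) on unsigned 8-bit lanes. The function names `vpadd_u8` and `vpadal_u8` are my own for illustration and are not part of TVM or LLVM; lane counts are not fixed here, so the same sketch covers both the 64-bit and 128-bit vector forms.

```python
def vpadd_u8(a, b):
    """Pairwise add: sum adjacent lanes of a, then of b, wrapping at 8 bits.

    The result has as many lanes as each input: the low half comes from a,
    the high half from b.
    """
    assert len(a) == len(b) and len(a) % 2 == 0
    lanes = list(a) + list(b)
    return [(lanes[i] + lanes[i + 1]) & 0xFF for i in range(0, len(lanes), 2)]


def vpadal_u8(acc, x):
    """Pairwise add and accumulate long.

    Adjacent u8 lanes of x are widened and summed, and each sum is added
    into the matching u16 lane of acc (wrapping at 16 bits).
    """
    assert len(x) == 2 * len(acc)
    return [(acc[i] + x[2 * i] + x[2 * i + 1]) & 0xFFFF for i in range(len(acc))]


# Small hand-checkable inputs (illustrative values only).
print(vpadd_u8([1, 2, 3, 4, 5, 6, 7, 8], [10, 20, 30, 40, 50, 60, 70, 250]))
print(vpadal_u8([100, 200, 300, 65535], [255, 255, 1, 2, 3, 4, 1, 0]))
```

Whatever the AArch64 replacement turns out to be, I would expect it to reproduce these results lane for lane, including the widening from 8-bit inputs to 16-bit accumulators in the second case.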
@comaniac Could you please help with this, or do you know who might be able to help?