[ARM][Bitserial] Bitserial Conv2D on 64-bit ARM Architectures

The bitserial conv2D schedule definition (in tvm/python/tvm/topi/arm_cpu/bitserial_conv2d.py) uses the following LLVM intrinsics for 32-bit ARM architectures:

What would be the corresponding VPADD and VPADALU intrinsics for 64-bit ARM (aarch64) architectures? I tried several variants of the llvm.aarch64.neon.suqadd and llvm.aarch64.neon.saddlp intrinsics, but I keep hitting what appears to be a type mismatch error during compilation.
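For reference, here is a rough pure-Python model of what the two 32-bit operations compute, so that any aarch64 replacement can be checked against the same lane-by-lane behavior. This is only a sketch: the function names `vpadd_u8` and `vpadal_u8` are made up for illustration and are not TVM or LLVM APIs. On the instruction level, the aarch64 counterparts appear to be ADDP and UADALP; which LLVM intrinsic names and type suffixes map to them is an assumption that needs checking against the LLVM version in use. Note that suqadd is a saturating accumulate rather than a pairwise add, and saddlp is the signed variant, which may explain the type errors.

```python
def vpadd_u8(a, b):
    """Model of VPADD on two 8-lane u8 vectors.

    Adjacent lanes of `a` fill the low half of the result and
    adjacent lanes of `b` fill the high half; sums wrap at 8 bits.
    """
    assert len(a) == 8 and len(b) == 8
    lanes = list(a) + list(b)
    return [(lanes[i] + lanes[i + 1]) & 0xFF for i in range(0, 16, 2)]


def vpadal_u8(acc, x):
    """Model of VPADAL.U8: pairwise add and accumulate long.

    Each adjacent pair of u8 lanes in `x` is widened, summed, and
    added to the matching u16 lane of `acc`; results wrap at 16 bits.
    """
    assert len(x) == 2 * len(acc)
    return [(acc[i] + x[2 * i] + x[2 * i + 1]) & 0xFFFF
            for i in range(len(acc))]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [10, 20, 30, 40, 50, 60, 70, 80]
    print(vpadd_u8(a, b))
    print(vpadal_u8([1000] * 8, [255] * 16))
```

Comparing the output of a candidate aarch64 lowering against this model on a few random vectors would at least confirm that the lane ordering and widening match.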

How can the existing bitserial conv2D implementation be modified to run on 64-bit ARM architectures?

@comaniac Could you please help with this, or do you know who might be able to?

I’m not familiar with bitserial Conv2D. @vinx13 @eqy @cowanmeg could you folks take a look?