Currently, the test test_meta_schedule_post_order_apply_arm_intrin is failing with, e.g., ValueError: TensorIntrin 'dot_4x4_i8i8s32_sdot' is not registered.
A similar issue was raised before, which was fixed with this line.
I was able to fix the current issue by adding the arm_cpu file to the tensor_intrin __init__ file (see here). I am not entirely sure what could have broken this or why this change fixes it, so I thought I would ask here first before opening a pull request.
My setup: Apple M3, macOS 14.1, Python 3.10, and LLVM 15.