[BugFix][MetaSchedule] Fix TensorIntrin 'dot_4x4_i8i8s32_sdot' is not registered

Currently, the test test_meta_schedule_post_order_apply_arm_intrin is failing with, e.g., ValueError: TensorIntrin 'dot_4x4_i8i8s32_sdot' is not registered.

A similar issue was raised before, which was fixed with this line.

I was able to fix the current issue by adding the arm_cpu file to the tensor_intrin __init__ file (see here). I am not entirely sure what could have broken this or why the fix works, so I thought I would ask here first before opening a pull request.

My setup: Apple M3, macOS 14.1, Python 3.10, and LLVM 15

Thanks for pointing this out @felix_ro - the line was mistakenly removed here. It’s a little surprising this didn’t get caught by CI; however, I’d be happy to review a PR with the change.


Thank you; I restored the lost line and opened a PR.