As the title says. As in the GEMM on CPU tutorial, the bn value is fixed at 32 here.
I'd like to make it tunable so that auto-tuning can exploit whatever vectorization the target offers, since instruction sets like SSE, AVX2, and AVX512 have different register widths.
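To make the register-size point concrete, here is a rough sketch of the candidate values I have in mind. The register widths (128, 256, and 512 bits) are the standard ones; `vector_lanes` and `candidate_bn` are hypothetical helper names made up for illustration, not part of TVM, and I'm assuming float32 data.

```python
# Vector register widths in bits for each instruction set.
REGISTER_BITS = {"SSE": 128, "AVX2": 256, "AVX512": 512}


def vector_lanes(isa, dtype_bits=32):
    """Number of elements of the given bit width that fit in one vector register."""
    return REGISTER_BITS[isa] // dtype_bits


def candidate_bn(n, isa, dtype_bits=32):
    """Hypothetical helper: block sizes that divide n evenly and are a
    multiple of the lane count, so the inner loop fills whole registers."""
    lanes = vector_lanes(isa, dtype_bits)
    return [bn for bn in range(lanes, n + 1, lanes) if n % bn == 0]


if __name__ == "__main__":
    for isa in REGISTER_BITS:
        print(isa, vector_lanes(isa), candidate_bn(64, isa))
```

So the fixed value of 32 is only one of several block sizes that could make sense, and the smallest useful one differs per target.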
However, I'm going to decouple compute and schedule, i.e. the AutoTVM config will be set up after the compute has been defined rather than before. So I wonder if there's any way to, say, keep bn as a placeholder in the compute and auto-tune it in the schedule. Any suggestions?
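The reason this is awkward, as far as I understand it, is that bn is baked into the compute itself: the tutorial's array-packing step repacks B into shape (N / bn, K, bn). Below is a plain-Python sketch of that packing (no TVM involved; `pack_b` and `packed_shape` are hypothetical names for illustration) showing that the tensor shape changes with bn.

```python
def pack_b(B, bn):
    """Repack a K x N matrix (list of lists) into shape (N // bn, K, bn),
    where packed[x][y][z] == B[y][x * bn + z]."""
    K, N = len(B), len(B[0])
    if N % bn != 0:
        raise ValueError("bn must divide N")
    return [
        [[B[y][x * bn + z] for z in range(bn)] for y in range(K)]
        for x in range(N // bn)
    ]


def packed_shape(packed):
    """Shape of the nested-list tensor returned by pack_b."""
    return (len(packed), len(packed[0]), len(packed[0][0]))


if __name__ == "__main__":
    # A small 2 x 8 matrix with B[k][n] = k * 8 + n.
    B = [[k * 8 + n for n in range(8)] for k in range(2)]
    for bn in (2, 4):
        print(bn, packed_shape(pack_b(B, bn)))
```

Since the packed shape depends on bn, the value seems to be needed when the compute is declared, which is exactly what I'd like to avoid.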
Thanks in advance!