I gave up on using the tensorize scheduling operation because ir_pass.LoopPartition works at a different level (i.e. the AST level), and I didn't feel like writing an intrin_func template (needed for tensorize) that would work for all sizes of my tiles (i.e. including the tail regions, where the tiles are smaller than normal).
So I added an ir_pass in the last stage which (to my understanding) does something equivalent to what you would expect tensorize to do, and at least for my purposes it works.
The way I did it is very similar (if not identical) to this VTA construct, which I think is very similar to
Instead of a new API, use pragmas.
I let ir_pass.LoopPartition peel the loops for me.
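To make the "peeling" idea concrete, here is a minimal sketch in plain Python. It is not TVM code and does not call ir_pass.LoopPartition; the function names (tiled_with_guard, tiled_partitioned, expand) are made up for illustration. It only shows the general idea as I understand it: a tiled loop with a per-iteration bounds check (the role the likely(...) condition plays) can be split into full tiles that need no check plus one smaller tail tile.

```python
def tiled_with_guard(n, tile):
    """Tiled loop where every inner iteration carries a bounds check."""
    visited = []
    num_tiles = (n + tile - 1) // tile  # ceil(n / tile)
    for outer in range(num_tiles):
        for inner in range(tile):
            idx = outer * tile + inner
            if idx < n:  # plays the role of the likely(...) guard
                visited.append(idx)
    return visited


def tiled_partitioned(n, tile):
    """Same iteration space, split into guard-free full tiles plus a tail.

    Returns a list of (kind, start, size) regions. A fixed-size kernel
    could handle each "full" region; the "tail" region has a smaller size.
    """
    regions = []
    full_tiles = n // tile
    for outer in range(full_tiles):
        regions.append(("full", outer * tile, tile))
    tail = n - full_tiles * tile
    if tail > 0:
        regions.append(("tail", full_tiles * tile, tail))
    return regions


def expand(regions):
    """List every index covered by the regions, in order."""
    return [start + i for _, start, size in regions for i in range(size)]


if __name__ == "__main__":
    print(tiled_partitioned(10, 4))
    print(expand(tiled_partitioned(10, 4)) == tiled_with_guard(10, 4))
```

Both versions visit the same indices; the partitioned one just makes the tail explicit, which is why a fixed-size intrinsic only fits the "full" regions and the tail needs separate handling.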
Hi!
I’ve also run into the problem with “likely(…)” statements in tensorization.
After enabling loop partitioning with “partition_const_loop” in the config, some of them disappeared, but not all. Is there any way to completely remove the “likely(…)” statements before tensorization without writing your own IR pass?