Hey.
No, sorry, that was not what I meant.
I gave up on the idea of using the tensorize scheduling operation because ir_pass.LoopPartition works at a different level (i.e. the AST level), and I didn't feel like writing an intrin_func template (needed for tensorize) that would work for all sizes of my tiles (i.e. for the tail regions where the tiles are smaller than normal).
So I added an ir_pass in the last stage which (to my understanding) does something equivalent to what you would expect tensorize to do, and at least for my purpose it works.
The way I did it is very similar (if not identical) to this VTA construct, which I think is very similar to
- Instead of a new api, use pragmas
- I let ir_pass.LoopPartition peel the loops for me
- Same as above
- Same as above
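
To make the tail-region point a bit more concrete, here is a minimal plain-Python sketch of the idea. It is not TVM code and not my actual pass: `partition_tiles`, `lower_tile`, `lower_loop` and the `full_tile_op` / `tail_tile_op` names are all hypothetical, made up for illustration. It only shows the shape of the approach: first peel the loop into full tiles plus a smaller tail, then let a late rewriting step choose what to emit for each tile size, instead of needing one intrinsic template that covers every size.

```python
def partition_tiles(extent, tile):
    """Split range(extent) into (start, size) tiles.

    All tiles have size `tile` except possibly the last one (the tail),
    which is smaller when `extent` is not a multiple of `tile`.
    """
    if extent < 0 or tile <= 0:
        raise ValueError("extent must be >= 0 and tile must be > 0")
    full = extent // tile
    tiles = [(i * tile, tile) for i in range(full)]
    tail = extent - full * tile
    if tail:
        tiles.append((full * tile, tail))
    return tiles


def lower_tile(start, size, full_tile):
    """Hypothetical stand-in for the late rewriting step.

    Emits a different (made-up) call depending on whether the tile is a
    full tile or a smaller tail tile.
    """
    name = "full_tile_op" if size == full_tile else "tail_tile_op"
    return f"{name}(start={start}, size={size})"


def lower_loop(extent, tile):
    """Peel the loop first, then rewrite each peeled region separately."""
    return [lower_tile(s, n, tile) for s, n in partition_tiles(extent, tile)]


if __name__ == "__main__":
    for line in lower_loop(10, 4):
        print(line)
```

With an extent of 10 and a tile size of 4, this gives two full tiles and one tail tile of size 2, and only the tail gets the special handling.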