Regarding the imperfect tiling: I think the cause of the problem is that
tensorization happens as part of ScheduleOps, before the LoopPartition pass.
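
To make the issue concrete, here is a minimal plain-Python sketch of what an imperfect split looks like. It does not use TVM at all; `split_with_guard` and `full_tiles` are hypothetical helpers made up for illustration, and the assumption is that a tensor intrinsic only matches tiles whose inner extent is exactly the split factor.

```python
def split_with_guard(extent, factor):
    """Split a loop of `extent` iterations by `factor`.

    The outer extent is rounded up, so when `factor` does not divide
    `extent` the last tile is only partially filled and needs a guard.
    """
    outer_extent = (extent + factor - 1) // factor  # ceiling division
    tiles = []
    for outer in range(outer_extent):
        # Keep only indices that pass the guard: outer * factor + inner < extent
        valid = [outer * factor + inner
                 for inner in range(factor)
                 if outer * factor + inner < extent]
        tiles.append(valid)
    return tiles


def full_tiles(tiles, factor):
    """Return the tiles whose inner extent equals `factor` (assumed to be
    the only ones a fixed-size intrinsic could match)."""
    return [t for t in tiles if len(t) == factor]


tiles = split_with_guard(10, 4)
print(tiles)                  # the last tile is the guarded tail
print(full_tiles(tiles, 4))   # only the full tiles remain
```

With an extent of 10 and a factor of 4, the last tile covers only two elements. If the tail were peeled off into its own loop before tensorization, the remaining full tiles would have a fixed shape.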
There has been a good discussion of this problem, and the solutions suggested were:
1) Auto-tensorization
2) Having a separate pass that runs much later, after ScheduleOps and all the
necessary IR transformations.
You can find the discussion here: