Currently a PrefetchNode is replaced with a loop nest doing single-cache-line prefetches, and that’s done in the storage flattener. This is bad for Hexagon, because Hexagon has a prefetch engine that can prefetch a 2D buffer in the background, but it requires a different setup. Once the loop nest is generated, the 2D shape of the request is gone, and it’s too late for us.
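To make the difference concrete, here is a simplified model of the two approaches. It is a sketch under stated assumptions, not TVM code: it assumes a 64-byte cache line, and the function names and the descriptor fields (base, width, height, stride) are hypothetical. The first function lists the cache-line addresses that a flattened loop nest would prefetch one at a time. The second shows the single request a 2D prefetch engine could take instead.

```python
# Simplified model, NOT TVM's storage flattener: names and line size are assumptions.
CACHE_LINE = 64  # assumed cache line size in bytes


def cache_line_prefetches(base, row_bytes, rows, row_stride, line=CACHE_LINE):
    """Return the line-aligned addresses a per-cache-line loop nest would prefetch.

    base       -- byte address of the first element of the region
    row_bytes  -- number of bytes touched in each row
    rows       -- number of rows
    row_stride -- distance in bytes between the starts of consecutive rows
    """
    addrs = []
    for r in range(rows):                    # outer loop: one iteration per row
        start = base + r * row_stride
        end = start + row_bytes              # exclusive end of this row
        first = start - (start % line)       # align down to a cache line boundary
        for a in range(first, end, line):    # inner loop: one prefetch per line
            addrs.append(a)
    return addrs


def prefetch_2d_descriptor(base, row_bytes, rows, row_stride):
    """Hypothetical single request for a 2D prefetch engine covering the same region."""
    return {"base": base, "width": row_bytes, "height": rows, "stride": row_stride}
```

For a region of 2 rows, 128 bytes wide, with a 256-byte row stride, the first function yields four separate line prefetches (0, 64, 256, 320), while the second describes the same region in one request. Recovering width, height, and stride from the already-generated loop nest is the part that is impractical.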
What I’m trying to do is replace PrefetchNode with a new TVM builtin, which can then be legalized/lowered individually for each target, with a default lowering that does what the storage flattener does now.
The problem is that the current legalization/lowering mechanisms won’t work for this: legalizing the new prefetch builtin requires generating loops, which are statements, whereas builtin lowering can only produce PrimExprs.
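The distinction can be illustrated with a toy IR. This is only a sketch: the classes below are not TVM's IR classes, and the registry, `legalize_prefetch`, and the op names `prefetch_line` and `hexagon_prefetch_2d` are all hypothetical. The point is the return type. A hook limited to returning an expression (a `Call`) can express the Hexagon case, but it cannot express the default case, which needs `For` loops, i.e. statements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# Toy IR, NOT TVM's classes: just enough to show the Expr vs. Stmt distinction.


@dataclass
class Call:          # expression: a builtin/intrinsic call
    op: str
    args: list


@dataclass
class Evaluate:      # statement wrapping an expression
    value: Call


@dataclass
class For:           # statement: a loop, which no expression can represent
    var: str
    extent: int
    body: "Stmt"


Stmt = Union[Evaluate, For]

# Hypothetical per-target registry of *statement-level* legalizers for prefetch.
PREFETCH_LEGALIZERS: Dict[str, Callable[[Call], Stmt]] = {}


def register(target: str):
    def deco(fn):
        PREFETCH_LEGALIZERS[target] = fn
        return fn
    return deco


@register("default")
def prefetch_default(call: Call) -> Stmt:
    # Default: a loop nest of single-line prefetches; this needs a For (a Stmt).
    buf, rows, lines_per_row = call.args
    inner = For("j", lines_per_row, Evaluate(Call("prefetch_line", [buf, "i", "j"])))
    return For("i", rows, inner)


@register("hexagon")
def prefetch_hexagon(call: Call) -> Stmt:
    # Hexagon: one 2D request for the prefetch engine (hypothetical op name).
    return Evaluate(Call("hexagon_prefetch_2d", list(call.args)))


def legalize_prefetch(stmt: Evaluate, target: str) -> Stmt:
    # Targets without their own legalizer fall back to the default loop nest.
    fn = PREFETCH_LEGALIZERS.get(target, PREFETCH_LEGALIZERS["default"])
    return fn(stmt.value)
```

With this shape, `legalize_prefetch(Evaluate(Call("prefetch", ["A", 4, 2])), "llvm")` falls back to the default and returns a two-level `For` nest, while the `"hexagon"` target returns a single `Evaluate`. An expression-only hook could produce the second result but not the first.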
Would we consider extending legalization to handle statements?