Handling of `prefetch` (legalization/lowering)

Currently the storage flattener replaces each PrefetchNode with a loop nest that issues single-cache-line prefetches. This is bad for Hexagon: it has a prefetch engine that can fetch a 2D buffer in the background, but that engine needs a different setup. Once the loop nest has been generated, it’s too late for us to use it.

What I’m trying to do is to replace PrefetchNode with a new TVM builtin, which then can be legalized/lowered individually for each target, with a default lowering doing what storage flattener does now.

The problem is that the current legalization/lowering mechanisms won’t work for this, because legalization of the new prefetch builtin will require generating loops, which are statements, whereas the builtin lowering can only produce PrimExprs.

Would we consider extending legalization to handle statements?

I recently encountered similar issues. We can extend legalization/lowering to match the pattern `Evaluate(call_intrin)` and lower such statements to a Stmt.
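
To make the idea concrete, here is a minimal sketch of statement-level legalization on a toy IR. It is not TVM code: the classes `Call`, `Evaluate`, `For` and `Seq`, the registry, the op names `prefetch_2d`, `prefetch_line` and `hw_prefetch_2d`, and the cache-line size of 16 elements are all made up for illustration. It only shows the shape of the mechanism: a rule is keyed on the intrinsic inside an `Evaluate`, it may return any statement (including a loop nest), and a target without its own rule falls back to the default one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


# --- Toy IR (hypothetical stand-ins, not TVM's node classes) ---
@dataclass(frozen=True)
class Call:
    op: str
    args: tuple


@dataclass(frozen=True)
class Evaluate:
    value: Call


@dataclass(frozen=True)
class For:
    var: str
    extent: int
    body: "Stmt"


@dataclass(frozen=True)
class Seq:
    stmts: tuple


Stmt = Union[Evaluate, For, Seq]

# A statement-level rule maps the intrinsic call to a Stmt, not to an expression.
StmtRule = Callable[[Call], Stmt]
_RULES: Dict[Tuple[str, str], StmtRule] = {}


def register(op: str, target: str = "default"):
    """Register a lowering rule for (op, target)."""
    def deco(fn: StmtRule) -> StmtRule:
        _RULES[(op, target)] = fn
        return fn
    return deco


LINE_ELEMS = 16  # assumed number of elements per cache line


@register("prefetch_2d")
def default_prefetch(call: Call) -> Stmt:
    """Default rule: a loop nest issuing one prefetch per cache line."""
    buf, rows, cols = call.args
    lines_per_row = (cols + LINE_ELEMS - 1) // LINE_ELEMS
    inner = Evaluate(Call("prefetch_line", (buf, "i", "j")))
    return For("i", rows, For("j", lines_per_row, inner))


@register("prefetch_2d", "hexagon")
def hexagon_prefetch(call: Call) -> Stmt:
    """Target rule: keep the 2D request as a single (hypothetical) call."""
    return Evaluate(Call("hw_prefetch_2d", call.args))


def legalize(stmt: Stmt, target: str) -> Stmt:
    """Rewrite every Evaluate(Call) that has a rule; recurse through the rest."""
    if isinstance(stmt, Evaluate) and isinstance(stmt.value, Call):
        op = stmt.value.op
        rule = _RULES.get((op, target)) or _RULES.get((op, "default"))
        return rule(stmt.value) if rule else stmt
    if isinstance(stmt, For):
        return For(stmt.var, stmt.extent, legalize(stmt.body, target))
    if isinstance(stmt, Seq):
        return Seq(tuple(legalize(s, target) for s in stmt.stmts))
    return stmt


# Usage: the same program lowered for two targets.
prog = Seq((Evaluate(Call("prefetch_2d", ("A", 4, 64))),))
generic = legalize(prog, "x86")      # no x86 rule registered, so the default applies
hexagon = legalize(prog, "hexagon")  # uses the target-specific rule
```

In this sketch the generic target ends up with a two-level `For` nest, while the `hexagon` target keeps one call carrying the full 2D extent, which is the information that is lost once the loop nest has been generated early.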