Loop-dependent differential compute

  • Not sure if the heading is appropriate; feel free to change it.

Refer to the sample snippet below:

for out from 0 to n:
    out_derived = fun1(out)
    for in from 0 to m:
        result = fun2(out_derived, in)

A general compute we would write for this logic is:

compute (indices [out, in]) {
     return fun2(fun1(out), in)
}

I know we have schedules to split loops across parallel execution units.

How about splitting the compute across loops here?
In the example above, the fun1(out) compute is repeated for every inner-loop iteration (m times) even though out does not change across those iterations.
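To make the cost concrete, here is a minimal Python sketch of the two lowerings. fun1, fun2, naive, and hoisted are hypothetical stand-ins (not part of any existing API), and the call counter exists only to show the saving:

```python
calls = {"fun1": 0}

def fun1(out):
    # hypothetical outer-loop-dependent computation; counts its invocations
    calls["fun1"] += 1
    return out * out

def fun2(out_derived, inner):
    # hypothetical per-element computation
    return out_derived + inner

def naive(n, m):
    # straight translation of the flat compute: fun1 runs n * m times
    return [fun2(fun1(out), inner)
            for out in range(n) for inner in range(m)]

def hoisted(n, m):
    # fun1 hoisted out of the inner loop: it runs only n times
    results = []
    for out in range(n):
        out_derived = fun1(out)  # invariant w.r.t. the inner loop
        for inner in range(m):
            results.append(fun2(out_derived, inner))
    return results
```

For n = 3 and m = 4, both versions produce identical results, but the naive lowering calls fun1 twelve times while the hoisted one calls it only three times.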

I am not sure whether the existing lowering process already handles this optimization.