- Not sure the title is appropriate; feel free to change it.
Consider the sample snippet below:
for out in range(n):
    out_derived = fun1(out)  # depends only on the outer index
    for inner in range(m):   # `in` is a Python keyword, so the inner index is named `inner`
        result = fun2(out_derived, inner)
A general compute we would write for this logic is:
compute(indices [out, inner]) {
    return fun2(fun1(out), inner)
}
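For concreteness, here is a minimal sketch of that fused form, assuming the framework in question is TVM's te API (an assumption on my part, based on the compute/schedule/lowering terminology); fun1, fun2, n, and m below are just placeholder definitions:

from tvm import te

# Placeholder definitions, purely illustrative.
def fun1(o):
    return o * 2 + 1

def fun2(x, i):
    return x + i

n, m = 1024, 1024

# Single fused stage: fun1(out) sits inside the body, so a naive lowering
# evaluates it once per (out, inner) pair, i.e. n * m times.
Out = te.compute((n, m), lambda out, inner: fun2(fun1(out), inner), name="Out")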
I know we have schedule primitives to split loops for parallel execution. What about splitting the compute itself across the loop levels here?
In the above example, the fun1(out) computation is repeated for every inner-loop iteration (m times) even though out does not change.
I am not sure whether the existing lowering process already handles this optimization.
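If lowering does not hoist the invariant part automatically, one way to express the split explicitly (again assuming a TVM-like te API, with the same placeholder fun1/fun2 as above) is to make fun1 a separate stage and attach it to the outer loop with compute_at:

import tvm
from tvm import te

# Same placeholder definitions as above.
def fun1(o):
    return o * 2 + 1

def fun2(x, i):
    return x + i

n, m = 1024, 1024

# Stage 1: fun1(out) as its own tensor of shape (n,).
F1 = te.compute((n,), lambda out: fun1(out), name="F1")

# Stage 2: the original compute, now reading the cached F1 value.
Out = te.compute((n, m), lambda out, inner: fun2(F1[out], inner), name="Out")

s = te.create_schedule(Out.op)

# Evaluate F1 once per `out` iteration, right above the inner loop,
# instead of once per (out, inner) pair.
s[F1].compute_at(s[Out], Out.op.axis[0])

print(tvm.lower(s, [Out], simple_mode=True))

Without the compute_at call, F1 would be computed at root into its own n-element buffer before the Out loops, which also avoids the m-fold recomputation; compute_at just reproduces the original loop nest (fun1 evaluated once per out, right above the inner loop) more closely.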