In affine loop nests, linear detection is a very important analysis. Given a simple index expression such as `a*i + b*j + c*k + d`, linear detection extracts the coefficient of each loop variable together with the constant term: `[a, b, c, d]`.
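To make the idea concrete, here is a minimal sketch of linear detection over a toy expression tree. The tuple encoding and the name `detect_linear` are hypothetical and are not any compiler's actual API. The symbolic coefficients `a, b, c, d` are replaced with concrete integers so the example runs.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical toy expression AST, for illustration only:
#   int            -> constant
#   str            -> loop variable
#   (op, lhs, rhs) -> binary node, op in {"+", "*", "//", "%"}
Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]

# var -> coefficient; the key "" holds the constant term.
Linear = Dict[str, int]


def detect_linear(e: Expr) -> Optional[Linear]:
    """Return the coefficients if e is affine in its variables, else None."""
    if isinstance(e, int):
        return {"": e}
    if isinstance(e, str):
        return {e: 1}
    op, lhs, rhs = e
    l, r = detect_linear(lhs), detect_linear(rhs)
    if l is None or r is None:
        return None
    if op == "+":
        out = dict(l)
        for k, v in r.items():
            out[k] = out.get(k, 0) + v
        return out
    if op == "*":
        # Affine only if at least one side is a pure constant.
        if set(l) <= {""}:
            c, other = l.get("", 0), r
        elif set(r) <= {""}:
            c, other = r.get("", 0), l
        else:
            return None  # var * var is not linear
        return {k: c * v for k, v in other.items()}
    # "//" and "%" are treated as non-linear (conservatively, even on constants).
    return None


a, b, c, d = 3, 5, 7, 11
expr = ("+", ("+", ("+", ("*", a, "i"), ("*", b, "j")), ("*", c, "k")), d)
print(detect_linear(expr))  # {'i': 3, 'j': 5, 'k': 7, '': 11}
```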
On the other hand, fusion is a very important prerequisite for parallelizing loops. For example, in a Conv, the loops over H and W are often fused and the fused loop is annotated with `parallel`.
However, when the two come together, fusion completely breaks linear detection. Say we fuse `i` and `j`; the index then becomes `(ij/jext)*a + (ij%jext)*b + c*k + d`, where `ij` is the fused induction variable and `jext` is the original trip count of loop `j`. Because of the `/` and `%`, the expression is no longer linear, so any optimization that relies on linear detection is disabled.
Is it possible to support something like postponed fusion? That is, when building the initial (vanilla) loop nest, the loops to be fused remain separate loops; only after the transformations that require linear detection are done do we fuse those loop levels.
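One way to picture postponed fusion is as an ordering constraint during lowering: keep the index in affine form on the unfused nest, run the analyses that need it, and only then rewrite the fused loop with `//` and `%`. The toy IR below (`LoopNest`, `innermost_stride`, `fuse`) is entirely hypothetical and only illustrates that ordering; it is not how any particular compiler implements it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LoopNest:
    # Hypothetical toy IR: loops listed outermost-first, and the index kept
    # in affine form sum(coeffs[v] * v) + const.
    loops: List[Tuple[str, int]]  # (loop variable, extent)
    coeffs: Dict[str, int]
    const: int = 0


def innermost_stride(nest: LoopNest) -> int:
    # Stand-in for any pass that needs the linear form, e.g. a vectorization
    # check on the stride of the innermost loop.
    inner_var = nest.loops[-1][0]
    return nest.coeffs.get(inner_var, 0)


def fuse(nest: LoopNest, outer: str, inner: str
         ) -> Tuple[List[Tuple[str, int]], Callable[[Dict[str, int]], int]]:
    """Fuse two adjacent loops; the index stops being affine from here on."""
    names = [v for v, _ in nest.loops]
    pos = names.index(outer)
    assert pos + 1 < len(names) and names[pos + 1] == inner, "loops must be adjacent"
    outer_ext, inner_ext = nest.loops[pos][1], nest.loops[pos + 1][1]
    fused_name = outer + inner
    loops = nest.loops[:pos] + [(fused_name, outer_ext * inner_ext)] + nest.loops[pos + 2:]

    def index(env: Dict[str, int]) -> int:
        # Recover the original induction variables with // and %.
        env = dict(env)
        fused = env.pop(fused_name)
        env[outer], env[inner] = fused // inner_ext, fused % inner_ext
        return sum(c * env[v] for v, c in nest.coeffs.items()) + nest.const

    return loops, index


nest = LoopNest(loops=[("i", 2), ("j", 4), ("k", 3)],
                coeffs={"i": 3, "j": 5, "k": 7}, const=11)

# 1) Analyses that need linear detection run on the unfused, affine nest.
print(innermost_stride(nest))  # 7

# 2) Fusion is postponed until those analyses are done.
loops, index = fuse(nest, "i", "j")
print(loops)                     # [('ij', 8), ('k', 3)]
print(index({"ij": 5, "k": 2}))  # i=1, j=1, k=2 -> 3 + 5 + 14 + 11 = 33
```

The design choice here is simply that `fuse` is the last step: every pass that reads `coeffs` sees the affine form, and the non-linear `//`/`%` only appear inside the final index function.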