Can anyone help explain the "tile reduction axes" part of the "Tuning High Performance Convolution on NVIDIA GPUs" tutorial? The code for this part is:
# tile reduction axes
n, f, y, x = s[OL].op.axis
rc, ry, rx = s[OL].op.reduce_axis
rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
ryo, rym, ryi = cfg['tile_ry'].apply(s, OL, ry)
rxo, rxm, rxi = cfg['tile_rx'].apply(s, OL, rx)
s[OL].reorder(rco, ryo, rxo, rcm, rym, rxm, rci, ryi, rxi, n, f, y, x)
s[AA].compute_at(s[OL], rxo)
s[WW].compute_at(s[OL], rxo)
s[AL].compute_at(s[OL], rxm)
s[WL].compute_at(s[OL], rxm)
-
Why do we need to tile the reduction axes (and what are reduction axes in the first place)?
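For context, my current understanding is that the reduction axes are the ones that get summed away and so never appear in the output shape. In plain Python loops (a hypothetical sketch with NCHW layout, stride 1, no padding, not the tutorial's actual compute definition), conv2d would look like:

```python
def conv2d_naive(data, weight):
    """Plain-loop conv2d, just to show which axes are spatial
    and which are reduction (sketch, names are mine)."""
    N, C, H, W = len(data), len(data[0]), len(data[0][0]), len(data[0][0][0])
    F, KH, KW = len(weight), len(weight[0][0]), len(weight[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[[0.0] * OW for _ in range(OH)] for _ in range(F)] for _ in range(N)]
    # n, f, y, x index the output: these are the spatial axes
    for n in range(N):
        for f in range(F):
            for y in range(OH):
                for x in range(OW):
                    acc = 0.0
                    # rc, ry, rx are the reduction axes: they are
                    # summed away and do not index the output
                    for rc in range(C):
                        for ry in range(KH):
                            for rx in range(KW):
                                acc += (data[n][rc][y + ry][x + rx]
                                        * weight[f][rc][ry][rx])
                    out[n][f][y][x] = acc
    return out
```

So "tiling the reduction axes" would mean splitting the rc/ry/rx loops above into outer/middle/inner parts, if I read the schedule correctly.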
-
Are the
n, f, y, x
here the same as the
n, f, y, x
in
##### space definition begin #####
n, f, y, x = s[conv].op.axis
rc, ry, rx = s[conv].op.reduce_axis
or what is the relationship between them?
- Why do we need the
compute_at
here? I think AA
is read from global memory, so it does not need to be computed, right?
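To make my confusion concrete, my current mental model of those compute_at calls, in plain Python on a 1-D reduction (a hypothetical sketch, all names are mine), is something like:

```python
def dot_tiled(a, b, tile=4):
    """Reduction axis split into (outer, inner). The per-tile copies
    into a_stage/b_stage mimic what I understand compute_at(s[OL], rxo)
    to do: load a chunk of global data into a faster buffer once per
    outer reduction step, then reuse it for all inner steps."""
    n = len(a)
    assert n % tile == 0
    acc = 0.0
    for ro in range(n // tile):                 # outer reduction loop
        # staging copies (playing the role of the AA/WW cache loads)
        a_stage = a[ro * tile:(ro + 1) * tile]
        b_stage = b[ro * tile:(ro + 1) * tile]
        for ri in range(tile):                  # inner reduction loop
            acc += a_stage[ri] * b_stage[ri]
    return acc
```

Is the point that AA/WW are extra cache stages whose loads get anchored under the outer reduction loop like this, rather than reads straight from global memory?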