Can anyone help explain the "tile reduction axes" part of the "Tuning High Performance Convolution on NVIDIA GPUs" tutorial? The code for this part is:
# tile reduction axes
n, f, y, x = s[OL].op.axis
rc, ry, rx = s[OL].op.reduce_axis
rco, rcm, rci = cfg['tile_rc'].apply(s, OL, rc)
ryo, rym, ryi = cfg['tile_ry'].apply(s, OL, ry)
rxo, rxm, rxi = cfg['tile_rx'].apply(s, OL, rx)
s[OL].reorder(rco, ryo, rxo, rcm, rym, rxm, rci, ryi, rxi, n, f, y, x)
s[AA].compute_at(s[OL], rxo)
s[WW].compute_at(s[OL], rxo)
s[AL].compute_at(s[OL], rxm)
s[WL].compute_at(s[OL], rxm)
-
Why do we need to tile the reduction axes (and what are reduction axes in the first place)?
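For context, my current understanding is that the reduction axes are the ones that get summed away and so never appear in the output shape. In plain Python loops (a hypothetical sketch with NCHW layout, stride 1, no padding, not the tutorial's actual compute definition), conv2d would look like:

```python
def conv2d_naive(data, weight):
    """Plain-loop conv2d, just to show which axes are spatial
    and which are reduction (sketch, names are mine)."""
    N, C, H, W = len(data), len(data[0]), len(data[0][0]), len(data[0][0][0])
    F, KH, KW = len(weight), len(weight[0][0]), len(weight[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[[0.0] * OW for _ in range(OH)] for _ in range(F)] for _ in range(N)]
    # n, f, y, x index the output: these are the spatial axes
    for n in range(N):
        for f in range(F):
            for y in range(OH):
                for x in range(OW):
                    acc = 0.0
                    # rc, ry, rx are the reduction axes: they are
                    # summed away and do not index the output
                    for rc in range(C):
                        for ry in range(KH):
                            for rx in range(KW):
                                acc += (data[n][rc][y + ry][x + rx]
                                        * weight[f][rc][ry][rx])
                    out[n][f][y][x] = acc
    return out
```

So "tiling the reduction axes" would mean splitting the rc/ry/rx loops above into outer/middle/inner parts, if I read the schedule correctly.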
-
Are the
n, f, y, x
here the same as the
n, f, y, x
in
##### space definition begin #####
n, f, y, x = s[conv].op.axis
rc, ry, rx = s[conv].op.reduce_axis
or what is the relationship between them?
- Why do we need the
compute_at
here? I think AA
is read from global memory, so it does not need to be computed, right?
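To make my confusion concrete, my current mental model of those compute_at calls, in plain Python on a 1-D reduction (a hypothetical sketch, all names are mine), is something like:

```python
def dot_tiled(a, b, tile=4):
    """Reduction axis split into (outer, inner). The per-tile copies
    into a_stage/b_stage mimic what I understand compute_at(s[OL], rxo)
    to do: load a chunk of global data into a faster buffer once per
    outer reduction step, then reuse it for all inner steps."""
    n = len(a)
    assert n % tile == 0
    acc = 0.0
    for ro in range(n // tile):                 # outer reduction loop
        # staging copies (playing the role of the AA/WW cache loads)
        a_stage = a[ro * tile:(ro + 1) * tile]
        b_stage = b[ro * tile:(ro + 1) * tile]
        for ri in range(tile):                  # inner reduction loop
            acc += a_stage[ri] * b_stage[ri]
    return acc
```

Is the point that AA/WW are extra cache stages whose loads get anchored under the outer reduction loop like this, rather than reads straight from global memory?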