Can we use these two schedule methods to achieve the example below? Example:
From
for i0, i1, i2 in T.grid(128, 128, 64):
    with T.block("matmul"):
        i, j, k = T.axis.remap("SSR", [i0, i1, i2])
        C[i, j] = C[i, j] + A[i, k] * B[k, j]
To
for i0, i1, i2 in T.grid(128, 128, 64):
    with T.block("name_of_read"):
        i, j = T.axis.remap("SS", [i0, i1])
        C_global[i, j] = C[i, j]
    with T.block("matmul"):
        i, j, k = T.axis.remap("SSR", [i0, i1, i2])
        C_global[i, j] = C_global[i, j] + A[i, k] * B[k, j]
    with T.block("name_of_write"):
        i, j = T.axis.remap("SS", [i0, i1])
        C[i, j] = C_global[i, j]
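
To make the intent concrete, here is a minimal plain-Python sketch (not TVM code) of what the two versions compute. It assumes all three blocks sit inside the same loop nest, as written above. The function names and the small shapes are made up for illustration; it only checks that staging the update through C_global leaves C with the same values as the direct update.

```python
# Plain-Python model of the two loop nests (no TVM involved).
# Shapes are shrunk from 128 x 128 x 64 so the check runs quickly; these values are hypothetical.
N, M, K = 4, 4, 3

def matmul_direct(A, B, C):
    # "From" version: accumulate straight into C.
    C = [row[:] for row in C]  # work on a copy so the input stays untouched
    for i in range(N):
        for j in range(M):
            for k in range(K):
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C

def matmul_staged(A, B, C):
    # "To" version: every update goes through the staging buffer C_global.
    C = [row[:] for row in C]
    C_global = [[0] * M for _ in range(N)]
    for i in range(N):
        for j in range(M):
            for k in range(K):
                C_global[i][j] = C[i][j]                              # block "name_of_read"
                C_global[i][j] = C_global[i][j] + A[i][k] * B[k][j]   # block "matmul"
                C[i][j] = C_global[i][j]                              # block "name_of_write"
    return C

# Small integer inputs so the comparison is exact.
A = [[i + k for k in range(K)] for i in range(N)]
B = [[k * j + 1 for j in range(M)] for k in range(K)]
C0 = [[0] * M for _ in range(N)]

same = matmul_direct(A, B, C0) == matmul_staged(A, B, C0)
print(same)
```

This says nothing about whether the schedule primitives will produce exactly this block layout; it only pins down the result the transformed program is expected to compute.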