How to apply both cache_read and cache_write to the same buffer in TIR?

Can these two schedule primitives be used together on the same buffer to achieve the transformation below? Example:

From

for i0, i1, i2 in T.grid(128, 128, 64):
    with T.block("matmul"):
        i, j, k = T.axis.remap("SSR", [i0, i1, i2])
        C[i, j] = C[i, j] + A[i, k] * B[k, j]

To

for i0, i1, i2 in T.grid(128, 128, 64):
    with T.block("name_of_read"):
        i, j = T.axis.remap("SS", [i0, i1])
        C_global[i, j] = C[i, j]
    with T.block("matmul"):
        i, j, k = T.axis.remap("SSR", [i0, i1, i2])
        C_global[i, j] = C_global[i, j] + A[i, k] * B[k, j]
    with T.block("name_of_write"):
        i, j = T.axis.remap("SS", [i0, i1])
        C[i, j] = C_global[i, j]
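
To make the intended semantics concrete, here is a minimal plain-Python sketch of the two loop nests. It is not TVM code and does not call any schedule primitive. It only models what the "From" and "To" programs compute so the two can be checked against each other. The function names matmul_direct and matmul_staged, the nested-list buffers, and the small 4x5x3 shapes are assumptions made for illustration (the original uses 128x128x64).

```python
import copy


def matmul_direct(A, B, C):
    """Model of the 'From' loop nest: accumulate straight into C."""
    n, m, r = len(C), len(C[0]), len(B)
    for i in range(n):
        for j in range(m):
            for k in range(r):
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


def matmul_staged(A, B, C):
    """Model of the 'To' loop nest: read C into a staging buffer,
    update the staging buffer, then write it back to C."""
    n, m, r = len(C), len(C[0]), len(B)
    C_global = [[0] * m for _ in range(n)]  # stand-in for the cache buffer
    for i in range(n):
        for j in range(m):
            for k in range(r):
                # block "name_of_read"
                C_global[i][j] = C[i][j]
                # block "matmul"
                C_global[i][j] = C_global[i][j] + A[i][k] * B[k][j]
                # block "name_of_write"
                C[i][j] = C_global[i][j]
    return C


# Small integer inputs (hypothetical sizes) so the comparison is exact.
n, m, r = 4, 5, 3
A = [[i + k for k in range(r)] for i in range(n)]
B = [[k * 2 + j for j in range(m)] for k in range(r)]
C_init = [[i - j for j in range(m)] for i in range(n)]

result_direct = matmul_direct(A, B, copy.deepcopy(C_init))
result_staged = matmul_staged(A, B, copy.deepcopy(C_init))
```

Both functions should leave the same values in C, which is the property the desired schedule transformation needs to preserve: the staging buffer is both read from and written back to the same original buffer.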