Can a compute op be constructed differently for matrix multiply?

Normally, a matrix multiply op can be constructed as follows:

import tvm

N,M,K = 512,256,128
dtype = "float32"
A = tvm.placeholder((N,K), name="A", dtype=dtype)
B = tvm.placeholder((K,M), name="B", dtype=dtype)

k = tvm.reduce_axis((0, K), name="K")
C = tvm.compute((N,M), lambda i,j: tvm.sum(A[i, k] * B[k, j], axis=k), name="C")
s = tvm.create_schedule(C.op)

This corresponds to the following loop nest:

produce C {
  for (i, 0, 512) {
    for (j, 0, 256) {
      C[((i*256) + j)] = 0.000000f
      for (K, 0, 128) {
        C[((i*256) + j)] = (C[((i*256) + j)] + (A[((i*128) + K)]*B[((K*256) + j)]))
      }
    }
  }
}

The example above defines the compute op directly from the definition of matrix multiply: each C[i, j] is a reduction over k. But I wonder whether TVM can construct the compute op in a fancier way.
For example: at each step, I pick one column of A and one row of B and multiply them, which gives an N x M matrix. Then I add those K matrices together to get the final output. In Python (NumPy), that would be:

import numpy as np
N, M, K = 3, 2, 2
A = np.random.rand(N, K)
B = np.random.rand(K, M)
C = np.zeros((N, M))
# do matrix multiply as follows
# pick one column of A, one row of B
for i in range(K):
    for p in range(N):
        for q in range(M):
            C[p, q] = C[p, q] + A[p, i] * B[i, q]

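To make the intent clearer, here is a rough NumPy sketch of the same idea written as a sum of outer products: step i takes column i of A and row i of B, forms their N x M outer product, and accumulates it into C. The helper name matmul_by_outer_products is just made up for illustration and is not a TVM or NumPy API. The final check only confirms that this ordering gives the same result as the usual A @ B.

```python
import numpy as np

def matmul_by_outer_products(A, B):
    """Sum K rank-1 updates (hypothetical helper, for illustration only)."""
    N, K = A.shape
    K2, M = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((N, M), dtype=np.result_type(A, B))
    for i in range(K):
        # column i of A (length N) times row i of B (length M) -> N x M matrix
        C += np.outer(A[:, i], B[i, :])
    return C

rng = np.random.default_rng(0)
A = rng.random((3, 2))
B = rng.random((2, 2))
C = matmul_by_outer_products(A, B)

# Same result as the standard definition, only the loop order differs.
assert np.allclose(C, A @ B)
```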
So, how can I express the computation above with tvm.compute?
Thanks!