How to store the output of a matrix multiply in non-contiguous memory using the TE language?

gfvvz · February 23, 2021, 5:35am

I am wondering whether there is a way to have the result of tvm.te.compute() stored in non-contiguous memory.

I need to implement a matrix multiply, for example C = A * B, and I want C stored in a particular layout. Writing the elements of the original C as C[i][j], the output should be stored like this:

C[0][0], 0, C[0][1], 0, C[0][2], 0, …, C[0][N-1], 0

C[1][0], 0, C[1][1], 0, C[1][2], 0, …, C[1][N-1], 0

…

C[M-2][0], 0, C[M-2][1], 0, …, C[M-2][N-1], 0

C[M-1][0], 0, C[M-1][1], 0, …, C[M-1][N-1], 0

I want a single step that does both the computation and the output layout shown above. Is this possible in the TVM TE language?

In C this is very simple: we can just write each output element to a strided address. But I am not sure how to express this with tvm.te.sum(). If you know, please let me know. Thanks a lot.
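
To make the C version concrete, here is a minimal sketch of what I mean by writing with a stride. It assumes row-major float matrices, an output buffer of M rows with 2*N floats each, and a zero in every odd column; the function name matmul_interleaved and the parameter names are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Computes C = A * B and stores C[i][j] at column 2*j of a row-major
 * output buffer that holds 2*n floats per row. Every odd column is
 * written as 0, giving the layout: C[i][0], 0, C[i][1], 0, ...
 *
 * a:     m x k, row-major
 * b:     k x n, row-major
 * c_out: m x (2*n), row-major (hypothetical layout from the post)
 */
void matmul_interleaved(const float *a, const float *b, float *c_out,
                        size_t m, size_t k, size_t n)
{
    assert(a != NULL && b != NULL && c_out != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Reduction over the shared dimension. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            /* Strided store: result in the even slot, zero in the odd slot. */
            c_out[i * 2 * n + 2 * j] = acc;
            c_out[i * 2 * n + 2 * j + 1] = 0.0f;
        }
    }
}
```

So the reduction and the strided store happen in the same loop nest, and that is what I would like to express in a single TE stage.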