ricann
November 21, 2018, 9:35am
1
There is a piece of code in this example that I can’t understand, and I need help with it:
s[C_buf].reorder(
    ko,
    s[C_buf].op.axis[0],
    s[C_buf].op.axis[1],
    s[C_buf].op.axis[2],
    s[C_buf].op.axis[3],
    ki)
s[C_buf].tensorize(s[C_buf].op.axis[2], env.gemm)
Two questions:
What is reorder used for?
What does tensorize do?
eqy
November 21, 2018, 7:18pm
2
Reorder is used to permute the loop axes of a loop nest.
You may remember the popular CPU matrix multiply example used to introduce locality:
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    for (k = 0; k < N; k++)
      // blah blah
The observation is that when optimizing for locality, it may make sense to reorder the loop nest, perhaps into something like:
for (i = 0; i < N; i++)
  for (k = 0; k < N; k++)
    for (j = 0; j < N; j++)
      // blah blah
The reorder function applies this transformation: the loops are nested in the order in which the axes are passed as arguments, outermost first.
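To make the effect of a loop permutation concrete, here is a minimal pure-Python sketch (not TVM code; the function names `matmul_ijk` and `matmul_ikj` are made up for illustration). It runs the same multiply-accumulate body under two loop orders and checks that the result is unchanged; only the memory access pattern differs.

```python
# Pure-Python sketch: the same matrix multiply under two loop orders.
# matmul_ijk / matmul_ikj are illustrative names, not part of TVM.

def matmul_ijk(A, B, n):
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_ikj(A, B, n):
    # Same body, loops permuted to (i, k, j): the innermost loop now walks
    # along a row of B and a row of C, which suits row-major storage better.
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


n = 3
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]

# Reordering changes the traversal order, not the computed values.
assert matmul_ijk(A, B, n) == matmul_ikj(A, B, n)
```

In a TVM schedule you would not rewrite the loops by hand like this; the reorder call requests the permutation and the generated loop nest follows it.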
Tensorize is analogous to vectorization but for more general dense data shapes.
For example, while vectorization may do something like change
for (i = 0; i < 64; i++)
  C[i] = A[i] + B[i];
to
for (i = 0; i < 64; i += 8)
  _my_8wide_vector_add(i, A, B, C); // operate on 8 elements at once
Tensorize can be used for a transformation like:
for (i = 0; i < 4; i++)
  for (ii = 0; ii < 4; ii++) // 4x4 outer product outer loop
    for (jj = 0; jj < 4; jj++) // 4x4 outer product inner loop
      C[ii][jj] += A[i][ii] * B[jj][i];
to
for (i = 0; i < 4; i++)
  _my_4x4outer_product_function(A, B, C, i);
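The transformation above can be checked with a small pure-Python sketch. This is only an illustration under stated assumptions: `outer_product_4x4` is a hypothetical stand-in for a hardware intrinsic (here it is just an ordinary function), and nothing in it is TVM API. The point is that the two inner loops disappear from the caller and are replaced by one call that does the same work.

```python
# Hypothetical stand-in for a hardware intrinsic: one call performs a whole
# 4x4 outer-product update, C[ii][jj] += A[i][ii] * B[jj][i].
def outer_product_4x4(A, B, C, i):
    for ii in range(4):
        for jj in range(4):
            C[ii][jj] += A[i][ii] * B[jj][i]


def loops_version(A, B):
    # Original form: all three loops are visible in the loop nest.
    C = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for ii in range(4):
            for jj in range(4):
                C[ii][jj] += A[i][ii] * B[jj][i]
    return C


def tensorized_version(A, B):
    # "Tensorized" form: only the outer loop remains in the loop nest.
    C = [[0] * 4 for _ in range(4)]
    for i in range(4):
        outer_product_4x4(A, B, C, i)
    return C


A = [[r * 4 + c for c in range(4)] for r in range(4)]
B = [[r - c for c in range(4)] for r in range(4)]

assert loops_version(A, B) == tensorized_version(A, B)
```

On real hardware the body of the intrinsic would be a single instruction or accelerator call rather than Python loops, which is where the speedup comes from.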
ricann
November 22, 2018, 1:42am
3
Thank you very much.
I have another question that may not be easy to answer, but I really don’t know how to approach it, so I would be very happy if you could give me a few tips.
TVM has many complicated data structures and I don’t know how to read them (I have been reading the code for about two months, but …), so I’m very confused now …
ricann
November 23, 2018, 10:07am
4
In the file vta/tutorials/matrix_multiply_opt.py, at line 245, there is:
s[res_gemm].reorder(ic_out, b_inn, oc_inn, ic_inn, b_tns, oc_tns, ic_tns)
Judging from this code, does s[res_gemm] have 6 dimensions? If it does, then when I print s[res_gemm].op.axis I get:
[iter_var(bo, Range(min=0, extent=1)), iter_var(co, Range(min=0, extent=64)), iter_var(bi, Range(min=0, extent=1)), iter_var(ci, Range(min=0, extent=16))]
It still has only 4 dimensions. Why?
Thank you very much.
nhynes
November 23, 2018, 11:00am
5
The axes marked as _tns are converted into a single VTAUop.
It’s kind of like how, when you’re using a GPU, you parallelize the outer loops over all of the CUDA cores, so those loops go away. In VTA, the inner loops that would otherwise do, e.g., multiplication element by element are converted into calls to the FPGA GEMM core, which does those operations all at once.
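On the dimension count itself: in TVM, `op.axis` lists only the data-parallel (output) axes of a compute op, while reduction axes are kept separately in `op.reduce_axis`, and splitting an axis adds loops without adding tensor dimensions. So a loop nest can have more loops than the tensor has dimensions. The pure-Python sketch below illustrates this under some assumptions: the `ic_*` axes are taken to be reduction axes (a guess based on their names), the extents are small made-up values, and `gemm_core` is a hypothetical stand-in for the hardware GEMM call.

```python
# Hypothetical small extents; in the thread the output axes have extents 1, 64, 1, 16.
BO, CO, BI, CI = 1, 4, 1, 2   # four output axes (what op.axis would report)
KO, KI, KT = 2, 2, 2          # three reduction loops (assumed: ic_out, ic_inn, ic_tns)


def gemm_core(acc, inp_tile, wgt_tile):
    """Hypothetical stand-in for one hardware GEMM call on a BI x CI tile."""
    for bi in range(BI):          # b_tns
        for ci in range(CI):      # oc_tns
            for kt in range(KT):  # ic_tns
                acc[bi][ci] += inp_tile[bi][kt] * wgt_tile[ci][kt]


def tiled_gemm(inp, wgt):
    out = [[[[0] * CI for _ in range(BI)] for _ in range(CO)] for _ in range(BO)]
    for ko in range(KO):              # ic_out (reduction)
        for bo in range(BO):          # b_inn  (output axis)
            for co in range(CO):      # oc_inn (output axis)
                for ki in range(KI):  # ic_inn (reduction)
                    k = ko * KI + ki
                    # The three innermost loops are replaced by a single call.
                    gemm_core(out[bo][co], inp[bo][k], wgt[co][k])
    return out


def reference(inp, wgt):
    # Direct definition of the same contraction, used only as a check.
    return [[[[sum(inp[bo][k][bi][kt] * wgt[co][k][ci][kt]
                   for k in range(KO * KI) for kt in range(KT))
               for ci in range(CI)] for bi in range(BI)]
             for co in range(CO)] for bo in range(BO)]


K = KO * KI
inp = [[[[bo + k + bi + kt for kt in range(KT)] for bi in range(BI)]
        for k in range(K)] for bo in range(BO)]
wgt = [[[[co - k + ci * kt for kt in range(KT)] for ci in range(CI)]
        for k in range(K)] for co in range(CO)]

out = tiled_gemm(inp, wgt)
shape = (len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))
assert shape == (BO, CO, BI, CI)      # seven loops, but a 4-D result
assert out == reference(inp, wgt)
```

The loop order in `tiled_gemm` mirrors the order of the arguments in the reorder call quoted above, with the three `_tns` loops folded into the single `gemm_core` call.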