Here is my code:
from tvm.script import tir as T

@T.prim_func
def func(A: T.handle, B: T.handle, C: T.handle):
    a = T.match_buffer(A, shape=(16,), dtype="float32")
    b = T.match_buffer(B, shape=(16,), dtype="float32")
    c = T.match_buffer(C, shape=(16,), dtype="float32")
    for i in range(16 // 4):
        # Load four float32 lanes from each input and add them
        va = a.vload(i * 4, "float32x4")
        vb = b.vload(i * 4, "float32x4")
        vc = va + vb
        # c[i*4:i*4+4] = vc
        c.vstore(i * 4, vc)
When I run
func.show()
I get this output:
@T.prim_func
def main(a: T.Buffer((16,), "float32"), b: T.Buffer((16,), "float32"), c: T.Buffer((16,), "float32")):
    for i in range(4):
        va: T.float32x4 = a[i * 4:i * 4 + 4]
        vb: T.float32x4 = b[i * 4:i * 4 + 4]
        vc: T.float32x4 = va + vb
        T.evaluate(0)
The c.vstore statement is missing from the body of the prim_func; only T.evaluate(0) is left in its place.
Of course, I can write the store another way, for example c[i*4:i*4+4] = vc.
My question is: if I use tir.buffer.vload in a script, how do I handle the problem of the vstore statement not being in the prim_func's body?