I tried the naive thing:
@T.prim_func
def my_module(A: T.Buffer((24,), "float32"), B: T.Buffer((24,), "float32")):
    T.func_attr({"from_legacy_te_schedule": True, "global_symbol": "my_module", "tir.noalias": True})
    C = T.allocate([24], "float32", "global")
    for i_outer in range(6):
        C_1 = T.Buffer((24,), data=C)
        A_1 = T.Buffer((24,), data=A.data)
        B_1 = T.Buffer((24,), data=B.data)
        C_1[i_outer * 4:i_outer * 4 + 4] = A_1[i_outer * 4:i_outer * 4 + 4] + B_1[i_outer * 4:i_outer * 4 + 4]
but that gave me an error:
TypeError: unsupported operand type(s) for +: 'BufferRegion' and 'BufferRegion'
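For reference, here is how I would write the same computation with an explicit T.vectorized inner loop instead of the slice syntax the printer emits. This is my own sketch (names like vec_add are mine, not TVM output); what I am really after is how to write the already-vectorized, ramp-based form directly:

from tvm.script import tir as T

# Sketch: same element-wise add, but the vectorization is expressed as a
# T.vectorized loop rather than ramp/slice indexing.
@T.prim_func
def vec_add(A: T.Buffer((24,), "float32"), B: T.Buffer((24,), "float32"), C: T.Buffer((24,), "float32")):
    for i_outer in range(6):
        for i_inner in T.vectorized(4):
            C[i_outer * 4 + i_inner] = A[i_outer * 4 + i_inner] + B[i_outer * 4 + i_inner]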
By the way, the TVMScript above is the result of lowering a TE schedule, running a couple of TIR passes on it, and printing the result with .script():
import tvm
from tvm import te
from tvm.driver.build_module import schedule_to_module

dtype = "float32"
num = 24
A = te.placeholder((num,), dtype=dtype, name="A")
B = te.placeholder((num,), dtype=dtype, name="B")
C = te.compute((num,), lambda i: A[i] + B[i], name="C")

# Split the single axis by 4 and vectorize the inner loop.
s = te.create_schedule([C.op])
x, = s[C].op.axis
xo, xi = s[C].split(x, 4)
s[C].vectorize(xi)

out = schedule_to_module(s, [A, B], "my_module")
out = tvm.tir.transform.StorageFlatten(64)(out)  # to get rid of https://github.com/apache/tvm/issues/14342
out = tvm.tir.transform.VectorizeLoop()(out)
print(out.script())
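For completeness, this is roughly how I would check whether the printed script round-trips; it is just a sketch, and I am assuming tvm.script.from_source is the right entry point for parsing TVMScript text back into an IRModule:

# Sketch of a round-trip check: print the lowered module and parse the text back.
# (Assumes tvm.script.from_source is the appropriate parser entry point.)
src = out.script()
reparsed = tvm.script.from_source(src)
tvm.ir.assert_structural_equal(out, reparsed)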
What is the correct way of expressing vectors in TVMScript? Let me know!