Hi, I’m confused about the vectorize schedule. Two questions:
- Is it just a hint to TVM (i.e., TVM is free to not follow it), or is it something that will be strictly enforced?
- What is the exact meaning of vectorizing a loop? I thought it was partitioning the loop into multiple vector loads/stores. But when I try to vectorize a loop of extent 1000, the script below works for the LLVM target but fails on CUDA. The error message is

  `TVMError: Cannot convert type int32x1000 to CUDA type`

  which sounds like TVM was trying to turn the whole loop into a single vector load/store?
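To make my mental model concrete, here is what I thought "vectorizing" meant, sketched in plain NumPy (just an illustration of the idea, not TVM code):

```python
import numpy as np

def scalar_loop(dst, src):
    # one element per iteration: what the un-vectorized loop does
    for i in range(len(src)):
        dst[i] = src[i]

def vectorized_loop(dst, src, width=4):
    # one width-sized chunk per iteration: each slice assignment
    # stands in for a single vector load/store
    assert len(src) % width == 0
    for i in range(0, len(src), width):
        dst[i:i + width] = src[i:i + width]

src = np.arange(1000, dtype=np.int32)
a = np.empty_like(src)
b = np.empty_like(src)
scalar_loop(a, src)
vectorized_loop(b, src)
assert (a == b).all()  # both produce the same result
```

So I expected a loop of 1000 to become 1000/width vector operations, not one operation of width 1000.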
```python
import tvm
from tvm import te
import numpy as np

# tgt = tvm.target.Target(target="cuda", host="llvm")  # failed: TVMError: Cannot convert type int32x1000 to CUDA type
tgt = tvm.target.Target(target="llvm", host="llvm")  # succeeded
dev = tvm.device(tgt.kind.name, 0)

# build the compute and schedule
N = 30000
A = te.compute((N,), lambda i: i, name="A")
s = te.create_schedule(A.op)
oi, ii = s[A].split(A.op.axis[0], factor=1000)
if tgt.kind.name == "cuda":
    s[A].bind(oi, te.thread_axis("threadIdx.x"))
s[A].vectorize(ii)

# lower, build, and run
print(tvm.lower(s, [A], simple_mode=True))
foo = tvm.build(s, [A], tgt, name="foo")
a = tvm.nd.array(np.zeros((N,)).astype(A.dtype), dev)
foo(a)
```