Is there a language construct to explicitly map operators to compute blocks of heterogeneous architectures (as shown in the figure and code example below)? The memory mapping shown in blue could presumably be done with `set_scope`? However, I'm unaware of a mechanism at the TensorIR or TE level to map compute to individual hardware blocks, as shown by the yellow lines.
Example code:
import tvm
from tvm import te
n = te.var("n")
m = te.var("m")
A = te.placeholder((m, n), name="A")
B = te.placeholder((m, n), name="B")
# These are toy examples. Imagine the computes as pipelined and running for ~1 ms each
C = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name="C")
D = te.compute((m, n), lambda i, j: C[i, j] + 1, name="D")
E = te.compute((m, n), lambda i, j: te.exp(C[i, j]), name="E")
F = te.compute((m, n), lambda i, j: D[i, j] + E[i, j], name="F")
s = te.create_schedule([F.op])
# BEGIN pseudo code
# Is there a way to map each operator to a specific HW block?
s[C].map_to = cpu1
s[D].map_to = accelerator1
s[E].map_to = cpu2
s[F].map_to = accelerator2
# I know that VTA uses *pragma* for a similar purpose (sketched below). But doesn't this
# common case deserve a language construct of its own?
s[A].set_scope("Mem2")
s[B].set_scope("Mem1")
s[C].set_scope("Mem3")
s[D].set_scope("Mem1")
s[E].set_scope("Mem2")
s[F].set_scope("Mem3")
# END pseudo code
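
For reference, here is a minimal runnable sketch of how `set_scope` behaves today at the TE level: it takes a string storage scope for an intermediate stage, so the `"Mem1"`/`"Mem2"`/`"Mem3"` labels above would have to become scope strings that some backend actually understands. The `"global"` scope used here is just the default and only illustrates the API shape.

```python
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2, name="B")
C = te.compute((n,), lambda i: B[i] + 1, name="C")

s = te.create_schedule(C.op)
# set_scope takes a string storage scope; hardware-specific scopes
# would need corresponding backend support
s[B].set_scope("global")
print(tvm.lower(s, [A, C], simple_mode=True))
```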
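
And this is the pragma route mentioned in the pseudo code, roughly how VTA marks stages: `pragma` attaches an annotation to a chosen loop axis, which a target-specific lowering pass can then pick up. The `"accel_offload"` key below is hypothetical; nothing in stock TVM acts on it, so this only shows the mechanism, not actual offloading.

```python
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2, name="B")

s = te.create_schedule(B.op)
# "accel_offload" is a hypothetical pragma key; a custom pass would have
# to consume the resulting pragma annotation on this loop
s[B].pragma(B.op.axis[0], "accel_offload")
print(tvm.lower(s, [A, B], simple_mode=True))
```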