Combining Separate tir.Block using compute_at()

junrushao · June 6, 2021, 8:17pm

The syntax below describes the signature of a block:

with tir.block([128, 128], "init") as [vi, vj]:

TensorIR is designed around the “block isolation” philosophy: a block describes a chunk of computation that can be understood without its surrounding context. When desugared, your particular example above expands to:

for i in range(0, 128):
  for j in range(0, 128):
    with tir.block([128, 128], "init") as [vi, vj]:
      # vi's domain is [0, 128), and it is data-parallel
      # vj's domain is [0, 128), and it is data-parallel
      tir.bind(vi, i)  # binds `i` to `vi`
      tir.bind(vj, j)  # binds `j` to `vj`
      tir.reads([])  # reads nothing
      tir.writes(C[vi : vi + 1, vj : vj + 1])  # writes a single element of C
      C[vi, vj] = 0
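To make these semantics concrete, here is a plain-Python model of the desugared nest above. It is not TVM code: `run_init_block` and the `binding` callback are hypothetical names, the callback stands in for `tir.bind`, and nested lists stand in for the buffer C.

```python
# Plain-Python model (not TVM) of the desugared "init" block.
# The loops i, j only decide which block instance runs next;
# the binding maps loop variables to the block iterators vi, vj.

N = 128


def run_init_block(C, binding):
    """Execute every instance of the hypothetical 'init' block.

    binding(i, j) -> (vi, vj) plays the role of tir.bind.
    Returns the order in which instances were executed.
    """
    order = []
    for i in range(N):
        for j in range(N):
            vi, vj = binding(i, j)
            # Each block iterator must stay inside its declared domain [0, N).
            assert 0 <= vi < N and 0 <= vj < N
            # Block body: reads nothing, writes only C[vi : vi + 1, vj : vj + 1].
            C[vi][vj] = 0
            order.append((vi, vj))
    return order


C = [[1] * N for _ in range(N)]
order = run_init_block(C, lambda i, j: (i, j))  # trivial binding: vi = i, vj = j
print(order[:3])  # [(0, 0), (0, 1), (0, 2)]
print(all(x == 0 for row in C for x in row))  # True
```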

This block has the following properties:

Instances of block execution are described by the pair (vi, vj), where vi, vj ∈ [0, 128).
A given instance (vi, vj) reads nothing and writes only to the buffer region C[vi : vi + 1, vj : vj + 1].
vi and vj are both data-parallel, which means block instances (vi, vj) can be executed in any order or in parallel.
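The data-parallel property can be checked in the same kind of plain-Python model (again not TVM; `init_instance` and `write_region` are hypothetical helpers). Because each instance reads nothing and the write regions are pairwise disjoint, there is no dependence between instances, so shuffling the execution order leaves the result unchanged.

```python
import itertools
import random

N = 128


def init_instance(C, vi, vj):
    # Body of one block instance: reads nothing, writes C[vi, vj] only.
    C[vi][vj] = 0


def write_region(vi, vj):
    # Elements covered by C[vi : vi + 1, vj : vj + 1].
    return {(vi, vj)}


instances = list(itertools.product(range(N), range(N)))

# No two instances write the same element, and none reads anything,
# so there are no read-after-write or write-after-write conflicts.
written = set()
for vi, vj in instances:
    region = write_region(vi, vj)
    assert written.isdisjoint(region)
    written |= region

# Consequently any execution order produces the same buffer contents.
C_sorted = [[1] * N for _ in range(N)]
for vi, vj in instances:
    init_instance(C_sorted, vi, vj)

shuffled = instances[:]
random.Random(0).shuffle(shuffled)
C_shuffled = [[1] * N for _ in range(N)]
for vi, vj in shuffled:
    init_instance(C_shuffled, vi, vj)

print(C_sorted == C_shuffled)  # True
```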

Block bindings (tir.bind) describe how the surrounding loops “drag” the block execution, i.e. which instance (vi, vj) runs at each loop iteration (i, j). It is possible to execute the instances in another order, for example:

for i in range(0, 128):
  for j in range(0, 128):
    with tir.block([128, 128], "init") as [vi, vj]:
      tir.bind(vi, 127 - i)  # binds `127 - i` to `vi`
      tir.bind(vj, 127 - j)  # binds `127 - j` to `vj`
      tir.reads([])
      tir.writes(C[vi : vi + 1, vj : vj + 1])
      C[vi, vj] = 0
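Continuing the plain-Python model (hypothetical, not TVM API), the sketch below runs the same block body under the trivial binding and under the reversed binding. The loops visit the instances in a different order, but the set of instances executed and the final contents of C are identical.

```python
N = 128


def run(binding):
    """Run the loop nest with a given binding; the block body never changes."""
    C = [[1] * N for _ in range(N)]
    order = []
    for i in range(N):
        for j in range(N):
            vi, vj = binding(i, j)  # stands in for tir.bind
            C[vi][vj] = 0           # block body, identical in both runs
            order.append((vi, vj))
    return C, order


C_fwd, order_fwd = run(lambda i, j: (i, j))
# N - 1 - i == 127 - i for N == 128, matching the bindings above.
C_rev, order_rev = run(lambda i, j: (N - 1 - i, N - 1 - j))

print(order_fwd[0], order_rev[0])              # (0, 0) (127, 127)
print(C_fwd == C_rev)                          # True
print(sorted(order_fwd) == sorted(order_rev))  # True: same set of instances
```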

In short, TensorIR decouples “in which order the loops run” from “the computation in the block body”. As a result, the information becomes over-complete (as you described) when the binding is trivial, and we provide syntactic sugar for that case.
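As an illustration only, the effect of such sugar can be modelled as a helper that takes just the block's domain and body, generates one loop per block iterator, and binds each iterator trivially to its loop variable. `block` below is a hypothetical Python function, not the actual TVMScript sugar.

```python
import itertools


def block(domain, body):
    """Hypothetical sugar: one loop per block iterator, bound trivially.

    itertools.product visits indices in the same lexicographic order as
    the equivalent nested for-loops.
    """
    for loop_vars in itertools.product(*(range(extent) for extent in domain)):
        body(*loop_vars)  # trivial binding: block vars == loop vars


N = 128

# Explicit form: loops plus an explicit, trivial binding.
C_explicit = [[1] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        vi, vj = i, j  # corresponds to tir.bind(vi, i); tir.bind(vj, j)
        C_explicit[vi][vj] = 0

# Sugared form: only the domain [N, N] and the body are written.
C_sugar = [[1] * N for _ in range(N)]


def init_body(vi, vj):
    C_sugar[vi][vj] = 0


block([N, N], init_body)

print(C_explicit == C_sugar)  # True
```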