Thank you for such a valuable question.
Your understanding is correct: we still need a schedule language to schedule. The reason is that we want a simple API and abstraction that serves both human experts and automatic optimization (like AutoTVM, Ansor, and our new meta-schedule). We also try to keep users' existing habits, so we do not change the API too much.
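For comparison, here is roughly how the same workload is scheduled with the existing TE schedule API (a sketch using the released te.create_schedule / split calls; tvm.lower is only there to inspect the result). The TensorIR primitives in the example below keep the same feel:

import tvm
from tvm import te

A = te.placeholder((128, 128), name="A")
Update = te.compute((128, 128), lambda i, j: A[i, j] + 1, name="update")

# Classic TE schedule: the same split primitive, but on a lazily lowered schedule
s = te.create_schedule(Update.op)
xo, xi = s[Update].split(Update.op.axis[0], factor=32)
print(tvm.lower(s, [A, Update], simple_mode=True))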
The critical challenge you mentioned seems to be user experience, especially around loop axes. TensorIR is an eager schedule, which means every schedule primitive changes the IR as soon as it executes. As a result, users can see the axes and the whole AST whenever they want. Here is a simple example:
import tvm
from tvm import te, tir
A = te.placeholder((128, 128), name="A")
Update = te.compute((128, 128), lambda i, j: A[i, j] + 1, name="update")
"""Create PrimFunc from TE compute for further scheduling"""
func = te.create_func(Update)
"""Create TensorIR schedule"""
s = tir.create_schedule(func)
print(tvm.script.asscript(func))
"""Output
@tvm.script.tir
def func(var_A: ty.handle, var_update: ty.handle) -> None:
    A = tir.match_buffer(var_A, [128, 128], elem_offset=0, align=128, offset_factor=1)
    update = tir.match_buffer(var_update, [128, 128], elem_offset=0, align=128, offset_factor=1)
    # body
    with tir.block([], "root") as []:
        tir.reads([])
        tir.writes([])
        for i0, i1 in tir.grid(128, 128):
            with tir.block([128, 128], "update") as [i, j]:
                tir.bind(i, i0)
                tir.bind(j, i1)
                tir.reads([A[i:(i + 1), j:(j + 1)]])
                tir.writes([update[i:(i + 1), j:(j + 1)]])
                update[i, j] = (A[i, j] + tir.float32(1))
"""
update = s.get_block("update")
x, y = s.get_axes(update)
print(x)
"""Output
for i0 = 0 to 128
"""
xo, xi = s.split(x, factor=32)
print(xo, xi, sep="\n")
"""Output
for i0_outer = 0 to 4
for i0_inner = 0 to 32
"""
print(x)
"""Output
(nullptr)
"""