@merrymercy Good question! Here’s an example of a TIR schedule:
```python
s = tir.create_schedule(original_func)
update = s.get_block("C")
i, j, k = s.get_axes(update)
i_o, i_i = s.split(i, bn)
j_o, j_i = s.split(j, bn)
k_o, k_i = s.split(k, 4)
s.reorder(i_o, j_o, k_o, k_i, i_i, j_i)
```
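For intuition, here is a rough plain-Python sketch of the loop nest that the split and reorder steps above would describe. It assumes that block "C" is a square matrix-multiply update `C[i][j] += A[i][k] * B[k][j]` and that the extents divide evenly by the split factors; neither is stated in the snippet, and `tiled_matmul`, `naive_matmul`, `n`, and `kf` are hypothetical names used only for illustration.

```python
def naive_matmul(A, B, n):
    """Reference loop nest in the original i, j, k order."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def tiled_matmul(A, B, n, bn=2, kf=4):
    """Same computation with i and j split by bn, k split by kf,
    and loops ordered as (i_o, j_o, k_o, k_i, i_i, j_i).

    Assumption: n is divisible by both bn and kf.
    """
    C = [[0] * n for _ in range(n)]
    for i_o in range(n // bn):
        for j_o in range(n // bn):
            for k_o in range(n // kf):
                for k_i in range(kf):
                    for i_i in range(bn):
                        for j_i in range(bn):
                            # Recover the original indices from the split loops.
                            i = i_o * bn + i_i
                            j = j_o * bn + j_i
                            k = k_o * kf + k_i
                            C[i][j] += A[i][k] * B[k][j]
    return C


n = 4
A = [[r * n + c for c in range(n)] for r in range(n)]
B = [[(r + 2 * c) % 5 for c in range(n)] for r in range(n)]
same_result = tiled_matmul(A, B, n) == naive_matmul(A, B, n)
```

Only the iteration order changes; both versions compute the same values.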
TIR’s schedule is not totally stateless. Scope information and dependency graph information are actively maintained in the Schedule class during the scheduling process, so we don’t recompute them each time we apply a new primitive. After lowering to TIR without blocks, we no longer maintain this information, since the result is not schedulable.
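As a toy illustration of what "not totally stateless" means here, the sketch below keeps a small piece of bookkeeping (the current loop order and the axis each loop came from) and updates it in place on every primitive instead of rebuilding it. `ToySchedule` is a hypothetical stand-in written for this post; it is not TVM's Schedule class and does not model scopes or the dependency graph.

```python
class ToySchedule:
    """Hypothetical stand-in that keeps state across primitives."""

    def __init__(self, axes):
        self.order = list(axes)              # loops, outermost first
        self.origin = {a: a for a in axes}   # loop name -> original axis

    def split(self, axis):
        """Replace `axis` with an outer/inner pair, updating state in place."""
        outer, inner = axis + "_o", axis + "_i"
        pos = self.order.index(axis)
        self.order[pos:pos + 1] = [outer, inner]
        root = self.origin.pop(axis)
        self.origin[outer] = root
        self.origin[inner] = root
        return outer, inner

    def reorder(self, *axes):
        """Place `axes` in the given order into the slots they occupy now."""
        slots = sorted(self.order.index(a) for a in axes)
        for slot, a in zip(slots, axes):
            self.order[slot] = a


s = ToySchedule(["i", "j", "k"])
i_o, i_i = s.split("i")
j_o, j_i = s.split("j")
k_o, k_i = s.split("k")
s.reorder(i_o, j_o, k_o, k_i, i_i, j_i)
```

After these calls, `s.order` holds the reordered loops and `s.origin` still records which original axis each loop was split from, without either being recomputed from scratch.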
All in all, it would be good to run a benchmark to compare them in practice. I hope I understood your question correctly.