Thank you for the in-depth explanation!
After TIR-based scheduling was introduced, MetaSchedule followed. It was designed to work with TIR-based scheduling and blocks, so if you move to TIR-based scheduling, it might be the right thing to explore.
Can MetaSchedule be extended to support custom hardware? That part also wasn’t clear to me for AutoSchedule, which is why I started with AutoTune.
I have one more question about TIR: how do you start writing a schedule? With TE, you can define the computation at an abstract level, for example as a reduction, but I can’t find any introduction to TIR scheduling. Everything seems to suggest that you have to write out the loop nests by hand.