Explanation of autoscheduler transform steps

Wheest · March 24, 2021, 6:00pm

I am trying to understand more about the auto-scheduler, and the output from the Ansor tuning process.

If I examine the output JSON from tuning, I see a series of transformations that have been learned as an autoschedule, e.g.:

["SP", 2, 24, 3, [3], 1],
["RE", 2, [0, 4, 8, 12, 16, 1, 5, 9, 13, 17, 20, 22, 24, 2, 6, 10, 14, 18, 21, 23, 25, 3, 7, 11, 15, 19]],
["FSP", 3, 0, 1, 2],

I am trying to understand what each of these transformations does, and what its related parameters are.

I can’t find them explicitly mentioned in the Ansor paper, though they are enumerated in include/tvm/auto_scheduler/transform_step.h.

I have compiled what I think is a complete list.

AN: AnnotationStep
FU: FuseStep
PR: PragmaStep
RE: ReorderStep
SP: SplitStep
FSP: FollowSplitStep
FFSP: FollowFusedSplitStep
SA: StorageAlignStep
CA: ComputeAtStep
CI: ComputeInlineStep
CR: ComputeRootStep
CHR: CacheReadStep
CHW: CacheWriteStep
RF: RfactorStep

However, there does not seem to be clear high-level documentation on their purpose, parameters, or examples of their advantages.

Are there resources available that could help me understand, or could some of the developers comment? Has it been discussed in the various PRs?

I am interested enough to make an effort to improve the documentation myself, but I would rather have a starting point beyond reading the code.

merrymercy · March 24, 2021, 7:02pm

The output JSON file stores serialized measurement records, so we can extract the best schedules and re-apply them during compilation. Each measurement record also contains a complete schedule, which consists of the transform steps you mentioned. The JSON format is designed to be compact (to save disk space) and readable (to make debugging easier).

Your compiled list is correct. The list is defined here (tvm/transform_step.cc at cfe2e288a331b10e72e10c7e465df375b44e6ae9 · apache/tvm · GitHub). The JSON serialization format of a step is an array (name, args...). When deserializing the JSON, we parse the name and then dispatch on it.
Take “SP” (SplitStep) as an example: we parse the name and go to this branch, which then calls this reader function. From the code, we can infer that the format is ("SP", stage_id, iter_id, loop_extent, lengths, inner_to_outer). Correspondingly, this writer function defines how to serialize a SplitStep to JSON, and it shows the format more clearly. Similarly, every step has its own reader function and writer function.
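As a rough illustration of the "parse the name, then dispatch" idea, here is a minimal pure-Python sketch that decodes the "SP" record from the question into named fields. It does not use TVM's own reader code; the names `STEP_NAMES`, `SplitStepRecord`, and `decode_step` are hypothetical helpers made up for this example, and only the "SP" field order given above is assumed.

```python
import json
from typing import List, NamedTuple, Optional

# Abbreviation -> step class name, taken from the list earlier in this thread.
STEP_NAMES = {
    "AN": "AnnotationStep", "FU": "FuseStep", "PR": "PragmaStep",
    "RE": "ReorderStep", "SP": "SplitStep", "FSP": "FollowSplitStep",
    "FFSP": "FollowFusedSplitStep", "SA": "StorageAlignStep",
    "CA": "ComputeAtStep", "CI": "ComputeInlineStep", "CR": "ComputeRootStep",
    "CHR": "CacheReadStep", "CHW": "CacheWriteStep", "RF": "RfactorStep",
}


class SplitStepRecord(NamedTuple):
    """Hypothetical container mirroring the "SP" field order described above."""
    stage_id: int
    iter_id: int
    loop_extent: Optional[int]
    lengths: List[Optional[int]]
    inner_to_outer: bool


def decode_step(record):
    """Split a serialized step into (name, args) and dispatch on the name."""
    name, *args = record
    if name not in STEP_NAMES:
        raise ValueError(f"unknown step abbreviation: {name!r}")
    if name == "SP":
        stage_id, iter_id, loop_extent, lengths, inner_to_outer = args
        return SplitStepRecord(stage_id, iter_id, loop_extent,
                               list(lengths), bool(inner_to_outer))
    # Other steps are left undecoded in this sketch: return the class name
    # and the raw argument list.
    return STEP_NAMES[name], args


step = decode_step(json.loads('["SP", 2, 24, 3, [3], 1]'))
print(step)
```

The other step types would need their own field lists, which can be read off the corresponding reader and writer functions in transform_step.cc.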

If you want to improve the doc, you are welcome to contribute a doc describing the format and how it is generated. See also this related discussion: Interpretation of measurement records when Auto-scheduling

Lizhi-Liao · April 12, 2021, 3:45am

Hi~ Is there any material (e.g., a paper, doc, or article) that briefly introduces each of the transformation steps and its parameters, such as a definition or a description of how it works? Thanks for your help!

Lizhi-Liao · April 16, 2021, 4:14am

Can someone shed some light on this, please?

merrymercy · April 24, 2021, 8:52pm

They basically follow the semantics of TVM’s schedule primitives.

https://tvm.apache.org/docs/tutorials/language/schedule_primitives.html
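To give a feel for what "semantics of a schedule primitive" means, here is a small pure-Python sketch of what a split does to a single loop: one loop of a given extent becomes an outer and an inner loop, and the original index is recovered as outer * factor + inner. This is only an illustration of the loop transformation, not TVM code; `split_indices` is a hypothetical helper name.

```python
def split_indices(extent, factor):
    """Visit the indices 0..extent-1 using a loop split by `factor`."""
    n_outer = (extent + factor - 1) // factor  # ceiling division
    visited = []
    for outer in range(n_outer):
        for inner in range(factor):
            i = outer * factor + inner
            # Guard for the case where factor does not divide extent evenly.
            if i < extent:
                visited.append(i)
    return visited


# The split loop nest visits the same indices, in the same order,
# as the original single loop.
assert split_indices(10, 4) == list(range(10))
assert split_indices(3, 3) == [0, 1, 2]
```

The split changes only the loop structure, not the set of iterations, which is what lets later steps such as reorder or annotation act on the outer and inner loops separately.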