OK, I took a stab at it. Here are some properties I think are classical compiler virtues for a pass manager:
P1) Can run function-level passes in parallel across functions, with different functions at different points in the pass sequence.
P2) Supports cross-function optimizations with a good ordering for processing of functions when that matters (e.g. for inlining in a traditional compiler this is important).
P3) Offers excellent support in understanding and inspecting what each pass did to the IR, how long it took, how the size of the IR changes through the pass sequence, where changes happen, etc.
P4) Makes it easy to run an IR verifier between each pass where the IR changed or at specified times.
P5) Makes it easy to bisect a failing test case to the pass that makes it fail.
P6) Makes it easy to tell what the pass sequence being executed is.
P7) Makes it easy to experiment with different pass sequences, automatically and manually.
P8) Is easy to understand, modify and use, and has clear documentation.
P9) Has some support for understanding analysis versus mutating passes and updating versus recalculating analyses as the IR changes.
P10) Is efficient, i.e. allows fast compilation with minimal memory overhead.
P11) Probably more things I’m not thinking of right now.
The change I’m trying to make helps with P3. It would be good to design the API with a variety of these and any other concerns in mind, but it’s a bit beyond what I’m looking to do for this. So I’ll focus much more narrowly on printing/dumping IR, even though in a wider perspective probably all of this should be considered. Relevant features here include:
F1) Printing the IR, maybe only if it changes, maybe in full or as a diff.
F2) Showing a summary of compilation and what passes did (e.g. size of diff), without showing the IR itself.
F3) Dumping all manner of information, including IR, about each pass to a directory, one file per pass in files that are named with an incrementing pass number followed by the pass name. This allows focusing in on a relevant pass immediately and Unix tools like grep, wc and diff can be used. This has been useful in other compilers.
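To make F3 concrete, here is a minimal sketch of the numbered-file dump, assuming the IR can be rendered as text. `PassDumper` and its method names are hypothetical and not part of TVM or of the PR.

```python
import re
from pathlib import Path


class PassDumper:
    """Hypothetical helper: writes one file per executed pass,
    named '<incrementing number>_<pass name>.txt'."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.count = 0  # number of passes dumped so far

    def dump(self, pass_name, ir_text):
        self.count += 1
        # Keep file names shell-friendly: replace unusual characters with '_'.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", pass_name)
        # Zero-padding makes a plain 'ls' list the files in execution order.
        path = self.directory / f"{self.count:04d}_{safe_name}.txt"
        path.write_text(ir_text)
        return path
```

The counter is the piece of state that F3 needs beyond the IR itself: the dumper has to be told about every pass that runs, including prerequisites, or the numbering drifts.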
Both F1 and F2 require being able to do something both before and after a pass, whether that is capturing the IR before and after to make a diff, computing other statistics, checking whether the IR changed, or measuring how long the pass took to run. They also need to know which passes are run, or at least their names. F3 needs to know at each pass how many prior passes have run in order to number the files correctly, which is different from just knowing the IR before and after. It is also nice to have support for saying why a pass is being run, since prerequisites cause passes to run that were not in the originally requested sequence.
Option O1: Handle dumping/printing with specific code in the pass manager
All of the information needed for printing/dumping as laid out above is readily available to the pass manager code, and currently only the (sequential) pass manager has this information. It is fairly straightforward to implement and test, as seen in the PR I sent. I think this is reasonable for right now.
Option O2: Add a pass manager API that just runs between passes and is passed the IR
Most of the above features can be implemented this way, using the Pass interface, by maintaining state in the functions that run before and after a pass (call them before() and after()). To print the IR unconditionally before the first pass, the before() function can capture a boolean that says whether this is the first time it is being run. It is not possible to determine exactly why a pass is being run. However, if the before() function of a pass is called before the before() functions of its prerequisites or nested passes, then before() could keep a stack to detect nesting, though prerequisites and nested passes could not be told apart (and this nesting probably doesn’t lead to the cleanest print-out). Determining whether the IR changed, or generating a diff of the change, can be done by having before() store the IR on the side where after() can access it and compare it to the IR after the pass. Timing a pass can be done the same way, with a timestamp stored by before() and retrieved by after().
This mostly works, but it’s awkward for the before() and after() functions to figure out what’s going on. The statefulness is also a drawback when running the same sequence of passes again, since one would then have to make sure to reset the state or use fresh state.
This probably gets a bit easier with an object that contains the state and has a before-function, an after-function and maybe a reset-function or first()-function, though that then gets closer to the next options.
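A rough sketch of such a state-holding object, assuming the IR is available as text and that before() and after() calls are properly nested. `PassObserver` and its fields are hypothetical names for illustration only.

```python
import difflib
import time


class PassObserver:
    """Hypothetical observer holding the state shared by before() and after()."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Call this before re-running a pass sequence to start from fresh state.
        self._stack = []   # (ir_before, start_time) for each pass in progress
        self.records = []  # (depth, changed, seconds, diff), in completion order

    def before(self, ir_text):
        self._stack.append((ir_text, time.perf_counter()))

    def after(self, ir_text):
        ir_before, start = self._stack.pop()
        seconds = time.perf_counter() - start
        diff = "".join(difflib.unified_diff(
            ir_before.splitlines(keepends=True),
            ir_text.splitlines(keepends=True),
            fromfile="before", tofile="after"))
        # After the pop, the stack length is the nesting depth of this pass.
        # The stack cannot say whether depth > 0 means nested or prerequisite.
        self.records.append((len(self._stack), ir_before != ir_text, seconds, diff))
```

Note that the observer never learns the name of the pass, and that it holds a full copy of the IR for every pass currently in progress.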
Unfortunately, this could not print the name of the pass, since you can’t tell that from just the IR. The function could accept both the IR and the pass as parameters, though that again gets closer to the next options and it would no longer fit the Pass interface. Of O2 and O3, I think this variation of accepting both the IR and the pass is the most reasonable for right now.
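The variation where the hooks receive both the pass and the IR could look roughly like the following. This is a toy sequential runner over a textual IR, not TVM code; `ReplaceText` and `run_sequence` are made-up names.

```python
class ReplaceText:
    """Toy pass over a textual IR; stands in for a real compiler pass."""

    def __init__(self, name, old, new):
        self.name = name
        self.old = old
        self.new = new

    def __call__(self, ir_text):
        return ir_text.replace(self.old, self.new)


def run_sequence(passes, ir_text, before=None, after=None):
    """Run passes in order, calling the optional hooks around each pass.

    Both hooks receive (pass, ir), so they can print the pass name.
    """
    for current_pass in passes:
        if before is not None:
            before(current_pass, ir_text)
        ir_text = current_pass(ir_text)
        if after is not None:
            after(current_pass, ir_text)
    return ir_text
```

For example, `run_sequence(passes, ir, after=lambda p, ir: print(p.name, len(ir)))` would print each pass name together with the IR size after it.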
Option O3: Add a Python API that provides detailed information to a Python function
This is similar to O2, but the functions are passed an object that contains information about compilation, such as why a pass is being run, how many passes have already run, whether this is a nested or prerequisite run of a pass, what the pass stack is currently, what the current pass is, whether the IR changed (for the after-function only), how long the pass took to run, what the IR was before the pass, whether this is before or after the current pass etc.
Essentially, this exposes all of the data available inside the pass manager to code outside of it. The cost is that expensive information may be collected even when no before-function or after-function needs it. For example, providing the previous IR requires storing it, which can double memory usage if the IR is large. One could imagine a way to configure which expensive information is made available, but that is getting to be a big API at that point. Instead, one might say that only cheap information is exposed and the user of the API has to keep track of any expensive information.
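As an illustration of the "cheap information only" flavor of O3, the object passed to the hooks might look something like this. All names here (`PassEvent`, its fields, `format_event`) are hypothetical, and the field set is only a guess at what would be useful.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PassEvent:
    """Hypothetical record of cheap facts the pass manager already knows."""
    pass_name: str
    index: int                    # how many passes have already run
    pass_stack: Tuple[str, ...]   # names of the enclosing passes, outermost first
    reason: str                   # e.g. "requested", "prerequisite" or "nested"
    is_after: bool                # False in the before-hook, True in the after-hook
    elapsed_seconds: Optional[float] = None  # only set in the after-hook


def format_event(event):
    """One log line per event, indented by nesting depth."""
    indent = "  " * len(event.pass_stack)
    phase = "after" if event.is_after else "before"
    return f"{indent}[{event.index}] {phase} {event.pass_name} ({event.reason})"
```

The previous IR is deliberately absent: a hook that wants a diff would keep its own copy, so nobody pays for it by default.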
(Also, it would be nice if passes reported whether they changed the IR. Then it would be cheap to see whether the IR changed, without having to store and compare against the previous IR, and a debug mode could cross-check this.)
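A small sketch of that idea, assuming a pass returns the new IR together with a "changed" flag. The convention and the names `run_checked`, `honest_pass` and `lying_pass` are assumptions for illustration, not an existing API.

```python
def run_checked(pass_fn, ir_text, debug=False):
    """Run a pass that returns (new_ir, reported_changed).

    In debug mode, cross-check the reported flag against a real comparison.
    With a mutable IR, this would need a copy of the IR taken before the pass.
    """
    new_ir, reported_changed = pass_fn(ir_text)
    if debug:
        actually_changed = new_ir != ir_text
        if reported_changed != actually_changed:
            raise AssertionError(
                f"pass reported changed={reported_changed}, "
                f"but comparison says changed={actually_changed}")
    return new_ir, reported_changed


def honest_pass(ir_text):
    # Toy simplification 'x + 0' -> 'x' that reports accurately.
    changed = "x + 0" in ir_text
    return ir_text.replace("x + 0", "x"), changed


def lying_pass(ir_text):
    # Same rewrite, but always claims nothing changed.
    return ir_text.replace("x + 0", "x"), False
```

Outside of debug mode the flag is simply trusted, which is what makes it cheap.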
There are a lot of variations on these options, e.g. all three can be combined for different use cases. A reasonable combination of O2 and O3 might be something like having a way to insert a Pass between all other passes that then just gets the IR, or having a function that takes an object with a lot of information and can print all of that.
I would personally assume that O1 is going to be part of the picture in the end and that cross-cutting concerns like timing passes and dumping information will be a good fit for it, so it’s not necessary to go to O2 or O3 yet just to show information about the passes and the IR. That said, this certainly has to fit into the overall design goals of TVM, and I’m not as familiar with the details of those at this time.