Thanks to the assistance of several core TIR contributors at OctoML, I have written a draft of a language specification for TIR. I present it here to the community for further review. If the community is interested, I hope the specification in some form can be adopted by the project and used as documentation for TIR, which would be useful for those seeking to learn TIR as well as contributors to the compiler stack. The draft can be found here; comments are welcome, especially where they concern 1.) inconsistencies, 2.) factual errors, 3.) omissions, 4.) ambiguities, or 5.) unclear or difficult language. This proposal also takes some views on what a good approach for specifying TIR would be; these are meant to be subject to debate, since most of them are my own opinions or those of the TIR developers who assisted me in writing the text.
TensorIR (TIR) is TVM’s lower-level intermediate representation (IR) for operations on tensors, allowing users to specify arithmetic operations on values that can be loaded from and stored into buffers. Many of TVM’s most valuable features for improving performance rely on TIR: in particular, scheduling transformations are defined on TIR programs and, similarly, TVM’s autotuning functionality (e.g., Metascheduler) also operates on TIR. Additionally, TIR is the simplest way to define a new tensor operator in TVM, which is often a necessary step for supporting the latest deep learning models.
Yet, despite the importance of TIR for the language, there is relatively little documentation on the language itself, which makes it difficult for new users to pick up the language. Some existing resources include Feng et al.'s recent ASPLOS paper on TIR and Tianqi Chen’s Machine Learing Compilation course. While these sources are valuable and provide an overview of TIR’s usage and utility, they do not (and are not intended to) provide a comprehensive review of the language and its features. The lack of a full language reference can be frustrating for new users, expert users, and compiler implementers alike:
- New users may not fully understand the utility of constructs that they encounter in example code, creating an impediment to their understanding of the language.
- Expert users may encounter edge cases where the intended functionality is unclear. This can lead to unease as to whether a program’s behavior in such a situation is a “hack” that relies on compiler implementation details or indeed intended behavior.
- Compiler maintainers may encounter situations where it is unclear whether implementing a pass or making some change could alter the behavior a user would expect. Similarly, it may not always be clear whether some surprising behavior is a “bug” or, in fact, intended. A clear specification would eliminate the ambiguity in such situations.
The goal of writing a language specification would be to address these problems by writing a document that achieves the following goal: Someone who is not otherwise familiar with TIR can refer only to the specification and be able to describe what any given TIR program is intended to do (without running it), assuming all validity conditions are met (something that will be discussed below). That is, with a language specification, it should be unambiguous what any given TIR program is intended to do (or, alternatively, whether the TIR program is doing something that should be considered unsupported).
Additionally, by creating a single document that describes the core language mechanics, the specification would have the further benefit of making it simpler to propose revisions to the language design, since they could be described in terms of the specification and what revisisions would need to be made to it. The implications of a change to the language on different language features would be easier to enumerate, which could highlight from an early stage any difficulties that might arise in implementing such a change.
As aforementioned, the draft specification can be found here. Review and comments on it are welcome. The document is intended to be self-contained, so it should not be necessary to say much more about how to read and approach it (though if it is difficult to understand, that would be useful feedback).
The document is written as a technical reference, intended to describe the behavior of each language construct in a precise manner. By contrast, the document is not meant as a tutorial, so it does not generally describe very much of the intent behind the design. Personally, I do not think a language specification is the right document for describing that, though if there are some places where a brief word on intent would be helpful, we could perhaps consider doing that.
Additionally, the specification is concerned only with the functional behavior of TIR programs. That is, it describes only the visible behavior that results from executing a program. This is elaborated upon in the draft itself, so I do not want to belabor this point in the pre-RFC, but the main implication is that the specification is not intended to make guarantees about performance and which constructs are likely to result in better performance. The reasons for this approach are twofold: First, it is difficult to make any a priori guarantees about performance (most of the time, we use heuristics) and, second, this simplifies the specification and thus gives the compiler implementation more freedom to make changes related to performance.
The draft as I have written it makes the decision to specify only a subset of programs that are possible to write in TIR, designating certain behaviors (some of which may be in use) as deliberately unspecified. This choice may prove controversial, but my collaborators and I found it to be a useful measure to address an ambiguous aspect of TIR’s design. In particular, TIR’s AST exposes many things that are typically thought of as compiler internals—even though they are “compiler internals” in some sense, however, expert users can and do manually set them to take advantage of properties of the compiler implementation or of specific devices. While it is useful for expert users to be able to do this, it poses some problems for the specification: accounting for all these low-level implementation details across all devices TIR targets would greatly expand the scope of the specification and, more importantly, including this behavior in the specification would essentially “commit” the compiler maintainers to supporting it, which could restrict their ability to change other compiler implementation details. (It would not be very helpful for the specification to be so tied to the compiler implementation that it essentially repeats what the implementation says; moreover, if changing small details of the implementation requires frequent revisions to the specification that break backwards compatibility, then the document is not very useful as a reference either.)
To summarize, TIR has historically made the distinction between a “front-end user” and a compiler implementer rather ambiguous (which is unusual among most programming languages). This specification instead aims to provide a distinction between “high-level” behavior that is guaranteed at TIR’s front-end versus lower-level details that are left to the compiler implementation. The decision to stick to a high level of abstraction, I hope, will allow the specification to be simpler and remain more stable over time and give room for defining some of the lower-level invariants for the compiler later. However, I do acknowledge that there is a tradeoff here between supporting functionality and the simplicity of the specification. I lean towards the latter in part because even the TIR maintainers who helped me write the draft did not always agree on which low-level implementation tricks “should” work in TIR—I think it is reasonable for the specification not to make guarantees in such cases.
As a working definition, this specification considers the following to be expected “front-end” users (as opposed to expert users who would be expected to make use of compiler internals):
- Users trying to straightforwardly implement a tensor operator to be optimized by the autoscheduler.
- Users of TE, which lowers into TIR.
- In the case of the ongoing Unity work, users who want to express tensor operators in TIR so they can be invoked directly in Relax.
The draft specification abstracts over certain implementation details regarding how buffers are laid out in memory as well as hardware-specific details. The following are the additional restrictions the draft specification imposes on input programs:
- Buffers may not alias each other (except in the case of
Blocknodes, which is specifically addressed in the draft).
- Each allocation creates exactly one buffer (by contrast, lower levels of compilation may use a single allocation to represent multiple buffers by changing the strides and element offset appropriately).
- Users must not manually specify the strides or element offsets for buffers.
- No buffer allocated in the body of a
PrimFuncmay be exposed externally.
- The program may not contain device-specific builtins or builtins that engage in pointer manipulation.
TIR programs that violate these rules are considered to be unspecified, which means that the specification makes no guarantees about their semantics. Otherwise, later stages of compilation are expected to violate these assumptions, but they are meant to be true of input programs. (It is not desirable for the TIR compiler to entirely reject input programs that do these things, but this behavior should left to expert users who are familiar with the compiler implementation and are prepared to update their code if the implementation details change.)
There are some areas of the text in which I lack confidence and request further review from those who are knowledgeable about these TIR constructs. They are listed below:
- Vectorized loops: In the compiler implementation, vectorized loops are lowered into a sequence of vectorized operations using the
VectorizeLooppass, whose implementation is generally very complicated. It would be undesirable for the specification to simply describe such a large and complicated pass in text form, since it would likely also be more difficult to understand in that form compared to the source code. Instead, it would be preferable to have a simple description of the high-level intent of the vectorized loop. I am not certain that the current description in the text (describing them as having the same semantics as a parallel loop) is entirely accurate given how the loop vectorization pass really works.
- The semantics of
BufferRealizewould be good to review closely, as I was not entirely certain of how those worked. Additionally, it would be good to add some description of how some of the extra information is used by the compiler (even if it is not part of the semantics per se).
- There are likely many additional implicit validity conditions for the different
Stmtnodes that are not noted in the type-checking rules in the current draft. Any omissions would be good to note.
- Another tricky area in the current draft is dealing with varialbes that could be contained inside
shapecould all contain variables and constructs featuring
Buffers should be careful to specify when these are assigned or not.
If the community agrees that a specification like this one or a revised version should be adopted, I think it should proceed to a formal RFC. This, however, still leaves the question of how the specification would actually be used by the community: the specification cannot be enforced by any mechanical means, so it would be incumbent upon users and maintainers of TIR to refer to the document and ensure the language behaves as described. It may be worth discussing how to ensure the specification is actually read; one way to do it might be link to it from prominent places in TVM’s documentation (e.g., from the documentation homepage) or make references to it in top-level header comments.
Additionally, we should determine the process for updating and maintaining the specification. The default option might be to require RFCs to update the specification and, accordingly, to have RFCs that add major functionality to TIR to also include how the specification would be revised as a part of the RFC process. My concern might be whether this would perhaps end up being cumbersome, though an out-of-date specification clearly would not be a very useful one. Another concern would be defining the distinction between a major revision to the specification like a new language feature and a minor one like correcting a small inconsistency—it would probably be reasonable to use a simple PR for minor changes.
If adopted, a high-level specification for TIR could serve as the first stage for potential other lines of work, like formalizing the notion of “dialects” in TIR that are used for different stages of compilation. Having specifications for each dialect would allow for clearly communicating the invariants expected at different stages of compilation, which would be helpful for future compiler development. (Such a project would, though, be of a larger scope and likely require reasoning about some of the low-level details that are elided in this draft.)
Another potentially useful project that the specification would enable would be the writing of a reference interpreter—that is, an interpreter that implements the behavior described in the specficiation in the most straightforward manner possible, with no eye towards optimization. A reference interpreter would serve as an “executable” specification and allow for directly checking whether the compiler follows the expectations of the specification by comparing outputs with the reference interpreter.
While I wrote most of the text, the initial draft would not have been possible without the efforts of Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg), and Junru Shao (@junrushao). I am grateful to them for patiently answering my questions about the language and giving their input on many language features. Additionally, I thank Christian Convey (@cconvey), Denise Kutnick (@denise), and Prakalp Srivastava (@psrivas2) for many insightful comments during the process of writing the initial draft.