Meta-RFC
This RFC proposes that we write three new RFCs for improving error handling in TVM. After talking with many stakeholders in the community, and reflecting on many of the issues we’ve had onboarding new people to TVM at OctoML, I have identified three critical areas in which I believe we can improve TVM’s error reporting. I want to solicit help from the community in driving these three directions forward and in coming up with a design that makes both current and new contributors and users happy.
1. User-visible errors vs. internal errors
Currently the TVM codebase makes heavy use of the CHECK family of macros to enforce invariants and report errors. Although easy to use, these macros provide a subpar experience for most users, showing large stack traces and often inscrutable errors. The first direction is to introduce a separation between user-visible errors and internal errors.
I believe that most of our current uses of CHECK should be migrated to a family of macros which I will call ICHECK, for “internal check”. As in other compilers, these macros should print a user-friendly message stating that an internal error has occurred in the compiler and asking the user to report it on the discussion forum or issue tracker. Checks that guard “true” user-facing errors should not become ICHECKs; instead they should be migrated to the second or third part of this proposal, and over time we should stop using raw CHECKs as our default error-handling strategy.
For example, here is the output of compiling a Rust program that triggers an internal compiler error (ICE):
       Compiling playground v0.0.1 (/playground)
    error[E0425]: cannot find value `rust` in this scope
     --> src/main.rs:2:11
      |
    2 |     break rust;
      |           ^^^^ not found in this scope

    error[E0268]: `break` outside of a loop
     --> src/main.rs:2:5
      |
    2 |     break rust;
      |     ^^^^^^^^^^ cannot `break` outside of a loop

    error: internal compiler error: It looks like you're trying to break rust; would you like some ICE?

    note: the compiler expectedly panicked. this is a feature.

    note: we would appreciate a joke overview: https://github.com/rust-lang/rust/issues/43162#issuecomment-320764675

    note: rustc 1.44.1 (c7087fe00 2020-06-17) running on x86_64-unknown-linux-gnu

    error: aborting due to 3 previous errors

    Some errors have detailed explanations: E0268, E0425.
    For more information about an error, try `rustc --explain E0268`.
    error: could not compile `playground`.

    To learn more, run the command again with --verbose.
Concretely, I believe we should change the check family of macros to raise a new exception type, which can later be caught and rendered appropriately.
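To make the ICHECK idea concrete, here is a minimal Python sketch of the intended behavior, assuming a hypothetical InternalError exception type and a hypothetical icheck helper standing in for the C++ macro; lower_pass and compile_entry are likewise illustrative names, not existing TVM APIs. The point is the split: a failing internal check raises a distinct exception that carries the backtrace, and the top-level entry point catches it and prints a short report modeled loosely on rustc’s, hiding the backtrace unless asked.

```python
import traceback


class InternalError(Exception):
    """Hypothetical exception type raised when an ICHECK-style invariant fails."""

    def __init__(self, condition, message, backtrace):
        super().__init__(message)
        self.condition = condition  # source text of the failed check
        self.message = message
        self.backtrace = backtrace  # kept for bug reports, hidden by default


def icheck(condition, condition_text, message=""):
    """Python stand-in for a C++ ICHECK macro (hypothetical helper)."""
    if not condition:
        # Record where the check failed, but let the top level decide what to show.
        backtrace = "".join(traceback.format_stack()[:-1])
        raise InternalError(condition_text, message, backtrace)


def render_internal_error(err, verbose=False):
    """Turn an InternalError into a short, user-friendly report."""
    lines = [
        "error: internal compiler error: " + (err.message or "check failed"),
        f"note: the internal check `{err.condition}` failed",
        "note: this is a bug in TVM; please report it on the discussion forum or issue tracker",
    ]
    if verbose:
        lines.append(err.backtrace.rstrip())
    else:
        lines.append("note: rerun with verbose output to see the internal backtrace")
    return "\n".join(lines)


def lower_pass(ndim):
    """Hypothetical compiler pass that relies on an internal invariant."""
    icheck(ndim >= 0, "ndim >= 0", "negative rank reached the lowering pass")
    return ndim * 2


def compile_entry(ndim):
    """Top-level entry point: internal errors become a report, not a raw stack trace."""
    try:
        return True, lower_pass(ndim)
    except InternalError as err:
        return False, render_internal_error(err)
```

Because internal failures raise their own type, a frontend can still show user-facing errors in full while collapsing internal ones into a short bug-report prompt.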
2. Explicit and precise error reporting on program text
My second suggestion is that we revive my previous efforts to provide precise error reporting on programs. I have begun work on this and will next open a longer-form RFC focused on this effort. The goal is to bring traditional compiler error reporting to TVM, enabling errors to be rendered line by line against the program text, potentially with additional diagnostic information.
We will support two modes. In the first, source information is explicitly attached so that we can report errors back in terms of Python or another input language, for use by frameworks such as MXNet or by future tools. In the second, we use the pretty-printing and parsing infrastructure I have recently been working on to produce a backing piece of source code to render errors against.
To make this useful, we will need to convert much of the current error handling to use the new diagnostic machinery, replacing most of the CHECK calls that remain after the first migration to ICHECK.
My plan is to do this first for Relay and then for TIR, thus enabling this functionality across the whole unified IR.
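A minimal sketch of what span-based reporting could look like, written in Python for brevity. Span, Diagnostic, render and DiagnosticCollector are hypothetical names chosen for illustration, and the Relay-style snippet is only illustrative text, not guaranteed to parse. The renderer does not care where the source text comes from: it could be user-supplied input (the first mode) or text produced by the pretty printer (the second mode). Collecting diagnostics rather than raising on the first one lets a pass report all of its errors at once, as rustc does in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical source location: 1-based line, columns [col, end_col)."""
    source_name: str
    line: int
    col: int
    end_col: int


@dataclass(frozen=True)
class Diagnostic:
    level: str  # "error", "warning" or "note"
    span: Span
    message: str


def render(diag, source_text):
    """Render one diagnostic against the text its span points into, rustc-style."""
    src_line = source_text.splitlines()[diag.span.line - 1]
    gutter = " " * len(str(diag.span.line))
    width = max(1, diag.span.end_col - diag.span.col)
    carets = " " * (diag.span.col - 1) + "^" * width
    return "\n".join([
        f"{diag.level}: {diag.message}",
        f"{gutter}--> {diag.span.source_name}:{diag.span.line}:{diag.span.col}",
        f"{gutter} |",
        f"{diag.span.line} | {src_line}",
        f"{gutter} | {carets}",
    ])


class DiagnosticCollector:
    """Hypothetical: accumulates diagnostics so a pass can report every error at once."""

    def __init__(self, source_text):
        self.source_text = source_text
        self.diagnostics = []

    def emit(self, diag):
        self.diagnostics.append(diag)

    def render_all(self):
        parts = [render(d, self.source_text) for d in self.diagnostics]
        n_errors = sum(1 for d in self.diagnostics if d.level == "error")
        if n_errors:
            plural = "" if n_errors == 1 else "s"
            parts.append(f"error: aborting due to {n_errors} previous error{plural}")
        return "\n\n".join(parts)


# Usage: an unbound variable %y, columns 11-12 of line 2.
source = "def @main(%x) {\n  add(%x, %y)\n}\n"
ctx = DiagnosticCollector(source)
ctx.emit(Diagnostic("error", Span("main.relay", 2, 11, 13), "unbound variable `%y`"))
print(ctx.render_all())
```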
3. Error-handling callbacks in generated code
My final suggestion is that we emit special symbols in the generated code that allow functions to trap into more user-friendly error handling when an input invariant is not met. Instead of inscrutable messages such as “arg1.ndim does not equal n”, we could produce structured messages, or possibly embed compile-time information. This feature needs help with its design and implementation; if anyone is interested, it would be a great starting point for contributing to TVM.
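One possible shape for this, simulated in Python rather than in generated machine code, is sketched below. build_kernel stands in for codegen, ERROR_HANDLERS stands in for the special symbol the generated function would call, and ArgCheckFailure is a hypothetical structured record; none of these are existing TVM APIs, and the function and argument names are illustrative. The invariant check itself is unchanged; what changes is that a failure hands a structured record, including facts captured at compile time, to a replaceable handler.

```python
from dataclasses import dataclass, field


@dataclass
class ArgCheckFailure:
    """Hypothetical structured record of a violated input invariant."""
    func_name: str
    arg_name: str
    attribute: str  # which property failed, e.g. "ndim"
    expected: object
    actual: object
    compile_info: dict = field(default_factory=dict)  # facts baked in at compile time

    def format(self):
        msg = (f"{self.func_name}: argument '{self.arg_name}' has "
               f"{self.attribute}={self.actual!r}, but the compiled function "
               f"expects {self.attribute}={self.expected!r}")
        for key, value in self.compile_info.items():
            msg += f"\n  note: {key}: {value}"
        return msg


class ArgCheckError(ValueError):
    """Raised by the default handler; keeps the structured record attached."""

    def __init__(self, failure):
        super().__init__(failure.format())
        self.failure = failure


def default_arg_check_handler(failure):
    raise ArgCheckError(failure)


# Stand-in for the special symbol generated code would trap into. A runtime or
# frontend could swap in its own handler to customize reporting.
ERROR_HANDLERS = {"on_arg_check_failed": default_arg_check_handler}


def build_kernel(func_name, expected_ndim, compile_info):
    """Stand-in for codegen: the invariant and compile-time facts are embedded
    in the returned function when it is 'compiled'."""

    def kernel(x_shape):
        if len(x_shape) != expected_ndim:
            # Rather than "arg1.ndim does not equal n", hand the handler
            # everything that is known about the failure.
            ERROR_HANDLERS["on_arg_check_failed"](ArgCheckFailure(
                func_name, "x", "ndim", expected_ndim, len(x_shape), compile_info))
            return None
        return tuple(x_shape)  # placeholder for the real computation

    return kernel


# Usage: a kernel specialized for 2-D input.
kernel = build_kernel("fused_dense", 2, {"compiled for input shape": (4, 16)})
kernel((4, 16))  # passes the check
```

Swapping the handler, for example to collect failures in a test harness, requires no change to the compiled function itself.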
Next Steps
To drive this effort forward, I would first like to collect a wide variety of error messages from across TVM, and to recruit people to drive the RFCs for directions 1 and 3, as this is a lot of work. My plan is to upstream the machinery for direction 2 based on our discussion here, and then get help converting existing passes and errors to use it.
I look forward to hearing everyone’s thoughts!
- Jared