Error Reporting

See https://github.com/apache/incubator-tvm/pull/6274 for the current status; we hope to merge it soon™.

FOR USERS ENCOUNTERING INTERNAL ERRORS

This is intended to be a living document on TVM's error reporting process, and to provide in-progress documentation on how to use the new error reporting machinery that other community members and I have been working on. My eventual goal is that, after some time, we can crystallize these guidelines into the TVM documentation and redirect internal errors to the appropriate location.

If you have recently encountered an internal error, first check the discussion board for recent topics covering it. If you cannot find a recent discussion, check the issue tracker. If there is no existing context there either, open a discussion thread; once we understand the issue, we can create issues or open fixes to address it.

For more context on how TVM's error systems work today, read on.

ICHECK* and friends.

The final PR in the initial set of error reporting refactors introduces a set of new ICHECK macros that mirror the current API of the CHECK macros, with one important difference: they all contain a header message that directs users to first read about the issue here in this thread (we can discuss another location in the future), and then to follow the flow of either opening a discussion thread here or filing an issue on GitHub.

The motivation is that today we don't differentiate between kinds of errors at all. Better distinguishing user errors from internal errors is one way to raise the signal-to-noise ratio of our forums, such as Discuss, the mailing lists, and GitHub.
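To make the mechanism concrete, here is a minimal, self-contained sketch of how a CHECK-style macro can prepend a fixed header to every internal-error message. This is not TVM's actual implementation: the names `SKETCH_ICHECK`, `SKETCH_ICHECK_EQ`, `kInternalErrorHeader` and `InternalErrorBuilder` are hypothetical, and throwing `std::runtime_error` stands in for TVM's own logging and error types.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical header text; the real ICHECK header points users to the
// discussion thread described above.
const char* const kInternalErrorHeader =
    "InternalError: this is most likely a bug in the compiler, not in your "
    "program. Please check the forum and issue tracker before reporting it.\n";

// Accumulates a message via operator<< and throws it when the temporary is
// destroyed at the end of the failing check statement.
class InternalErrorBuilder {
 public:
  InternalErrorBuilder(const char* file, int line, const char* condition) {
    stream_ << kInternalErrorHeader << file << ":" << line
            << ": Check failed: " << condition;
  }
  // Destructors are noexcept by default, so throwing needs noexcept(false).
  ~InternalErrorBuilder() noexcept(false) { throw std::runtime_error(stream_.str()); }

  template <typename T>
  InternalErrorBuilder& operator<<(const T& value) {
    stream_ << value;
    return *this;
  }

 private:
  std::ostringstream stream_;
};

// The if/else shape keeps `SKETCH_ICHECK(x) << "msg";` safe inside other ifs.
#define SKETCH_ICHECK(cond) \
  if (cond) {               \
  } else                    \
    InternalErrorBuilder(__FILE__, __LINE__, #cond)

// Unlike a production CHECK_EQ, this evaluates its arguments twice.
#define SKETCH_ICHECK_EQ(a, b) \
  SKETCH_ICHECK((a) == (b)) << " (" << (a) << " vs. " << (b) << ") "
```

A call site then reads like the existing macros, for example `SKETCH_ICHECK_EQ(num_outputs, 1) << "expected a single output";`, while every failure carries the same header steering users toward the triage flow.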

For example, in convolution.h the type checking code checks that it receives the declared number of input/output types:

template <typename AttrType>
bool Conv2DRel(const Array<Type>& types, int num_inputs, const Attrs& attrs,
               const TypeReporter& reporter) {
  ICHECK_EQ(types.size(), 3);
  // ... remainder of the type relation elided ...
}

This is an appropriate place to use ICHECK: if the number of types passed here is wrong, it signals an error in TVM's internals that users cannot address.

The goal of these macros is to notify users when there is a fault or bug in the system, and to help the community triage true issues more effectively.

To cut down on false bug reports, we also introduce a new diagnostic interface, discussed below, for highlighting errors caused by user input or by erroneous use of system components.

Diagnostics

My recent refactors have replaced the previous attempt at full-program error reporting with a diagnostic interface that renders errors targeted at specific locations in user programs.

For example, this is what errors now look like when you pass improperly sized arguments to convolution.

The goal of my recent refactors is to provide TVM developers with the tools required to produce more attractive and informative error messages. At the time of writing this is a very early version, and I'm sure more features will evolve as we move forward.

Below is an example of how to use the new diagnostics to customize the errors raised in an operator's type checking. Previously, conv2d would simply check an assertion like:

CHECK(reporter->AssertEQ(indexdiv(dshape_nchw[1], param->groups), wshape[1]));

With a few extra lines of code, you can rewrite that error handling today:

  if (!reporter->AssertEQ(indexdiv(dshape_nchw[1], param->groups), wshape[1])) {
    reporter->GetDiagCtx().Emit(
        Diagnostic::Error(reporter->GetSpan())
        << "conv2d: requires that `" << indexdiv(dshape_nchw[1], param->groups) << "`,"
        << " the input channels (" << dshape_nchw[1] << ")"
        << " divided by groups (" << param->groups << ")"
        << ",\n must match the input channels"
        << " of the weight `" << wshape[1] << "`, where the weight shape is (" << wshape
        << ").");
    // Not yet shown: a follow-up Note(...) diagnostic could point at a sub-span.
    return false;
  }

Notice that I spent more lines of code constructing a useful error message than the original assertion took. I will try to update this document with more guidelines as we begin the push to rewrite all the old error handling to use the new machinery. A high-level guiding principle should be “explain why, not just what”: errors should help users understand why what they did is wrong, not just that it is wrong.
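The `Diagnostic::Error(span) << ...` chain behaves like a builder: the stream operators accumulate a message, and the finished diagnostic is handed to a context that collects it for later rendering. A stripped-down, hypothetical version of that pattern is sketched below; `MiniDiagnostic`, `MiniDiagnosticBuilder`, `MiniDiagContext`, `CheckGroupedChannels`, the integer `span_id` standing in for a real span, and the choice to throw from `ThrowIfErrors` are all assumptions for illustration, not TVM's classes.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

enum class Level { kError, kNote };

struct MiniDiagnostic {
  Level level;
  int span_id;  // stands in for a real source span
  std::string message;
};

// Accumulates a message via operator<<, then converts to a MiniDiagnostic.
class MiniDiagnosticBuilder {
 public:
  MiniDiagnosticBuilder(Level level, int span_id) : level_(level), span_id_(span_id) {}
  template <typename T>
  MiniDiagnosticBuilder& operator<<(const T& value) {
    stream_ << value;
    return *this;
  }
  operator MiniDiagnostic() const { return {level_, span_id_, stream_.str()}; }

 private:
  Level level_;
  int span_id_;
  std::ostringstream stream_;
};

// Collects diagnostics so they can all be reported together.
class MiniDiagContext {
 public:
  void Emit(const MiniDiagnostic& diag) { diags_.push_back(diag); }
  std::string Render() const {
    std::ostringstream out;
    for (const MiniDiagnostic& d : diags_) {
      out << (d.level == Level::kError ? "error" : "note") << " [span " << d.span_id
          << "]: " << d.message << "\n";
    }
    return out.str();
  }
  void ThrowIfErrors() const {
    for (const MiniDiagnostic& d : diags_) {
      if (d.level == Level::kError) throw std::runtime_error(Render());
    }
  }

 private:
  std::vector<MiniDiagnostic> diags_;
};

// Hypothetical check modelled on conv2d: explains why, then returns false.
bool CheckGroupedChannels(MiniDiagContext& ctx, int span_id, int in_channels,
                          int groups, int weight_in_channels) {
  if (groups <= 0) {
    ctx.Emit(MiniDiagnosticBuilder(Level::kError, span_id)
             << "conv2d: groups must be positive, got " << groups);
    return false;
  }
  int per_group = in_channels / groups;
  if (per_group == weight_in_channels) return true;
  ctx.Emit(MiniDiagnosticBuilder(Level::kError, span_id)
           << "conv2d: input channels (" << in_channels << ") divided by groups ("
           << groups << ") is " << per_group << ", but the weight expects "
           << weight_in_channels << " input channels");
  return false;
}
```

Returning `false` instead of aborting lets the checker keep going and report every problem in one pass before failing.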

I have more ideas for better structuring and improving errors, and will continue to iterate on this document as we move forward.

If you are interested in helping improve TVM's error handling, feel free to reach out to me.


I love the idea (and syntax & format) of the proposed diagnostic mechanism. It is beautiful and accurate.

I was wondering whether it is possible to expand the scope of diagnostics. Right now it works with Relay IR (IIUC), but could it be extended to:

  • TIR hybrid script
  • The new TIR scheduling primitives
  • Target creation w/ tvm::Map and raw string

Enabling diagnostics for Target creation is easier: as long as we enable error reporting on generic tvm::Map and tvm::Array, we can accurately report which item causes trouble. Working with TIR hybrid script should be conceptually easy too, since span information is naturally preserved in the parser (implemented by @spectrometerHBH). The scheduling primitives case is worth thinking about as well.
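The idea of pinpointing the offending entry in a map-based config can be sketched with a plain `std::map` standing in for `tvm::Map`. The option schema (`kind`, `mcpu`, `num-cores`) and the function `ValidateTargetOptions` are hypothetical; the point is only that the error names the exact key and explains why its value was rejected, rather than failing on the map as a whole.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical validator: throws std::invalid_argument naming the first bad key.
void ValidateTargetOptions(const std::map<std::string, std::string>& options) {
  static const std::set<std::string> kKnownKeys = {"kind", "mcpu", "num-cores"};
  for (const auto& kv : options) {
    const std::string& key = kv.first;
    const std::string& value = kv.second;
    if (kKnownKeys.count(key) == 0) {
      throw std::invalid_argument("target option \"" + key + "\" is not recognized");
    }
    if (key == "num-cores") {
      // Positive integer: all digits, and not all zeros.
      bool all_digits = !value.empty() &&
                        value.find_first_not_of("0123456789") == std::string::npos;
      if (!all_digits || value.find_first_not_of('0') == std::string::npos) {
        throw std::invalid_argument("target option \"num-cores\" has value \"" + value +
                                    "\", but it must be a positive integer");
      }
    }
  }
  if (options.count("kind") == 0) {
    throw std::invalid_argument("target options are missing the required key \"kind\"");
  }
}
```

With spans attached to each map entry, the same per-key check could emit a diagnostic underlining the exact entry instead of throwing a plain message.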