[RFC] Better tvm logger on both C++ and Python side

Currently, TVM uses the dmlc logger from dmlc-core by default, which lacks good log level management. Below are some of its shortcomings in TVM.

  1. LOG(INFO) shares the same severity as LOG(WARNING) and LOG(ERROR),
  2. LOG(DEBUG) does not exist. Instead, TVM uses a macro TVM_LOG_DEBUG and an environment variable DMLC_LOG_DEBUG as triggers for printing debugging messages, which is quite confusing for users,
  3. TVM is supposed to have an env variable TVM_BACKTRACE as discussed in https://github.com/apache/tvm/issues/9567, but this environment variable does not work and cannot be found anywhere in the C++ code.

Although TVM allows users to define or pass custom loggers to manage logs on the C++ side, the default dmlc logger is still the most commonly used one. Thus, it is necessary to improve the current TVM logger.

In this RFC, I suggest the following:

  1. add distinct log severities in the dmlc-core repo,
  2. provide TVM_LOG_LEVEL as a single environment variable to unify the existing env variables and macros, e.g., DMLC_LOG_DEBUG, TVM_BACKTRACE, and TVM_LOG_STACK_TRACE.

cc @junrushao


I think this is a great initiative @pzq. The only suggestion I’d add on top of it is that we can also make the Python side generally respect the same rules, environment variables, etc. Just to keep in mind that we can offer a better experience for our users if these rules apply more widely in TVM.

Happy to help exploring these ideas.


This is definitely extremely important! I would love to discuss the environment variables used here further with you. Do you think it’s possible to unify the DMLC_ and TVM_ environment variables? Is it possible to minimize the number of env variables to use? Is there any precedent we could refer to?


I like this proposal a lot. Here are my two cents in terms of use cases.

On the C++ side, PyTorch has two logging systems:

  • LOG(LEVEL): This is what we are familiar with, including DEBUG, INFO, WARNING, and FATAL.
  • VLOG(INT_LEVEL): This is only for debugging; INT_LEVEL can range from 0 to 10. They use the following in Python to control the level:
from caffe2.python import workspace
workspace.GlobalInit(['caffe2', '--caffe2_log_level=-4'])

I personally like the idea of this system because a single DEBUG level is too broad. I could imagine getting lost in a sea of logs when executing with DEBUG. Meanwhile, even if we provide 10 levels of debug logging, it is still confusing for both developers and debuggers to decide what level to log at and what level to set.

Ideally we might combine all good ideas as follows. Note that this is just an example.

DLOG(FuseOps)(1) << "message"
  • DLOG stands for “debug log”.
  • FuseOps indicates the log type to differentiate it from other logs. Note that it is also under a namespace, relay.transform.
  • 1 is the debug log level ranging from 1 to 5.
  • This log can be enabled by setting either environment variable or Python global logging. In this example, we may use set_debug_log_level("relay.transform.FuseOps", 1) to enable it.
  • Similarly, we may use set_debug_log_level("relay.transform", 1) to enable all debug logs under the higher-level namespace (a rough sketch of this follows below).
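
As a rough sketch of how the hierarchical enabling could behave on the Python side (all names below are hypothetical and mirror the proposal above, not any existing TVM API):

# Hypothetical registry backing the proposed set_debug_log_level(); nothing here exists in TVM today.
_debug_levels = {}

def set_debug_log_level(namespace, level):
    """Enable debug logs up to `level` for `namespace` and everything nested under it."""
    _debug_levels[namespace] = level

def debug_log_enabled(name, level):
    """Walk up the dotted namespace: relay.transform.FuseOps -> relay.transform -> relay."""
    parts = name.split(".")
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in _debug_levels:
            return level <= _debug_levels[prefix]
    return False

# set_debug_log_level("relay.transform", 1) then enables
# debug_log_enabled("relay.transform.FuseOps", 1) as well.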

On the Python side, as @leandron pointed out, we should also make it as consistent as possible. Currently, most Python logging goes directly through the global root logger (i.e., logging.info("message")). This would be an issue especially when working with other Python packages. Specifically, if another Python package has its own custom logging configuration, those settings might get messed up when importing tvm.
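
For instance, one standard way to avoid this is per-module hierarchical loggers instead of the root logger; a minimal sketch (the logger name and function are just illustrations):

import logging

# Per-module logger: respects whatever handlers/levels the application configured,
# so `import tvm` does not clobber another package's logging setup.
logger = logging.getLogger("tvm.relay.transform")

def run_fuse_ops():  # hypothetical pass entry point, for illustration only
    logger.info("running FuseOps")   # instead of logging.info("running FuseOps")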


Another idea @zxybazh and I have been discussing is the possibility of unifying the Python and CXX logging systems via packed functions, i.e. the CXX logging system could potentially call back into Python’s logging module.
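
To make this concrete, here is a minimal sketch of a Python-side log sink registered through a packed function; the global function name "tvm.logging.write" is purely an assumption, not an existing registration point in TVM.

import logging
import tvm

# Assumed contract: C++ looks up this packed function (if present) and calls it
# with a logger name, a Python-style level number, and the message text.
@tvm.register_func("tvm.logging.write")
def _log_sink(logger_name, level, message):
    logging.getLogger(logger_name).log(int(level), message)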


It’s a good idea to unify the Python and CXX logging systems via packed functions, but I am worried that it may affect the performance of the C++ runtime if everything is routed through Python’s logging module.

I would say this is far beyond my initial thoughts, but I am in total agreement with your proposal. :smiley: It’s just that there are still many problems in the details; I need some time to figure them out and discuss them further with you.

In order to minimize the number of env variables and to avoid ambiguity, how about we provide just two variables, TVM_LOG_LEVEL and TVM_DEBUG_LOG_LEVEL, and deprecate the existing DMLC_ and TVM_BACKTRACE ones?
As @comaniac said,

  1. TVM_LOG_LEVEL is for the basic logging level, including DEBUG, INFO, WARNING, and FATAL.
  2. TVM_DEBUG_LOG_LEVEL is for a further division of debugging messages, and ranges from 1 to 5.

As for the env variable TVM_BACKTRACE and the macro TVM_LOG_STACK_TRACE, I think they can be unified into TVM_DEBUG_LOG_LEVEL.
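
For illustration, a minimal sketch of how the two proposed variables might be read; the parsing below just assumes the semantics described above and is not an implemented API:

import logging
import os

level_name = os.environ.get("TVM_LOG_LEVEL", "WARNING")        # DEBUG / INFO / WARNING / FATAL
debug_level = int(os.environ.get("TVM_DEBUG_LOG_LEVEL", "0"))  # 0 = off, 1-5 = increasing detail

# Map the basic level onto Python's logging module; FATAL is an alias of CRITICAL.
logging.getLogger("tvm").setLevel(getattr(logging, level_name, logging.WARNING))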

Request for comments. @junrushao @comaniac @leandron


Please correct me if I missed anything. For TVM_BACKTRACE, isn’t it used when crashing? If so then it seems to work with FATAL?

@pzq @comaniac The precedent for TVM_BACKTRACE is RUST_BACKTRACE, which is used to display the backtrace when a Rust program crashes. Right now, TVM_BACKTRACE is only used for TVM’s DiagnosticError, which only happens in the parser (@wrongtest correct me if I am wrong).

We might want to expand its usage to control stacktrace generation in the entire TVM system. In particular, the stacktrace isn’t very helpful for non-core developers, and looks a bit messy when displayed in a Jupyter notebook.

TVM’s diagnostics (context, renderer, and errors) were introduced in https://github.com/apache/tvm/pull/6274, where rendering with errors will also raise a DiagnosticError. A typical user is the InferType pass.

I think it is a better weapon than the logging system for developers and internal-mechanism debugging. With diagnostics we may be able to decouple what to record (with traceback and context information), when to raise the error, and how to format it. We can also see the diagnostic design in clang (clang: include/clang/Basic/Diagnostic.h Source File) and MLIR (Diagnostic Infrastructure - MLIR).


@pzq These sound like good improvements to the TVM logging system.

I would like to provide some clarification. In https://github.com/apache/tvm/pull/7153, all the logging infrastructure was moved into the TVM source tree. We no longer use dmlc-core for logging. We do implement an interface that is compatible with dmlc-core’s logging so that all logging in dmlc-core goes through the TVM logging system.

@comaniac I believe we already have VLOG in the codebase: https://github.com/apache/tvm/pull/9012.

@tkonolige thanks for pointing that out. Glad to learn that we already have it, but it seems that not many passes use VLOG yet, nor do many people know about it. Maybe a tutorial or guide could help promote it.

+1 for this. I’ve been thinking a lot about this but haven’t had time to write up any thoughts. Here are some sketches:

  • We often spawn subprocesses to do things like parallelizing AutoTVM, and because it’s often very convenient to be able to kill a subprocess that was talking to a GPU driver that went haywire.

    • When we do this, we sometimes either a) repurpose stdout/stderr for use as a command-reply stream between the parent and child processes, or b) don’t configure Python logging in the child process. (We used to use multiprocessing for this, which did one thing, and now I think we spawn child processes, which do a different thing; I can’t remember offhand which we do now. Complicating things, I think we support both.)
    • This often means that either Python logs are seen (e.g. because multiprocessing was used) or that C++ logs are seen (because we didn’t configure Python logging).
    • Unrelated but also confusing: sometimes these child processes are manually launched, and since we have little C++ logging, you often find yourself adding logging, rebuilding, and then forgetting to relaunch the child process, so that you don’t see logs when you should.
  • For all of these reasons, I’m not opposed to using PackedFunc to log (or even potentially to forward logs over the RPC link like we do with microTVM), but it does mean that if we want to achieve an efficient implementation with logging off, we would need to (a rough sketch of the first two items appears after this list):

    1. Adopt the same hierarchical module-level logger names in C++ as we do in Python.
    2. Provide a way for Python to tell C++ which hierarchical modules have logging enabled, so that the check for “should we log this?” can be done entirely in C++.
    3. (probably) re-implement the hierarchical traversal done in Python logging in C++ as part of doing that check.
    4. Define a cross-platform PackedFunc that could be used to receive log calls, and a fallback method to be used when that’s not configured.
  • There are some benefits to pushing logs into the frontend language (e.g. Python); for example, I believe multiprocessing provides some special logging facilities.

  • There could be some reasons to push this the other way (e.g. forwarding logs over the RPC link), but I’m sort of in favor of separating those implementations as it seems burdensome to require frontend languages to output logs using TVM’s C++ implementation.
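
Under the assumptions above, here is a minimal sketch of items 1 and 2: collect the effective levels of the hierarchical Python loggers and push the table down to C++ so the "should we log this?" check can stay on the C++ side. The global function name "tvm.logging.set_levels" is hypothetical.

import logging
import tvm

def push_enabled_log_levels(logger_names):
    # Item 1: module-level logger names shared between Python and C++ (e.g. "tvm.relay.transform").
    # Item 2: hand their effective levels to C++ via a (hypothetical) packed function.
    levels = {name: logging.getLogger(name).getEffectiveLevel() for name in logger_names}
    set_levels = tvm.get_global_func("tvm.logging.set_levels", allow_missing=True)
    if set_levels is not None:
        set_levels(tvm.runtime.convert(levels))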


Spiritually related: https://github.com/apache/tvm/pull/9872 adds colors and some extra options for backtraces.

Having all the logging go through a single place where it gets printed is certainly nice for filtering and understandability in general. Logging via a hook that is provided by the runtime gets a lot of these benefits (i.e. when Python is present, it can install itself using the normal logging module to print messages, or C++ can have a std::cout printer, or just do nothing in the default logging-off case). PyTorch does something like this for warnings (hook, installation).
