We occasionally run into strange crashes that are hard to debug. Unfortunately, the verbosity of logs on Hexagon makes them difficult to wade through, and sometimes causes log data to be dropped. For example, here’s a trace from a recent crash: Example TVM on Hexagon crash · GitHub
There’s several problems here. First, there is a lot of RPC debugging messages. The Hexagon RPC mechanism has been in use for quite a while now, so we probably don’t need in-depth logging most of the time. It also drowns out finding useful messages, and might even cause the logging buffers to fill up, causing data to be dropped. For example, in the last line of that gist, ./libhexagon_rpc.so
is truncated as ./libhexagon_r
and overwritten with dashes.
Another issue is that Hexagon’s FARF (logging) API automatically adds line numbers. When forwarding TVM logs to FARF, this results in overly verbose messages, with FARF showing the line number where custom Hexagon logging is implemented (in hexagon_common.cc
), while the actual log gets split into multiple FARF log lines – one for the actual line number, and one for the actual message.
You can see this a lot in the gist, but I’ll repeat a couple of lines here:
10-10 23:07:27.397 10835 10839 V adsprpc : CDSP:/home/csullivan/projects/tvm/src/runtime/hexagon/hexagon_common.cc:73:0x32167:25605: INFO: /home/csullivan/projects/tvm/src/runtime/hexagon/rpc/hexagon/rpc_server.cc:77:
10-10 23:07:27.397 10835 10839 V adsprpc : CDSP:/home/csullivan/projects/tvm/src/runtime/hexagon/hexagon_common.cc:76:0x32167:25605: INFO: HexagonIOHandler PosixWrite called, write_len_bytes(4)
Finally, the current Hexagon logging implementation uses HEXAGON_PRINT(ALWAYS, ...
regardless of the TVM LOG()
level, making it impossible to filter out lower-level messages. This also presumably has a big impact on performance.
Unfortunately, LogMessageImpl
doesn’t currently take the TVM log level, making it impossible to forward to the Hexagon logging framework. I have a draft PR that extends LogMessageImpl
to take the TVM log level: https://github.com/apache/tvm/pull/13072 However, this seems like a big enough change to get some feedback on.
While working through this PR, several things came to mind:
-
Does it still make sense to have a separate
LogMessage
andLogFatal
?LogFatal
usually ends up doing the same thingLogMessage
does, followed by something like anabort()
. Now thatLogMessage
takes a level, it seems like these functions can be merged. -
TVM doesn’t really have log levels per se…
INFO
,WARNING
,ERROR
, andFATAL
are just used to construct the string of the macro to call (e.g., (#define LOG(level) LOG_##level
, followed by a bunch of differentLOG_*
macros that mostly do the same thing. These redundant macros could be removed if these levels actually had values to pass intoLogMessage
. -
Meanwhile, metaschedule does have explicit log levels for
PyLogMessage
, which includes the extraDEBUG
level, which is standard in Python but does not exist elsewhere in TVM at the moment. (There’s also theCLEAR
level, which seems to be a hack, so I’m ignoring it for now.) -
Each platform has its own logging levels. For example, Hexagon has
LOW
,MEDIUM
,HIGH
,ERROR
,FATAL
, andALWAYS
. Android hasVERBOSE
,DEBUG
,INFO
,INFO
,WARN
,ERROR
, andFATAL
. How do we reconcile these? It seems like mapping HexagonLOW
toDEBUG
mostly keeps everything within a reasonable common range betweenLOW
andFATAL
.
The PR also uses the Hexagon HAP_debug_runtime
API, and while the headers say you should use the FARF macros instead, it seems appropriate to use this lower-level API when combining integrating Hexagon logging with another logging framework. This enables you to adjust the desired logging level at runtime. According to the Hexagon SDK, runtime log filtering has negligible overhead for discarded messages. With this change, I made Hexagon RPC log at the DEBUG
level and disabled collection of those logs by adjusting the flags in tvm_rpc_android.farf
. The logs are now MUCH nicer. For example:
04-22 19:04:16.232 20684 20688 V adsprpc : rpc_server.cc:199:0xd379:8709: ERROR: ProcessOnePacket failed
04-22 19:04:16.233 20684 20688 V adsprpc : rpc_server.cc:290:0xd379:8709: ERROR: hexagon_rpc_send failed, written_size (-1) != dataLen (12)
I also took this opportunity to pipe the TVM log level to Android. Previously, TVM on Android used to log everything in logcat at the DEBUG
level. Now the TVM log level is mapped to the corresponding Android log level and sent to logcat.
Thoughts on this approach?