[pre-RFC][CRT][microTVM] Integration of platform-specific counters (or any data)

Introduction

I have stumbled across a use case which could be an interesting addition to the CRT runtime.

Consider the following scenario:

  • A host runs AutoTVM and executes the generated code on another platform via microTVM and RPC (for example, an Arm or RISC-V development board).
  • Suppose this platform has an accelerator that provides custom performance counters (or any other custom data) giving more insight into the execution of the model than the average execution time TVM currently measures.

I would like to find a way of integrating this custom data into TVM, so that I can retrieve the performance counter values for each run from the host side.

Proposal

I looked into where to add this information, aiming to keep it generic enough that any custom data can be attached. So far, I have found that the following changes would be needed:

  1. Inside the RuntimeEvaluator of the CRT runtime API, a platform-specific function (similar to TVMPlatformTimerStart) should be added to collect all the data that needs to be returned to the TVM host.
  2. A change should also be made in the HandleNormalCallFunc function of the minrpc server, in order to actually return these values to the host.
  3. In the Python function run_through_rpc, the returned MeasureResult object should also be modified.
  4. The MeasureResult class should probably be extended to support this custom data (perhaps with an array of dictionaries holding the custom values?).
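To make point 4 concrete, here is a minimal sketch of what an extended result could look like. This is an illustration only: the real `MeasureResult` lives in `tvm.autotvm.measure` as a namedtuple with fields `(costs, error_no, all_cost, timestamp)`; the `custom_data` field and the counter names are hypothetical.

```python
from collections import namedtuple

# Hypothetical extension of AutoTVM's MeasureResult (sketch only).
# The added `custom_data` field holds one dict of platform counter
# values per run, alongside the existing per-run timing costs.
MeasureResult = namedtuple(
    "MeasureResult", ["costs", "error_no", "all_cost", "timestamp", "custom_data"]
)

result = MeasureResult(
    costs=(1.2e-3, 1.1e-3),   # per-run execution times in seconds
    error_no=0,               # 0 means no error occurred
    all_cost=2.5,             # total wall time of the whole measurement
    timestamp=1700000000.0,   # when the measurement finished
    custom_data=[             # one dict of platform counters per run (hypothetical names)
        {"accel_cycles": 12345, "dma_stalls": 17},
        {"accel_cycles": 12290, "dma_stalls": 15},
    ],
)
```

A host-side consumer could then index `result.custom_data[i]` to inspect the counters of run `i` next to `result.costs[i]`.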

Hi @fPecc,

Profiling already supports custom performance counters via the MetricCollector interface (https://github.com/apache/tvm/blob/main/include/tvm/runtime/profiling.h#L270-L313). If you can use the profiler, you can call graph_executor.profile(*args, collectors=[MyNewCollector]), where MyNewCollector is your performance counter measurer implementing the MetricCollector interface (use tvm.runtime.profiling.profile_function if you just want to time a single function). However, profiling does not currently support the CRT or using MetricCollectors over RPC. I'd love to see the profiler support both, though.
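To illustrate the shape of that interface without requiring a TVM build, here is a pure-Python sketch of the collector contract (the real interface is C++, with `Init`/`Start`/`Stop` methods; the method names below are paraphrased and the counter source is a stand-in):

```python
class AccelCounterCollector:
    """Sketch of a MetricCollector-style collector for hypothetical
    accelerator counters. Not the real TVM class; illustration only."""

    def init(self, devices):
        # Configure/enable the counters for the devices being profiled.
        self.devices = devices

    def start(self, device):
        # Return an opaque handle capturing counter state at call start.
        return {"cycles_at_start": read_cycle_counter()}

    def stop(self, handle):
        # Return a map of metric name -> value for the profiling report.
        return {"accel_cycles": read_cycle_counter() - handle["cycles_at_start"]}


# Stand-in for a real platform counter read (assumption: a monotonically
# increasing cycle counter; here it advances by 100 per read for the demo).
_counter = 0

def read_cycle_counter():
    global _counter
    _counter += 100
    return _counter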

In terms of your proposal, I’d prefer to see the profiler used instead of changing the time_evaluator interface as there profiler is a place where we are already collecting similar data to what you want. Also, the time_evaluator is used in a lot of places in the codebase and changing it may be hard. If we do want to use the profiler, then we have a couple of problems we need to solve:

  1. How do we make the metric collection interface work over RPC? I think we would need to decouple the collection of metrics from their interpretation. The on-device side of the metric collector would record the metrics and send them back as a binary blob. Then the host side would interpret the results. The on-device side would probably have to be just packed functions in the runtime so that they are callable over RPC.
  2. How can we adapt the profiler to work with the CRT? The profiler has a fair bit of c++ code and I am not sure we really want to port it over to C. Maybe someone with more CRT experience can speak here? @areusch

Given that the above two points may be a bit of work, a medium-term solution may be to add a custom function that does time_evaluator + collecting metrics. Once again, returning structured data over RPC is hard, so I’d recommend that this new function return a binary blob that the host then decodes.

Also, if you have questions about the profiling interface, I am happy to answer them :slight_smile:

1 Like

Hi @tkonolige ,

I was not aware of the MetricCollector, surely this could be a way of implementing what I need.

I agree, from what I have been reading in the code, the easiest way of returning data over RPC is to return a binary blob and let the host do the decoding.