PAPI counters with basic matmul Relay function

This is exactly what I needed, thanks!

I’m now able to extract PAPI counters from standalone functions by exporting the function as a .so library and running it from C++ with the PAPI code above!

I’ll use this method to get the data I need.

Looking ahead, I’m thinking about how best to expose a Python interface to this, to make it more usable for others in the short-to-medium term.

Within my C++ module, I benchmark using a PackedFunc. On the Python side, I can get the same PackedFunc from mod (i.e. the output of tvm.build) via mod.entry_func.

I guess what I need is a Python-exposed C++ interface that takes a tvm.module, the input tensors, the target device, and the PAPI counters to collect.

It could then simply return the JSON produced by tvm::runtime::profiling::Report.
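To make the shape of that interface concrete, here is a minimal pure-Python sketch under stated assumptions. `profile_packed_func` and `WallClockCollector` are hypothetical names, not TVM APIs: the plain callable stands in for the PackedFunc, and wall-clock time stands in for PAPI counters so the sketch runs without TVM or PAPI installed. In the version described above, the C++ side would collect the real counters and return the Report's JSON instead.

```python
import json
import time
from typing import Any, Callable, Dict, List, Sequence


class WallClockCollector:
    """Hypothetical stand-in for a metric collector such as a PAPI collector.

    Reports elapsed wall-clock time instead of hardware counters, so this
    sketch runs without TVM or PAPI installed.
    """

    name = "wall_clock_us"

    def start(self) -> None:
        self._t0 = time.perf_counter()

    def stop(self) -> Dict[str, float]:
        return {self.name: (time.perf_counter() - self._t0) * 1e6}


def profile_packed_func(func: Callable[..., Any], args: Sequence[Any], device: str,
                        collectors: List[Any], warmup: int = 1) -> str:
    """Hypothetical helper: run func(*args) once under every collector and
    return the collected metrics as a JSON string."""
    for _ in range(warmup):
        func(*args)  # warm caches before the measured run
    for c in collectors:
        c.start()
    func(*args)  # the measured run
    metrics: Dict[str, float] = {}
    for c in reversed(collectors):  # stop in reverse order of starting
        metrics.update(c.stop())
    report = {
        "device": device,
        "calls": [{"name": getattr(func, "__name__", "func"), "metrics": metrics}],
    }
    return json.dumps(report)


# Usage: a naive matmul plays the role of the compiled function.
def matmul(a, b):
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


print(profile_packed_func(matmul, ([[1, 2]], [[3], [4]]), "cpu", [WallClockCollector()]))
```

Keeping this as a free function rather than a method of Module means it accepts any callable, which is one argument for keeping it separate.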

I’ll need to think about where best to put this. Should it be a method of Module, or would it be better kept as a separate function?

EDIT

My example shows that the TVM profiling system, as well as the PAPI profiler, can work without running in the Relay VM (a system I have only just learned about; a fascinating idea, though I wonder what sort of overhead we can expect).

I’m looking for a standard way of using the profiler outside the VM that I could hook the PAPI profiler into.

However, the only uses of the profiler I can find are in the PAPI tests themselves.

This test profiles a very simple function in the VM, but it requires a Relay Function and compiling it for the VM.

That’s not really what I need, given that I already have a tvm.runtime.packed_func.PackedFunc.

But perhaps I can take some design cues from VirtualMachineProfiler.
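One cue worth borrowing, as I understand VirtualMachineProfiler, is bracketing each kernel call and aggregating the results into a single report. The sketch below mimics that pattern in plain Python; `CallProfiler` and `run_profiled` are hypothetical, wall-clock time stands in for PAPI counters, and none of it is TVM's actual API.

```python
import json
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple


class CallProfiler:
    """Hypothetical profiler: brackets each named call and aggregates timings.

    Wall-clock time stands in for hardware counters such as PAPI events.
    """

    def __init__(self) -> None:
        self._calls: List[Dict[str, Any]] = []
        self._running = False
        self._current: Optional[Tuple[str, float]] = None

    def start(self) -> None:
        self._calls.clear()  # discard results from any previous session
        self._running = True

    def start_call(self, name: str) -> None:
        assert self._running, "call start() first"
        self._current = (name, time.perf_counter())

    def stop_call(self) -> None:
        assert self._current is not None, "call start_call() first"
        name, t0 = self._current
        self._calls.append({"name": name, "duration_us": (time.perf_counter() - t0) * 1e6})
        self._current = None

    def stop(self) -> None:
        self._running = False

    def report(self, aggregate: bool = True) -> str:
        """Return per-call records, or per-name totals when aggregate=True, as JSON."""
        if not aggregate:
            return json.dumps({"calls": self._calls})
        totals: Dict[str, Dict[str, float]] = defaultdict(lambda: {"count": 0, "duration_us": 0.0})
        for call in self._calls:
            totals[call["name"]]["count"] += 1
            totals[call["name"]]["duration_us"] += call["duration_us"]
        return json.dumps({"calls": [{"name": n, **v} for n, v in totals.items()]})


def run_profiled(prof: CallProfiler, funcs: Dict[str, Callable[[], Any]]) -> None:
    """Run each named function once, bracketing it with start_call/stop_call."""
    prof.start()
    for name, fn in funcs.items():
        prof.start_call(name)
        fn()
        prof.stop_call()
    prof.stop()
```

Running the same name more than once between `start()` and `stop()` sums into one row with `count` greater than 1 when `aggregate=True`.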