PAPI counters with basic matmul Relay function

Many thanks, tkonolige. I think this is a good excuse for me to learn more about the internals of the TVM runtime and profiling.

I’ve started by making a simple C++ deployment of matmul_add, with the goal of using it to implement Option 1.

I am following the basic structure of apps/howto_deploy (link) for my example.

Basically, I want to get it working in C++ before I try to build a nice Python wrapper, which would mean breaking through several layers of abstraction.

I’ve been reading through the PAPI and Profiler code, and have already learned a lot. The definition of the Profiler includes this example usage:

Device cpu, gpu;
Profiler prof({cpu, gpu});
my_gpu_kernel(); // do a warmup iteration
prof.Start();
prof.StartCall("my_gpu_kernel", gpu);
my_gpu_kernel();
prof.StopCall();
prof.StartCall("my_cpu_function", cpu);
my_cpu_function();
prof.StopCall();
prof.Stop();
std::cout << prof.Report << std::endl; // print profiling report

I am trying something similar, with the PAPI collector as the metric collector, though I am not sure it is the right way to go:

tvm::Device dev = {kDLCPU, 0};
tvm::Map<tvm::runtime::profiling::DeviceWrapper, tvm::Array<tvm::String>> metrics({
   {kDLCPU,
    {"perf::CYCLES", "perf::STALLED-CYCLES-FRONTEND", "perf::STALLED-CYCLES-BACKEND",
     "perf::INSTRUCTIONS", "perf::CACHE-MISSES"}},
   {kDLCUDA, {"cuda:::event:elapsed_cycles_sm:device=0"}}});


tvm::runtime::profiling::MetricCollector papi_collector = tvm::runtime::profiling::CreatePAPIMetricCollector(metrics);

std::cout << "papi_collector created" << std::endl;

tvm::runtime::profiling::Profiler prof = tvm::runtime::profiling::Profiler({dev}, {papi_collector});
std::cout << "Profiler created" << std::endl;
f(A, B, C, out); // warmup
std::cout << "Warmup performed" << std::endl;
prof.Start();
prof.StartCall("matmul_add_dyn", dev);
f(A, B, C, out);
prof.StopCall();

My main issue right now is the initialiser for metrics, which CreatePAPIMetricCollector requires. It’s not clear to me how to get the types right.

I can’t find anywhere else in the codebase that uses Map<DeviceWrapper, Array<String>>.
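One thing I might try next is building the map one entry at a time, wrapping each device explicitly, instead of relying on a nested brace initialiser. Below is a minimal sketch of that idea using stand-in types only: std::map and std::vector in place of tvm::Map and tvm::Array, plus hypothetical Device and DeviceWrapper structs that are not TVM's real classes. It assumes the key type can only be constructed explicitly from a full device, which I have not confirmed against the TVM headers.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-ins for illustration only; these are NOT TVM's real classes.
enum DeviceType { kCPU = 1, kCUDA = 2 };  // hypothetical device-type enum
struct Device {
  DeviceType device_type;
  int device_id;
};

// Assumption: the key type can only be built explicitly from a full Device,
// so a bare device-type enum inside a brace initialiser does not convert to it.
struct DeviceWrapper {
  explicit DeviceWrapper(Device d) : dev(d) {}
  Device dev;
};

// Ordering so the wrapper can be used as a std::map key.
inline bool operator<(const DeviceWrapper& a, const DeviceWrapper& b) {
  if (a.dev.device_type != b.dev.device_type) {
    return a.dev.device_type < b.dev.device_type;
  }
  return a.dev.device_id < b.dev.device_id;
}

using MetricMap = std::map<DeviceWrapper, std::vector<std::string>>;

// Build the map entry by entry, wrapping each device explicitly.
inline MetricMap BuildMetrics() {
  MetricMap metrics;
  metrics.emplace(DeviceWrapper(Device{kCPU, 0}),
                  std::vector<std::string>{"perf::CYCLES", "perf::INSTRUCTIONS"});
  metrics.emplace(DeviceWrapper(Device{kCUDA, 0}),
                  std::vector<std::string>{"cuda:::event:elapsed_cycles_sm:device=0"});
  return metrics;
}
```

With these stand-ins, calling BuildMetrics() gives a two-entry map keyed by wrapped devices. Whether the same entry-by-entry approach compiles against the real tvm::Map and DeviceWrapper is exactly what I still need to check.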

I have my code here, which can be cloned into tvm/apps and run with ./run_example.sh. The PAPI example is compiled with make papi.

Any pointers on how to write the metrics initialiser?