Many thanks, tkonolige. I think this is a good excuse for me to learn more about the internals of the TVM runtime and its profiling support.
I’ve started by making a simple C++ deployment of `matmul_add`, with the goal of using it to implement Option 1.
I’m following the basic structure of `apps/howto_deploy` (link) for my example.
Basically, I want to get it working in C++ before I try to build a nice Python wrapper, which would mean breaking through several layers of abstraction.
I’ve been reading through the PAPI and `Profiler` code and have already learned a lot. The definition of `Profiler` includes this example usage:
```cpp
Device cpu, gpu;
Profiler prof({cpu, gpu});
my_gpu_kernel(); // do a warmup iteration
prof.Start();
prof.StartCall("my_gpu_kernel", gpu);
my_gpu_kernel();
prof.StopCall();
prof.StartCall("my_cpu_function", cpu);
my_cpu_function();
prof.StopCall();
prof.Stop();
std::cout << prof.Report()->AsTable() << std::endl; // print profiling report
```
I’m trying something similar to that header example, using the PAPI collector as the metric collector (this may or may not be the right way to go):
```cpp
tvm::Device dev = {kDLCPU, 0};
tvm::Map<tvm::runtime::profiling::DeviceWrapper, tvm::Array<tvm::String>> metrics({
    {kDLCPU,
     {"perf::CYCLES", "perf::STALLED-CYCLES-FRONTEND", "perf::STALLED-CYCLES-BACKEND",
      "perf::INSTRUCTIONS", "perf::CACHE-MISSES"}},
    {kDLCUDA, {"cuda:::event:elapsed_cycles_sm:device=0"}}});
tvm::runtime::profiling::MetricCollector papi_collector =
    tvm::runtime::profiling::CreatePAPIMetricCollector(metrics);
std::cout << "papi_collector created" << std::endl;
tvm::runtime::profiling::Profiler prof =
    tvm::runtime::profiling::Profiler({dev}, {papi_collector});
std::cout << "Profiler created" << std::endl;
f(A, B, C, out);  // warmup
std::cout << "Warmup performed" << std::endl;
prof.Start();
prof.StartCall("matmul_add_dyn", dev);
f(A, B, C, out);
prof.StopCall();
prof.Stop();
std::cout << prof.Report()->AsTable() << std::endl;  // print profiling report
```
My main issue right now is the initializer for `metrics`, which `CreatePAPIMetricCollector` requires. It’s not clear to me how to get the types right, and I can’t find anywhere else in the codebase that uses `Map<DeviceWrapper, Array<String>>`.
My code is here; it can be cloned into `tvm/apps` and run with `./run_example.sh`. The PAPI example is compiled with `make papi`.
Any pointers on how to write that `metrics` initializer?