Add Evaluators to Debug Executor

max1996 · July 22, 2021, 7:58am

I made some progress yesterday by changing this line (tvm/papi.cc at main · apache/tvm · GitHub) to component_name="nvml".

However, every metric reported by PAPI and NVML is 0, even though NVML works properly in the PAPI test cases and in the nvidia-smi tool. I am not sure where or why this error occurs.

Stepping through the process with gdb, I can see that PAPI seems to collect the correct data, but when I output it in TVM, it is always 0 for all metrics.

I am unable to access the values here:

https://github.com/apache/tvm/blob/main/src/runtime/profiling.cc#L130


  is_running_ = true;
  for (auto dev : devs_) {
    StartCall("Total", dev, {});
  }
}


void Profiler::StartCall(String name, Device dev,
                         std::unordered_map<std::string, ObjectRef> extra_metrics) {
  std::vector<std::pair<MetricCollector, ObjectRef>> objs;
  for (auto& collector : collectors_) {
    ObjectRef obj = collector->Start(dev);
    if (obj.defined()) {
      objs.emplace_back(collector, obj);
    }
  }
  in_flight_.push(CallFrame{dev, name, Timer::Start(dev), extra_metrics, objs});
}


void Profiler::StopCall(std::unordered_map<std::string, ObjectRef> extra_metrics) {
  CallFrame cf = in_flight_.top();
  cf.timer->Stop();

tkonolige · July 22, 2021, 9:43pm

The code assumes that metrics always increase in value. For NVML this is not true, so when you compute the difference between the values at the start and at the end, there is minimal or no change. You can verify this by printing the raw values after PAPI_read and checking that they are nonzero.
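To make the distinction concrete, here is a minimal sketch contrasting the two kinds of metric. It is not TVM's or PAPI's actual code; `CounterDelta` and `GaugeMean` are hypothetical names used only for illustration. A start/stop difference is meaningful for an ever-increasing counter, but for a gauge-style reading (such as an instantaneous power value) the difference tends toward zero, and averaging samples is one possible alternative.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: for a monotonically increasing counter
// (e.g. instructions retired), the work done in a region is stop - start.
int64_t CounterDelta(int64_t start, int64_t stop) { return stop - start; }

// Hypothetical helper: for a gauge (e.g. instantaneous power draw), the
// start/stop difference carries no useful information. Averaging the samples
// taken during the region is one alternative (an assumption, not TVM behavior).
double GaugeMean(const std::vector<int64_t>& samples) {
  if (samples.empty()) return 0.0;
  const int64_t sum = std::accumulate(samples.begin(), samples.end(), int64_t{0});
  return static_cast<double>(sum) / static_cast<double>(samples.size());
}
```

With two identical gauge readings, `CounterDelta` yields 0 even though the underlying value is nonzero, which matches the symptom described above.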

max1996 · July 23, 2021, 8:17am

Apparently NVML has a maximum resolution of 1/6 of a second for most of its metrics. I assume that most kernels are too short for the metrics to be updated while they run.
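As a rough back-of-the-envelope check, the number of back-to-back runs needed for a kernel to span at least one metric update can be estimated as follows. This is only a sketch: `MinRepeats` is a hypothetical helper, and the kernel duration used in the example is an assumed value, not a measurement from this thread.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: smallest number of consecutive runs such that the total
// run time covers at least `min_updates` refresh intervals of the metric.
int64_t MinRepeats(double kernel_seconds, double update_period_seconds, int min_updates) {
  if (kernel_seconds <= 0.0) return 0;  // invalid input: no meaningful answer
  return static_cast<int64_t>(
      std::ceil(min_updates * update_period_seconds / kernel_seconds));
}
```

For example, with an assumed kernel time of 100 µs and an update period of 1/6 s, `MinRepeats(1e-4, 1.0 / 6.0, 1)` gives 1667 runs for a single update to fall inside the measured region.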

Is there a way to run each layer function multiple times during the profiling step and then divide the measurements by the number of runs? I have seen this functionality with run_individually in the debug executor, but not in the new profiler.

EDIT: OK, I hardcoded it to run each function 5,000 times during profiling, but I am confused about the names of the data in the profiler_output.calls array. The array is shorter than the debug_datum._nodes_list array stored in the debug executor, and the names of the individual nodes do not correspond to the op names recorded there. Is there a way to match these outputs?

tkonolige · July 26, 2021, 4:37pm

Right now there is no support for running each kernel multiple times. The main reason not to do this is that kernels can affect the performance of subsequent ones, so runtimes and performance will differ.

You’ll have to provide the _nodes_list and calls here for us to debug. The calls array contains the structural hash of each node called (under the “Hash” entry). This can be used to locate the corresponding node in the graph.

max1996 · July 27, 2021, 9:06am

OK, that is unfortunate, as the NVML metrics have quite low polling rates (1/6 of a second). I just added a hardcoded loop that executes each layer 1,000 times while I measure the power consumption.

I do not know what happened, but it is working now. Thank you very much for your help with this.
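A minimal sketch of such a repeat-and-average loop is shown below. It is not the change made to TVM: `AveragePerRun` is a hypothetical helper, and it assumes the metric reader returns a cumulative quantity (for example an energy counter). For an instantaneous gauge such as power, one would instead sample the value while the loop is running.

```cpp
#include <functional>

// Hypothetical harness: run `kernel` `repeats` times between two readings of a
// cumulative metric and return the per-run share of the change.
double AveragePerRun(const std::function<void()>& kernel,
                     const std::function<double()>& read_metric, int repeats) {
  if (repeats <= 0) return 0.0;  // nothing was run, so nothing to attribute
  const double before = read_metric();
  for (int i = 0; i < repeats; ++i) kernel();
  const double after = read_metric();
  return (after - before) / repeats;
}
```

As noted earlier in the thread, repeated back-to-back runs can influence each other, so the averaged value may differ from what a single run inside the full model would show.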

Is there any interest in adding the RAPL and NVML support to TVM’s PAPI integration? I could prepare a pull request, but my current changes are not really ready for production use.

max1996 · August 6, 2021, 7:48am

Hi @tkonolige ,

Thanks to your help, I was able to measure data on a range of different devices. But there might be a problem with the CMake script if PAPI is not installed in a standard location:

I tried using this setup on a cluster environment, where PAPI was installed in my home directory and I put the path to the .pc file into TVM’s config.cmake. However, papi.h could not be found during compilation.

tkonolige · August 6, 2021, 5:01pm

@max1996 The path you provide to cmake should point to the folder containing the .pc file, not to the file itself. If you are already doing this, could you provide the contents of the .pc file?

max1996 · August 9, 2021, 6:01am

Yes, I pointed USE_PAPI to the folder that contains papi.pc.

The .pc file itself looks like this:

prefix=/home/s0144002/papi_ml
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: PAPI
Description: Performance API to access performance metrics on system
Version: 6.0.0.1
Libs: -L${libdir} -lpapi
Libs.private:
Cflags: -I${includedir}

The path to the pkgconfig folder is /home/s0144002/papi_ml/lib/pkgconfig. If I understood the papi.pc file correctly, the include path should be correct.

The cmake step completed without warnings or errors, but during compilation papi.h cannot be found.

tkonolige · August 9, 2021, 4:14pm

I wasn’t linking to PAPI correctly. There is a fix up here: [FIX] Correctly link to PAPI by tkonolige · Pull Request #8691 · apache/tvm · GitHub

seven · August 16, 2021, 7:19am

Hi, I came across the same problem as you. I installed the PAPI library in my home directory, and the cmake output shows "Using PAPI library pkgcfg_lib_PAPI_papi-NOTFOUND". When I run make, it also reports that papi.h cannot be found. After adding the PAPI include directory, make completed successfully, but an error occurred when importing the tvm library.

Do you know how to address this problem? Thanks

max1996 · August 16, 2021, 7:38am

I did not encounter this problem.

What does your config.cmake file look like, especially the PAPI line?

Does PAPI work if you run the tests or binaries like papi_native_avail?

Did you compile PAPI with ./configure --prefix="<path to install DIR>" --with-components="cuda"?

Have you tried running make clean & cmake .. before recompiling TVM?

My suspicion is that your PAPI installation might be broken, but I am not sure.