Layer-wise profiling of NNVM graph

Hi,
Is there any way to perform layer-wise profiling of a neural network constructed using NNVM Symbols? I have used module.time_evaluator(), which gives the total runtime of the module.

Thanks.

For CUDA, nvprof works well.

For x86, you can use VTune if you have access to it. I have had a good experience using VTune with NNVM.

Or you can modify the Run() function here into something like this:

void Run() {
  // setup the array and requirements.
  for (size_t i = 0; i < op_execs_.size(); ++i) {
    const auto& node = nodes_[i];
    const auto& param = node.param;
    const auto& func_name = param.func_name;

    if (op_execs_[i]) {
      // Needs #include <chrono>. steady_clock is monotonic, so it suits interval timing.
      auto t0 = std::chrono::steady_clock::now();
      op_execs_[i]();
      auto t1 = std::chrono::steady_clock::now();
      // Fractional milliseconds, so sub-millisecond layers do not print as 0.
      double elapsed_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
      LOG(INFO) << "Execute " << func_name << " took " << elapsed_ms << " msec.";
    }
  }
}
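
If you want to try the timing pattern outside the runtime first, here is a minimal standalone sketch of the same idea. NamedOp, OpTiming and ProfileOps are hypothetical names made up for illustration and are not part of NNVM or TVM. The sketch assumes each op is a plain callable that finishes its work before returning.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one graph node: a name plus its executor.
struct NamedOp {
  std::string name;
  std::function<void()> exec;
};

// One measurement per executed op.
struct OpTiming {
  std::string name;
  double elapsed_ms;
};

// Runs every op once, in order, and records the wall-clock time of each.
std::vector<OpTiming> ProfileOps(const std::vector<NamedOp>& ops) {
  using Clock = std::chrono::steady_clock;  // monotonic clock
  std::vector<OpTiming> timings;
  timings.reserve(ops.size());
  for (const auto& op : ops) {
    if (!op.exec) continue;  // skip nodes that have no executor
    const auto t0 = Clock::now();
    op.exec();
    const auto t1 = Clock::now();
    // Keep fractional milliseconds instead of truncating to an integer.
    const std::chrono::duration<double, std::milli> elapsed = t1 - t0;
    timings.push_back({op.name, elapsed.count()});
  }
  return timings;
}
```

Calling ProfileOps on a vector of NamedOp returns one entry per op that actually ran, which you can log or sum per layer. Note that this measures host-side wall-clock time only. On a GPU target the kernel launches are asynchronous, so you would need to synchronize the device after each op for the numbers to mean anything.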

@masahi Thanks. I will try both!