C++ profile does not work

Hi, I want to do per-operator profiling of a TVM model using the C++ API, and I followed this discussion: Profiling Report C++ - #15 by tkonolige

But the profile function does not seem to work properly. What is wrong with my code? I could not find any documentation either. Here is my code segment:

  DLDevice dev{kDLCPU, 0};
  tvm::runtime::Module mod_factory = tvm::runtime::Module::LoadFromFile("model.so");

  // create the graph executor module
  tvm::runtime::Module gmod = mod_factory.GetFunction("default")(dev);
  tvm::runtime::PackedFunc set_input_f = gmod.GetFunction("set_input");
  tvm::runtime::PackedFunc get_output_f = gmod.GetFunction("get_output");
  tvm::runtime::PackedFunc get_graph_json_f = mod_factory.GetFunction("get_graph_json");
  tvm::runtime::PackedFunc run_f = gmod.GetFunction("run");

...... // Preparing for the input data

  std::string json = get_graph_json_f();

  int64_t device_type = kDLCPU;
  int64_t device_id = 0;
  tvm::runtime::Module executor = (*tvm::runtime::Registry::Get("tvm.graph_executor_debug.create"))(json, mod_factory, device_type, device_id);

  //Set up profiler
  tvm::runtime::PackedFunc debug_run_f = executor.GetFunction("run");  //Get inference function 'run'
  tvm::runtime::PackedFunc profile_f = executor.GetFunction("profile");
  tvm::runtime::PackedFunc debug_get_output_f = gmod.GetFunction("debug_get_output");
  tvm::Device cpu;
  cpu.device_type = kDLCPU;
  cpu.device_id = device_id;
  std::vector<tvm::Device> devices;
  devices.push_back(cpu);
  tvm::runtime::profiling::Profiler prof(devices, {});
  std::cout << "Start profiling\n";
  prof.Start();
  prof.StartCall("profile", cpu);

  auto start = getCurrentTime();

  // run the code
  for (int i=0; i<100; ++i) {
    run_f();
  }
  tvm::runtime::profiling::Report report = profile_f();  // <-- this step fails
  prof.StopCall();
  prof.Stop();
  auto end = getCurrentTime();
  std::cout << (end - start)/1000/100 << "ms" << std::endl;

  std::cout << prof.Report()->AsTable() << std::endl;
  std::cout << report->AsTable() << std::endl;

Here is the error dumped out:

Start profiling
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [20:35:39] /data/tvm-0.7/include/tvm/runtime/packed_func.h:1489: Function <anonymous> expects 1 arguments, but 0 were provided.
Stack trace:
  0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::runtime::profiling::Report (tvm::runtime::Array<tvm::runtime::profiling::MetricCollector, void>)>::AssignTypedLambda<tvm::runtime::GraphExecutorDebug::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::Array<tvm::runtime::profiling::MetricCollector, void>)#5}>(tvm::runtime::GraphExecutorDebug::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::Array<tvm::runtime::profiling::MetricCollector, void>)#5})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  1: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /usr/include/c++/8.3.0/bits/std_function.h:687
  2: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<>() const
        at /data/tvm-0.7/include/tvm/runtime/packed_func.h:1369
  3: DeployGraphExecutor()
        at /data/tvm-0.7/tools/tvm_cross_runner.cc:138
  4: main
        at /data/tvm-0.7/tools/tvm_cross_runner.cc:161
  5: __libc_start_main
  6: 0x00000000004058e8
  7: 0xffffffffffffffff

Thanks.

You need to pass a list of metric collectors to the profile function. In this case you probably don’t want any so you should pass an empty array:

tvm::runtime::profiling::Report report = profile_f({});

Hi, I changed the call to profile_f({}), but it still fails, this time at compile time.

tvm_cross_runner.cc: In function ‘void DeployGraphExecutor()’:
tvm_cross_runner.cc:138:56: error: no match for call to ‘(tvm::runtime::PackedFunc) (<brace-enclosed initializer list>)’
   tvm::runtime::profiling::Report report = profile_f({});
                                                        ^
In file included from /data/tvm-0.7/include/tvm/runtime/module.h:252,
                 from tvm_cross_runner.cc:2:
/data/tvm-0.7/include/tvm/runtime/packed_func.h:1362:20: note: candidate: ‘tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()(Args&& ...) const [with Args = {}]’
 inline TVMRetValue PackedFunc::operator()(Args&&... args) const {
                    ^~~~~~~~~~
/data/tvm-0.7/include/tvm/runtime/packed_func.h:1362:20: note:   candidate expects 0 arguments, 1 provided
make: *** [lib/tvm_cross_runner] Error 1

(The compiler says the candidate expects 0 arguments, but 1 was provided.)

BTW, I’m using TVM v0.8dev

Looking at graph_executor_debug.cc, it seems that it should not be necessary to pass a {} parameter?

PackedFunc GraphExecutorDebug::GetFunction()
...
  } else if (name == "profile") {
    return TypedPackedFunc<profiling::Report(Array<profiling::MetricCollector>)>(
        [sptr_to_self, this](Array<profiling::MetricCollector> collectors) {
          // We cannot send Arrays over rpc, so in order to support profiling
          // on remotes, we accept a nullptr for collectors.
          if (collectors.defined()) {
            return this->Profile(collectors);
          } else {
            return this->Profile({});
          }
        });
  } 

Thank you for the direction. Fixed with the following:

  tvm::runtime::profiling::Report report = profile_f(Array<profiling::MetricCollector>());

Name        Duration (us)  Percent  Device  Count  Argument Shapes  
profile      1,518,017.56   100.00    cpu0      1                   
----------                                                          
Sum          1,518,017.56   100.00              1                   
Total        1,518,028.12             cpu0      1                   

Name                                                                                 Duration (us)  Percent  Device  Count  Argument Shapes                                                                                Hash
tvmgen_default_fused_fast_pow_multiply_add_multiply_fast_tanh_add_multiply_multiply       3,726.14    31.32    cpu0      4  float32[100, 3072], float32[100, 3072]                                                         36bd5815a7031dde
tvmgen_default_fused_nn_dense_add_2                                                       1,289.35    10.84    cpu0     12  float32[100, 768], float32[768, 768], float32[1, 768], float32[100, 768]                       78e6feb165d2eeb6
tvmgen_default_fused_nn_dense_add_add                                                     1,268.92    10.67    cpu0      4  float32[100, 3072], float32[768, 3072], float32[1, 768], float32[100, 768], float32[100, 768]  2e1aa1e64f7a032f
tvmgen_default_fused_nn_dense_add_1                                                       1,112.11     9.35    cpu0      4  float32[100, 768], float32[3072, 768], float32[1, 3072], float32[100, 3072]                    527edc24624df0da
tvmgen_default_fused_nn_softmax_1                                                           769.91     6.47    cpu0      4  float32[1, 12, 100, 100], float32[1, 12, 100, 100]                                             ae2f0cda0c50292a
tvmgen_default_fused_nn_batch_matmul                                                        745.56     6.27    cpu0      4  float32[12, 100, 100], float32[12, 64, 100], float32[12, 100, 64]                              598e9aa60a516450
tvmgen_default_fused_nn_dense_add_add_1                                                     465.45     3.91    cpu0      4  float32[100, 768], float32[768, 768], float32[1, 768], float32[100, 768], float32[100, 768]    f7f51a285435a64b
...
