Correctly measure inference in C++

Hello everyone,

I am currently trying to benchmark inference and compare the results between C++ and Python. In Python I used the GraphExecutor's benchmark function, but that function doesn't exist in the C++ API, so I timed the run() call by recording the time immediately before and after it. However, the times are around 1.5x - 2x slower than the results given by the Graph Executor's benchmark function in Python. I then did the same thing in Python as in C++, i.e. timed around the run() call, and got approximately the same results as in C++. So my question is: is this a correct way to measure inference? Why does the benchmark function give faster times?

Thank you!

EDIT: I know that Python and C++ run using the same TVM runtime, but I still wanted to compare.

EDIT2: It's running on CPU, so synchronization isn't required, right?

In C++:

    tvm::runtime::Module module = mod_factory.GetFunction("default")(dev);
    tvm::runtime::PackedFunc set_input = module.GetFunction("set_input");
    tvm::runtime::PackedFunc get_output = module.GetFunction("get_output");
    tvm::runtime::PackedFunc run = module.GetFunction("run");
    // ... set_input(...) called here with the model's input tensors ...
    auto start = std::chrono::steady_clock::now();  // requires <chrono>
    run();
    auto end = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start).count();

In Python:

    import time

    module = graph_executor.GraphModule(lib["default"](dev))
    start = time.perf_counter()
    module.run()
    elapsed_s = time.perf_counter() - start
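
One reason a single timed run() can look slower than the Python benchmark helper is that the helper averages over many repeats, while a single measurement also picks up one-time costs and noise. Below is a minimal sketch of a more robust C++ measurement, continuing the snippet above (the repeat count n_repeat and the use of std::chrono are illustrative choices, not part of any TVM API):

    // Warm-up: the first run() may include one-time initialization costs.
    run();

    const int n_repeat = 100;  // illustrative; pick whatever gives stable numbers
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_repeat; ++i) {
        run();
    }
    auto end = std::chrono::steady_clock::now();
    double avg_ms =
        std::chrono::duration<double, std::milli>(end - start).count() / n_repeat;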

It seems that module.run is a non-blocking call, so your time measurement should be taken after the module.get_output(0) call.
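
In C++ terms that suggestion looks roughly like the sketch below: stop the clock only after the output has been fetched. Regarding EDIT2, on a non-CPU device an explicit synchronization would also be needed before taking the end timestamp. This is only a sketch of the idea, not a guarantee about what run() does or does not defer:

    auto start = std::chrono::steady_clock::now();
    run();
    // Fetch the output before taking the end timestamp, so that any work
    // deferred by run() is included in the measurement.
    tvm::runtime::NDArray out = get_output(0);
    // On a non-CPU device (e.g. a GPU) an explicit sync would also be needed here,
    // e.g. TVMSynchronize(dev.device_type, dev.device_id, nullptr);
    auto end = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start).count();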

More info: Tvm Inference performance almost 10x better than pytorch - #2 by vinx13