Hello everyone,
I am currently benchmarking inference and comparing the results between C++ and Python. In Python I used the GraphExecutor's benchmark function. However, that function doesn't exist in the C++ API, so I timed the run() function by recording the time before and after the call. The times I get this way are around 1.5x - 2x slower than the results reported by the GraphExecutor's benchmark function in Python. I then tried to do in Python the same thing as in C++, i.e. time around the run() call, and got approximately the same results as in C++. So my question is: is this a correct way to measure inference? And why does the benchmark function give faster times?
Thank you!
EDIT: I know that Python and C++ run on the same TVM runtime, but I still wanted to compare.
EDIT2: It's running on CPU, so synchronization isn't required, right?
In C++:
// #include <chrono>
tvm::runtime::Module module = mod_factory.GetFunction("default")(dev);
tvm::runtime::PackedFunc set_input = module.GetFunction("set_input");
tvm::runtime::PackedFunc get_output = module.GetFunction("get_output");
tvm::runtime::PackedFunc run = module.GetFunction("run");
// (set input etc...)
auto start = std::chrono::steady_clock::now();
run();
auto end = std::chrono::steady_clock::now();
double elapsed_ms = std::chrono::duration<double, std::milli>(end - start).count();
In Python:
import time

module = graph_executor.GraphModule(lib["default"](dev))
start = time.perf_counter()
module.run()
elapsed_ms = (time.perf_counter() - start) * 1000
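For context, the gap between the two numbers I see is consistent with how benchmark-style measurement usually works: warmup iterations are discarded (so first-call overhead such as lazy initialization is excluded) and many repeats are averaged, whereas a single timed run() call includes all of that. Below is a minimal, framework-free sketch of that measurement pattern; the helper name time_fn is my own and not part of the TVM API (if I understand correctly, GraphModule.benchmark does something along these lines internally, with repeat/number parameters):

```python
import time
import statistics

def time_fn(fn, warmup=3, repeat=10):
    """Benchmark-style timing: run warmup calls untimed, then average repeats."""
    for _ in range(warmup):
        fn()  # warmup iterations are executed but not measured
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # report both mean and median, since a few slow outliers can skew the mean
    return statistics.mean(samples), statistics.median(samples)

# With the graph executor this would be called as: time_fn(module.run)
```

Measured this way, my single-shot timing around run() might well converge toward what benchmark reports, which would explain the 1.5x - 2x difference.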