In TVM 0.6.1, I am seeing a significant difference in inference time between C++ and Python. Specifically, the C++ inference is about twice as slow as the Python inference.
For detail: only the inference time itself is measured; pre- and post-processing are not included. The model is compiled in Python with target = "llvm -mcpu=core-avx-i" and then exported, and inference is run in C++. In C++, the device is only kDLCPU, so I am not sure whether the gap comes from a device difference.
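For reference, this is roughly how the inference-only timing is taken on both sides. This is a minimal sketch with the standard library only: `run_inference` is a placeholder callable standing in for the actual model run call (e.g. `GraphModule.run` in Python, or the `"run"` PackedFunc obtained from the loaded module in C++), not the real TVM API.

```python
import time

def time_inference_ms(run_inference, iters=100):
    """Average wall-clock time of the bare inference call, in milliseconds.

    `run_inference` stands in for the actual model run call; pre- and
    post-processing (input loading, decoding, etc.) are deliberately
    excluded from the measured region.
    """
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    end = time.perf_counter()
    return (end - start) * 1000.0 / iters

# Dummy workload standing in for the real model's run() call.
avg_ms = time_inference_ms(lambda: sum(i * i for i in range(10000)), iters=10)
print(f"avg inference time: {avg_ms:.3f} ms")
```

The C++ side is measured the same way, wrapping only the module's run call with `std::chrono::steady_clock` timestamps.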
Has anyone encountered the same problem? Any help would be appreciated.
Thank you!