In TVM 0.6.1, I am seeing a significant difference in inference time between C++ and Python. Specifically, the C++ inference is about twice as slow as the Python inference.
For detail: only the inference time itself is measured; pre- and post-processing are not included. The model is compiled in Python with target = "llvm -mcpu=core-avx-i" and then exported, and inference is run in C++. In C++, the device is only kDLCPU, so I am not sure whether the gap comes from a device difference.
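For reference, this is roughly how the inference-only timing is taken on both sides. This is a minimal sketch with the standard library only: `run_inference` is a placeholder callable standing in for the actual model run call (e.g. `GraphModule.run` in Python, or the `"run"` PackedFunc obtained from the loaded module in C++), not the real TVM API.

```python
import time

def time_inference_ms(run_inference, iters=100):
    """Average wall-clock time of the bare inference call, in milliseconds.

    `run_inference` stands in for the actual model run call; pre- and
    post-processing (input loading, decoding, etc.) are deliberately
    excluded from the measured region.
    """
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    end = time.perf_counter()
    return (end - start) * 1000.0 / iters

# Dummy workload standing in for the real model's run() call.
avg_ms = time_inference_ms(lambda: sum(i * i for i in range(10000)), iters=10)
print(f"avg inference time: {avg_ms:.3f} ms")
```

The C++ side is measured the same way, wrapping only the module's run call with `std::chrono::steady_clock` timestamps.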
Has anyone encountered the same problem? Any help would be appreciated.
Thank you!