Overhead of GraphExecutor

I tuned a batch matmul with Ansor from Relay. A strange result: the tuning log reports an op latency of 40 us, but running it through GraphExecutor takes 700+ us. Does anyone have an idea why?

Please check that the target matches your device, and that you actually used the tuned log when generating the lib. I guess either of these could cause the mismatch.
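For reference, here is a rough sketch of what I mean by building with the tuned log and then timing through GraphExecutor. It assumes you already have `mod` and `params` from a Relay frontend, the same `target` you tuned with, and the Ansor log file; the function names (`build_with_tuning_log`, `measure_us`, `seconds_to_us`) are just placeholders I made up, not anything from your script. The TVM imports are inside the functions only so the file can be loaded on a machine without TVM.

```python
def seconds_to_us(seconds):
    """Convert a latency in seconds to microseconds."""
    return seconds * 1e6


def build_with_tuning_log(mod, params, target, log_file):
    """Compile a Relay module using the best schedules found in an Ansor log.

    `target` should be the same target that was used during tuning;
    records tuned for a different target will not be matched.
    """
    import tvm
    from tvm import auto_scheduler, relay

    # Without ApplyHistoryBest and the use_auto_scheduler flag, relay.build
    # ignores the log and falls back to default schedules.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3,
            config={"relay.backend.use_auto_scheduler": True},
        ):
            lib = relay.build(mod, target=target, params=params)
    return lib


def measure_us(lib, dev, number=100, repeat=10):
    """Return the mean latency of one GraphExecutor run in microseconds."""
    from tvm.contrib import graph_executor

    module = graph_executor.GraphModule(lib["default"](dev))
    # time_evaluator averages over `number` runs per measurement,
    # which keeps one-off startup cost out of the reported figure.
    timer = module.module.time_evaluator("run", dev, number=number, repeat=repeat)
    return seconds_to_us(timer().mean)
```

Usage would be something like `lib = build_with_tuning_log(mod, params, target, "tune.json")` followed by `measure_us(lib, tvm.device(str(target), 0))`, where `"tune.json"` stands in for your own log path. If the number is still far from the log, comparing it against a build without `ApplyHistoryBest` should at least tell you whether the log is being picked up.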