I tuned a model with the auto-scheduler for 10+ hours on an ARM CPU, but found a big gap between the estimated total latency reported by the tuner (7.501 ms) and the actual measured latency (12.75 ms), as shown below.
| ID | Latency (ms) | Speed (GFLOPS) | Trials |
|----|--------------|----------------|--------|
| 0  | 0.228 | 36.75 | 7168 |
| 1  | 0.001 | 5.64  | 64   |
| 2  | 0.047 | 0.09  | 3136 |
| 3  | 0.057 | 37.04 | 3840 |
| 4  | 0.005 | 24.47 | 192  |
| 5  | 0.114 | 36.85 | 1216 |
| 6  | 0.001 | 4.69  | 64   |
| 7  | 0.001 | 3.47  | 64   |
| 8  | 0.057 | 36.79 | 640  |
| 9  | 0.286 | 27.54 | 3008 |
| 10 | 0.019 | 34.25 | 256  |
| 11 | 0.114 | 36.77 | 320  |
Estimated total latency: 7.501 ms Trials: 19968 Used time : 25604 s Next ID: 2
Evaluate inference time cost...
Mean inference time (std dev): 12.75 ms (0.03 ms)
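For reference, my understanding is that the "Estimated total latency" printed by the task scheduler is the weighted sum of each task's best latency, where the weight is the number of times that subgraph appears in the network (the weights returned by auto_scheduler.extract_tasks); it does not include operators that were never extracted as tuning tasks, or any graph-runtime overhead. A minimal sketch of that computation, with placeholder weights since the real ones are not shown in the table:

# Conceptual sketch, not the tuner's actual code.
# best_latency_ms comes from the table above; task_weight values are placeholders.
best_latency_ms = [0.228, 0.001, 0.047, 0.057, 0.005, 0.114,
                   0.001, 0.001, 0.057, 0.286, 0.019, 0.114]
task_weight = [8, 1, 4, 8, 2, 8, 1, 1, 4, 8, 2, 2]  # placeholder weights

estimated_total_ms = sum(l * w for l, w in zip(best_latency_ms, task_weight))
print("Estimated total latency: %.3f ms" % estimated_total_ms)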
The code I used to evaluate the inference time:
print("Evaluate inference time cost...")
ftimer = module.module.time_evaluator("run", ctx, repeat=10, min_repeat_ms=500)
prof_res = np.array(ftimer().results) * 1000 # convert to millisecond
global result
result = "Mean inference time (std dev): %.2f ms (%.2f ms) " % (
np.mean(prof_res), np.std(prof_res))
print(result)
Is such a big gap acceptable?
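In case it helps narrow down where the extra time is spent, here is a sketch of per-layer profiling with TVM's debug runtime. This is only a sketch: the module is named debug_runtime on older releases (which still use ctx, as above) and debug_executor on newer ones, and graph, lib, params stand in for the artifacts returned by relay.build that were used to create `module`.

# Sketch, assuming graph, lib, params are the relay.build outputs used above.
from tvm.contrib.debugger import debug_runtime  # debug_executor on newer TVM

debug_mod = debug_runtime.create(graph, lib, ctx)
debug_mod.set_input(**params)
debug_mod.run()  # prints a per-operator time breakdown after execution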