What is the difference between ApplyGraphBest and ApplyHistoryBest?

I ran the tune_relay_x86.py demo and the output is as follows. The time cost is almost the same for kernel-level tuning and graph-level tuning. I read the definitions of the ApplyHistoryBest and ApplyGraphBest classes but didn't learn much from them. Could anyone help explain this? Thanks in advance.

Evaluation of the network compiled in 'default' mode without auto tune:
Compile...
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  161.7545     161.5251     162.3927     161.3457      0.4572   
               

Evaluation of the network been tuned on kernel level:
Compile...
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  126.3688     126.2374     126.7189     126.1499      0.2502   
               

Evaluation of the network been tuned on graph level:
Compile...
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  126.5370     126.5907     126.5914     126.4290      0.0764
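If it helps frame the question: my rough understanding is that ApplyHistoryBest picks the best tuning record per workload in isolation, while ApplyGraphBest comes from the graph tuner, which also accounts for layout-transform costs between adjacent ops and picks one record per graph node. Below is a toy, pure-Python sketch of that difference — the workload names, layouts, costs, and the exhaustive search are all made up for illustration, and this is not TVM's actual implementation:

```python
from itertools import product

# Toy tuning log: (workload, output_layout, kernel_ms) triples.
# All names and numbers here are invented for illustration.
records = [
    ("conv_a", "NCHW",  1.0),
    ("conv_a", "NCHWc", 0.9),
    ("conv_b", "NCHW",  2.0),
    ("conv_b", "NCHWc", 2.4),
]

TRANSFORM_MS = 0.5  # assumed cost of one layout transform between ops


def history_best(records):
    """ApplyHistoryBest-style selection: best record per workload,
    chosen independently, ignoring inter-op layout transforms."""
    best = {}
    for wkl, layout, ms in records:
        if wkl not in best or ms < best[wkl][1]:
            best[wkl] = (layout, ms)
    return best


def graph_best(node_workloads, records, transform_ms=TRANSFORM_MS):
    """ApplyGraphBest-style selection (sketched as exhaustive search):
    jointly pick one record per node of an op chain, charging
    transform_ms whenever adjacent layouts differ."""
    options = {w: [(l, ms) for ww, l, ms in records if ww == w]
               for w in node_workloads}
    best_combo, best_cost = None, float("inf")
    for combo in product(*(options[w] for w in node_workloads)):
        cost = sum(ms for _, ms in combo)
        cost += sum(transform_ms
                    for (l1, _), (l2, _) in zip(combo, combo[1:])
                    if l1 != l2)
        if cost < best_cost:
            best_combo, best_cost = combo, cost
    return best_combo, best_cost
```

On this toy chain conv_a -> conv_b, the per-workload picks are NCHWc (0.9 ms) and NCHW (2.0 ms), which incur a 0.5 ms transform between them (3.4 ms total), while the joint pick uses NCHW for both (3.0 ms total). When the per-kernel best layouts already agree across the graph, the two selections coincide, which could explain near-identical end-to-end times.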