I ran the tune_relay_x86.py demo and the output is shown below. The time cost is almost the same between kernel-level tuning and graph-level tuning. I read the ApplyHistoryBest and ApplyGraphBest class definitions but didn't get much out of them. Could anyone help explain this? Thanks in advance.
Evaluation of the network compiled in 'default' mode without auto tune:
Compile...
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
161.7545 161.5251 162.3927 161.3457 0.4572
Evaluation of the network been tuned on kernel level:
Compile...
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
126.3688 126.2374 126.7189 126.1499 0.2502
Evaluation of the network been tuned on graph level:
Compile...
Evaluate inference time cost...
Execution time summary:
mean (ms) median (ms) max (ms) min (ms) std (ms)
126.5370 126.5907 126.5914 126.4290 0.0764
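
For context, the two tuned runs in the tutorial compile the same network and differ only in the dispatch context active during compilation: the kernel-level run uses autotvm.apply_history_best (backed by ApplyHistoryBest), while the graph-level run uses autotvm.apply_graph_best (backed by ApplyGraphBest). Below is a minimal sketch of that difference; the workload, target string, and log-file names are assumptions based on the tutorial, so substitute the paths from your own run:

```python
import tvm
from tvm import relay, autotvm
from tvm.relay import testing

# Assumptions: the resnet-18 workload and log-file names that
# tune_relay_x86.py typically uses; adjust to match your run.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
target = "llvm -mcpu=core-avx2"
log_file = "resnet-18.log"                      # kernel-level tuning records
graph_opt_sch_file = "resnet-18_graph_opt.log"  # graph-level tuning records

# Kernel-level: ApplyHistoryBest picks the best measured record for
# each task independently from log_file.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib_kernel = relay.build(mod, target=target, params=params)

# Graph-level: ApplyGraphBest replays the per-node schedules selected
# by the graph tuner, which also accounts for layout-transform cost
# between operators.
with autotvm.apply_graph_best(graph_opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        lib_graph = relay.build(mod, target=target, params=params)
```

My understanding (please correct me if wrong) is that the graph tuner only helps when the kernel-level best schedules imply expensive layout transforms between operators; if the per-kernel bests already agree on layouts, both contexts end up selecting essentially the same schedules, which would match the nearly identical timings above.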