Hello! I am currently testing the performance of my custom module.
I use the code below to measure performance:
import numpy as np
import tvm
ev = conv2d.time_evaluator(conv2d.entry_name, tvm.cpu(), number=1, repeat=100)
prof_res = np.array(ev(c_data, c_p1, c_p2, c_outD).results) * 1000  # seconds -> ms
print("Mean inference time (std dev): %.2f ms (%.2f ms)" % (np.mean(prof_res), np.std(prof_res)))
When I run it, it reports 1.06 ms. But when I use the code below to check the process time:
import time
start = time.process_time()
for i in range(100):
    ev(c_data, c_p1, c_p2, c_outD)
end = time.process_time()
TIME = (end - start) / 100  # average per loop iteration, in seconds
print("inference time = ", TIME * 1000, " ms\n")
it reports 11.65 ms.
I can’t understand this difference. Why is there such a large gap, and should performance be measured the way the first snippet does it?
The reason I ask is that I don’t know how to measure two modules when I’m running two devices simultaneously and one finishes much earlier than the other. The device that finishes sooner must wait for the one that finishes later. So what I want is the time from the first start to the end of the slower device. If I use the first snippet, which relies on time_evaluator, I can’t measure that end time. With the second snippet it seems possible to me.
How do I measure performance when two devices are running simultaneously?
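
To make the goal concrete, here is a minimal sketch of the measurement I have in mind. It is plain Python, not TVM code: fast_device and slow_device are hypothetical stand-ins that just sleep, and timed and measure_makespan are names I made up for illustration. It uses time.perf_counter, which is a wall-clock timer, to get the time from the earliest start to the latest end.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn):
    """Run fn once and return its (start, end) wall-clock timestamps."""
    start = time.perf_counter()
    fn()
    return start, time.perf_counter()


def measure_makespan(tasks):
    """Run all tasks concurrently, one thread each.

    Returns (makespan, durations) in seconds, where makespan is the time
    from the earliest start to the latest end.
    """
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        spans = list(pool.map(timed, tasks))
    first_start = min(s for s, _ in spans)
    last_end = max(e for _, e in spans)
    return last_end - first_start, [e - s for s, e in spans]


# Hypothetical stand-ins for the two devices (assumption: not real TVM calls).
def fast_device():
    time.sleep(0.02)


def slow_device():
    time.sleep(0.05)


makespan, durations = measure_makespan([fast_device, slow_device])
print("per-device time (ms):", [round(d * 1000, 2) for d in durations])
print("start-to-last-end time (ms):", round(makespan * 1000, 2))
```

In a real setup the two stand-ins would be replaced by calls into the two compiled modules. I am assuming each call would have to block until its device has actually finished, otherwise the end timestamp would be taken too early.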