start = time.time()
m.run()
end = time.time()
I measured the time taken by m.run(), but the latencies are almost the same across batch sizes. Does the module start running when set_input is called?
Batch size 1 takes 0.002 seconds, batch size 16 takes 0.00213 seconds, and batch size 64 takes 0.00245 seconds…? That seems very odd.
These numbers were measured on a T4.
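One thing I have not ruled out is that m.run() returns before the GPU has finished, in which case the timer above would mostly capture launch overhead. Below is a minimal sketch of how the measurement could be made more careful, assuming the device exposes some blocking synchronization call. The helper name `mean_latency` and its `sync` parameter are hypothetical names I made up for illustration; they are not part of any library.

```python
import time


def mean_latency(run, sync, warmup=3, repeat=10):
    """Return the mean wall-clock seconds per call of run().

    run:  callable that launches the work (e.g. m.run).
    sync: callable that blocks until the device is idle (assumed to exist).
    """
    # Warm-up calls so one-time setup cost is not counted.
    for _ in range(warmup):
        run()
    sync()  # drain any pending work before starting the clock

    start = time.perf_counter()
    for _ in range(repeat):
        run()
    sync()  # wait for all launched work to finish before stopping the clock
    end = time.perf_counter()

    return (end - start) / repeat


# Hypothetical usage, assuming `dev` is the device object and has a sync() method:
# print(mean_latency(m.run, dev.sync))
```

If the per-batch numbers still come out nearly identical with a synchronized measurement like this, then the asynchronous-launch guess is probably wrong and something else is going on.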