Execution efficiency of module.run()

When using TVM for inference on a GPU, I noticed that there is a “minimum call interval” when using the function “module.run()”, which is related to the model structure (it behaves like a “cooldown time”). Is this phenomenon due to the internal design of TVM, or is it just caused by the mechanics of std::function in C++?

My environment:

  • OS: Ubuntu 18.04 x64
  • GPU: RTX 2080 Ti
  • Model: the built-in ‘resnet-18’ with input shape (300, 3, 300, 300) / (B, C, W, H)

code:

import time

# …… load module
for i in range(100):
    module.set_input("data", data)  # copy the input to the device
    module.run()                    # launch inference
    time.sleep(0.04)                # sleep 40 ms

In the above scenario, the actual execution time of “module.run()” is about 0.7 ms, but the “cooldown time” (if it exists) is about 90 ms.

If you measure the runtime of “module.run()” with the above code, you will get about 50 ms. If you change the sleep time to tx ms, the measured result will be about 0.7 + max(tx, 90) - tx (ms).
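For reference, here is a minimal sketch of how such a measurement could be taken, timing each set_input/run pair together (assuming module and data as above; the name sleep_s is mine, not from the original code). With tx = 40 it prints roughly 50 ms per iteration:

import time

sleep_s = 0.04  # tx = 40 ms
for i in range(100):
    t0 = time.time()
    module.set_input("data", data)
    module.run()
    elapsed_ms = (time.time() - t0) * 1000.0
    print(f"iteration {i}: {elapsed_ms:.2f} ms")  # ~ 0.7 + max(tx, 90) - tx
    time.sleep(sleep_s)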

I may have found the answer myself. It is not an authoritative explanation, only a personal guess:

module.run() does not block Python execution. In fact, the computation is still running on the GPU when the program reaches time.sleep(0.04). But when the program executes module.set_input("data", data) again, it waits until the previous module.run() has completed its computation.
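One way to test this guess is to force explicit device synchronization after run(). Below is a minimal sketch, assuming module and data as above; dev.sync() blocks until all queued GPU work has finished (on older TVM releases the device handle is tvm.gpu(0) instead of tvm.cuda(0)):

import time
import tvm

dev = tvm.cuda(0)  # tvm.gpu(0) on older TVM releases

module.set_input("data", data)

t0 = time.time()
module.run()  # if the guess is right, this only enqueues the GPU work
launch_ms = (time.time() - t0) * 1000.0

dev.sync()    # block until the GPU actually finishes
total_ms = (time.time() - t0) * 1000.0

print(f"run() returned after {launch_ms:.2f} ms")  # expected ~0.7 ms
print(f"GPU finished after {total_ms:.2f} ms")     # expected ~90 ms

If that is the case, the proper way to benchmark is TVM's time_evaluator, which handles the synchronization internally, e.g. ftimer = module.module.time_evaluator("run", dev, number=10); print(ftimer().mean * 1000, "ms").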