How to make the get_output function faster?

When importing the demo ( ) and running it locally for testing, the run step executes quickly, but fetching the result output takes time.



time1 = time.perf_counter()
m.set_input('DecodeJpeg/contents', tvm.nd.array(x.astype(dtype)))
time3 = time.perf_counter()
tvm_output = m.get_output(0, tvm.nd.empty((1, 1008), 'float32'))
time4 = time.perf_counter()
print("time4 - time3:", time4 - time3)
print("time3 - time1:", time3 - time1)



time4 - time3: 1.049209000000019

time3 - time1: 0.5478069999999775


CUDA version: CUDA Version 10.0.130




GPU: Tesla P4

CUDA is asynchronous: when you call get_output(), it blocks until the model execution has finished.

How can I speed up the get_output function? When I run the inference multiple times, run() is fast, but get_output() becomes the bottleneck.

run() doesn’t wait until model execution is finished. It simply launches the kernels on the GPU and returns immediately. That’s why it’s fast. get_output() waits until the kernels are finished to get the result.

Basically, the timer you have around get_output() is the correct way to get execution time for the model. Otherwise, you’re just measuring kernel launch time.
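The launch-time vs. execution-time distinction is easy to reproduce with a small stand-in that mimics the GPU's behaviour. The class and sleep duration below are illustrative only, not TVM APIs: run() just launches background work and returns, while get_output() blocks until the work completes.

```python
import threading
import time

class AsyncModel:
    """Toy stand-in for a GPU-backed module: run() only *launches* work."""
    def __init__(self, work_seconds=0.2):
        self._work_seconds = work_seconds
        self._thread = None

    def run(self):
        # Launch the "kernel" in the background and return immediately.
        self._thread = threading.Thread(target=time.sleep,
                                        args=(self._work_seconds,))
        self._thread.start()

    def get_output(self):
        # Block until the launched work has finished.
        self._thread.join()
        return "result"

m = AsyncModel()

t0 = time.perf_counter()
m.run()            # fast: only a launch
t1 = time.perf_counter()
m.get_output()     # slow: waits for the execution to finish
t2 = time.perf_counter()

print("run():        %.3f s" % (t1 - t0))
print("get_output(): %.3f s" % (t2 - t1))
```

Timing run() alone here would report a near-zero number, just as in the TVM snippet above: the real execution cost only shows up at the blocking call.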

How could I make run() synchronous, so that it blocks until the execution is finished?
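One way is to synchronize the device explicitly after run(). A minimal sketch, assuming `m` is the graph runtime GraphModule and that the device context exposes a `sync()` call, as TVM's TVMContext does:

```python
import time

try:
    import tvm  # the real runtime, if installed
except ImportError:
    tvm = None  # keep this sketch importable without TVM

def run_blocking(m, ctx):
    """Run the graph and block until the device has finished.

    `m` is a graph runtime GraphModule and `ctx` a device context
    (e.g. tvm.gpu(0)). ctx.sync() waits for every kernel queued on
    that device, so the elapsed time covers actual execution.
    """
    t0 = time.perf_counter()
    m.run()      # launches the kernels and returns immediately
    ctx.sync()   # block here until all launched kernels complete
    return time.perf_counter() - t0

if tvm is not None:
    ctx = tvm.gpu(0)
    # elapsed = run_blocking(m, ctx)  # m: your GraphModule
```

Note that syncing does not make the model itself faster; it only moves the wait from get_output() into run(). For benchmarking, TVM's `m.module.time_evaluator("run", ctx, number=10)` helper performs this synchronization (plus warm-up and averaging) for you, though the exact parameters may vary across TVM versions.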