I’m new to TVM and am trying to use TVM-DarkNet to run detection on videos.
With the help of the tutorials, I successfully tuned yolov3-tiny models. The inference time reported by module.module.time_evaluator is 1.26 ms on P40 and 0.81 ms on V100.
The code is here: https://github.com/irvingzhang0512/tvm_tests/blob/master/darknet_tune.py
Then I tried to run detection on videos with the tuned models and calculate the fps.
The code can be found here: https://github.com/irvingzhang0512/tvm_tests/blob/master/darknet_evaluate.py
The above code was tested on two servers.
- Server 1 (P40): 50 fps, about 20 ms per frame (image preprocessing 6-7 ms + tvm m.set_input 2-3 ms + tvm m.run/m.get_output 2.5 ms + tvm api nms 0.2 ms + cv2 read video frame 8-9 ms)
- Server 2 (V100): 22 fps, about 43 ms per frame (image preprocessing 32-33 ms + tvm m.set_input 2-3 ms + tvm m.run/m.get_output 1-2 ms + tvm api nms 0.2 ms + cv2 read video frame 5-6 ms)
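For reference, a per-stage breakdown like the one above can be collected with a small timing helper. This is only a minimal sketch, not the code from darknet_evaluate.py: StageTimer, fps_from_stage_ms, the stage names, and the time.sleep stand-ins are all hypothetical, and it uses only the Python standard library.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of measurements

    @contextmanager
    def stage(self, name):
        # Time whatever runs inside the `with` block and add it to `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        # Mean time per measurement for each stage, in milliseconds.
        return {n: 1000.0 * self.totals[n] / self.counts[n] for n in self.totals}

    def fps(self):
        # Frames per second implied by the sum of the mean stage times.
        per_frame = sum(self.totals[n] / self.counts[n] for n in self.totals)
        return 1.0 / per_frame if per_frame > 0 else float("inf")


def fps_from_stage_ms(stage_ms):
    """Frames per second implied by per-frame stage times given in milliseconds."""
    return 1000.0 / sum(stage_ms.values())


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        # The sleeps are stand-ins for the real work (reading a frame,
        # preprocessing, m.set_input, m.run / m.get_output, nms).
        with timer.stage("read_frame"):
            time.sleep(0.002)
        with timer.stage("preprocess"):
            time.sleep(0.003)
    print(timer.mean_ms(), timer.fps())
```

Plugging the midpoints of the Server 1 numbers into fps_from_stage_ms (6.5 + 2.5 + 2.5 + 0.2 + 8.5 = 20.2 ms) gives about 49.5 fps, which is in line with the measured 50 fps.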
Clearly, image preprocessing (cv2.resize, np ops) and reading video frames with cv2 (cv2.VideoCapture.read()) take too much time, so I tried evaluating cv2 on its own. (The code can be found here: https://github.com/irvingzhang0512/tvm_tests/blob/master/cv2_only_evaluate.py)
- Server 1: image preprocessing costs about 3 ms, reading a video frame costs about 2 ms.
- Server 2: image preprocessing costs about 3 ms, reading a video frame costs about 2-3 ms.
Finally, I found that if the call to relay.build_module.build is commented out, image preprocessing and reading video frames take less time. (The code can be found here: https://github.com/irvingzhang0512/tvm_tests/blob/master/darknet_comment_test.py)
- Server 1: image preprocessing 6-7 ms / read video frame 6-7 ms (with relay.build_module.build) vs. image preprocessing 3-4 ms / read video frame 2 ms (with it commented out)
- Server 2: image preprocessing 32 ms / read video frame 5-6 ms (with relay.build_module.build) vs. image preprocessing 2-3 ms / read video frame 2-3 ms (with it commented out)
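A before/after comparison like this can also be made inside a single process, which removes run-to-run differences between two separate scripts. The following is a minimal sketch under stated assumptions: median_ms, cpu_affinity, and compare_before_after are hypothetical helpers, the workload and setup step are stand-ins for the cv2 preprocessing and the relay.build_module.build call, and recording the CPU affinity is only a guess at something that might be worth checking, not a confirmed cause.

```python
import os
import statistics
import time


def median_ms(workload, repeats=20):
    """Median wall-clock time of workload() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def cpu_affinity():
    """CPUs this process may run on, or None where the OS call is unavailable."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux, not on all platforms
        return sorted(os.sched_getaffinity(0))
    return None


def compare_before_after(workload, setup_step, repeats=20):
    """Time workload() before and after running setup_step() once."""
    before = {"median_ms": median_ms(workload, repeats), "affinity": cpu_affinity()}
    setup_step()
    after = {"median_ms": median_ms(workload, repeats), "affinity": cpu_affinity()}
    return before, after


if __name__ == "__main__":
    def fake_preprocess():
        # Stand-in for cv2.resize plus the numpy ops.
        sum(i * i for i in range(50_000))

    def fake_build():
        # Stand-in for the relay.build_module.build call.
        pass

    before, after = compare_before_after(fake_preprocess, fake_build)
    print("before:", before)
    print("after: ", after)
```

If the timings or the affinity differ between the two dictionaries, the change happened during the setup step rather than between two separate runs.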
My questions are:
- Why do cv2.resize and cv2.VideoCapture.read() take much more time after relay.build_module.build is called? What should I do?
- TVM m.set_input costs about 2 ms, which is more than the inference time reported by module.module.time_evaluator. Is this normal?
- Is there any better solution for detecting videos with the TVM Python API (other than running m.set_input for every frame)?