SSD MobileNet performance issue

james · April 22, 2019, 10:12am

Hi,

I just converted the MobileNet SSD model (608x608 input) to TVM. It takes about 0.2 seconds per frame, which gives me about 5 FPS on a GTX 1050 Ti, with the CUDA cores fully in use. Time per frame (seconds):

0.20082473754882812
0.19168806076049805
0.19301962852478027
0.19402718544006348
0.1933760643005371
0.1952970027923584
0.2037806510925293
0.1919691562652588
0.20432472229003906
0.20802545547485352
0.20062041282653809
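The FPS figure is simply the reciprocal of the mean frame time. A quick check in Python, with the per-frame times copied from above:

```python
# Per-frame times (seconds) copied from the post above.
frame_times = [
    0.20082473754882812, 0.19168806076049805, 0.19301962852478027,
    0.19402718544006348, 0.1933760643005371, 0.1952970027923584,
    0.2037806510925293, 0.1919691562652588, 0.20432472229003906,
    0.20802545547485352, 0.20062041282653809,
]

mean_time = sum(frame_times) / len(frame_times)  # average seconds per frame
fps = 1.0 / mean_time                            # frames per second
print(f"mean {mean_time * 1000:.1f} ms/frame -> {fps:.2f} FPS")
```

This works out to roughly 198 ms per frame, or about 5.05 FPS.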

nvidia-smi -i 0 --query-gpu=index,timestamp,utilization.gpu,power.draw,temperature.gpu --format=csv -l 1

0, 2019/04/22 15:39:58.050, 90 %, [Not Supported], 56
0, 2019/04/22 15:39:59.050, 88 %, [Not Supported], 56
0, 2019/04/22 15:40:00.051, 90 %, [Not Supported], 56
0, 2019/04/22 15:40:01.051, 88 %, [Not Supported], 56
0, 2019/04/22 15:40:02.051, 88 %, [Not Supported], 57
0, 2019/04/22 15:40:03.052, 96 %, [Not Supported], 57
0, 2019/04/22 15:40:04.052, 100 %, [Not Supported], 57
0, 2019/04/22 15:40:05.053, 91 %, [Not Supported], 57
0, 2019/04/22 15:40:06.053, 89 %, [Not Supported], 58
0, 2019/04/22 15:40:07.053, 88 %, [Not Supported], 58
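With utilization.gpu sampled once per second, a small parser makes the average easy to read off. This is a sketch using only the standard library; parse_utilization is a hypothetical helper, and it assumes the CSV header line that nvidia-smi prints first has been dropped (the header is not numeric).

```python
import csv
import io

# Rows pasted from the nvidia-smi output above (CSV header line omitted).
RAW = """\
0, 2019/04/22 15:39:58.050, 90 %, [Not Supported], 56
0, 2019/04/22 15:39:59.050, 88 %, [Not Supported], 56
0, 2019/04/22 15:40:00.051, 90 %, [Not Supported], 56
0, 2019/04/22 15:40:01.051, 88 %, [Not Supported], 56
0, 2019/04/22 15:40:02.051, 88 %, [Not Supported], 57
0, 2019/04/22 15:40:03.052, 96 %, [Not Supported], 57
0, 2019/04/22 15:40:04.052, 100 %, [Not Supported], 57
0, 2019/04/22 15:40:05.053, 91 %, [Not Supported], 57
0, 2019/04/22 15:40:06.053, 89 %, [Not Supported], 58
0, 2019/04/22 15:40:07.053, 88 %, [Not Supported], 58
"""


def parse_utilization(text):
    """Return the utilization.gpu column (third field) as integers."""
    values = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        values.append(int(row[2].rstrip(" %")))  # "90 %" -> 90
    return values


util = parse_utilization(RAW)
print(f"average GPU utilization: {sum(util) / len(util):.1f}%")
```

For these samples the average is about 91%. Since utilization.gpu reports the fraction of each sample period during which at least one kernel was running, this suggests the GPU is busy for most of every frame; it says nothing about how efficient those kernels are.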

Am I doing something wrong, or is this the expected performance at the moment?

Code:

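import time

import cv2
import numpy as np

# `module` (the TVM graph runtime module), `data_shape` (e.g. (1, 3, 608, 608))
# and `class_names` are set up earlier in the script (not shown here).
font = cv2.FONT_HERSHEY_SIMPLEX  # assumed value; the snippet used `font` without defining it


def display(img, out, thresh=0.5):
    # Each row of `out` is [class_id, score, xmin, ymin, xmax, ymax],
    # with box coordinates normalized to [0, 1].
    for det in out:
        cid = int(det[0])
        if cid < 0:  # skip invalid (suppressed) detections
            continue
        score = det[1]
        if score < thresh:
            continue
        scales = [img.shape[1], img.shape[0]] * 2  # [w, h, w, h]
        xmin, ymin, xmax, ymax = [int(p * s) for p, s in zip(det[2:6].tolist(), scales)]
        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (255, 0, 0), 2)
        cv2.putText(img, class_names[cid], (xmin, ymin), font, 1, (200, 0, 0), 3, cv2.LINE_AA)
    cv2.imshow('frame', img)


cap = cv2.VideoCapture("rtsp://admin:admin123@192.168.1.193:554/Streaming/Channels/101")
while cap.isOpened():
    t0 = time.time()
    ret, image = cap.read()
    if not ret:  # stream ended or the frame could not be decoded
        break
    # cv2.resize takes (width, height); data_shape is (N, C, H, W)
    img_data = cv2.resize(image, (data_shape[3], data_shape[2]))
    img_data = img_data[:, :, (2, 1, 0)].astype(np.float32)  # BGR -> RGB
    # img_data -= np.array([123, 117, 104])  # mean subtraction, currently disabled
    img_data = np.transpose(img_data, (2, 0, 1))  # HWC -> CHW
    img_data = np.expand_dims(img_data, axis=0)   # add batch dimension
    module.run(data=img_data)
    tvm_output = module.get_output(0)
    display(image, tvm_output.asnumpy()[0], thresh=0.25)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    print(time.time() - t0)  # covers capture, preprocessing, inference and drawing

cap.release()
cv2.destroyAllWindows()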
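The printed time also includes RTSP capture, resizing and drawing, so it is worth checking how much of the ~0.2 s is the model itself. A minimal sketch using only the Python standard library; StageTimer and the stage names are hypothetical helpers, not part of TVM or OpenCV:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage of a processing loop."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of timed calls

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def averages(self):
        """Average seconds per call for each stage."""
        return {name: self.totals[name] / self.counts[name] for name in self.totals}
```

In the loop, wrap cap.read(), the resize/transpose steps, module.run(...) together with get_output(0).asnumpy(), and display(...) in separate `with timer.stage("..."):` blocks, then print timer.averages() every few hundred frames. CUDA kernel launches are asynchronous, so module.run may return before the GPU has finished; keeping asnumpy() in the same stage makes the copy back to the host absorb that wait. For a model-only number, the autotuning tutorial linked later in this thread uses module.module.time_evaluator("run", ctx, ...).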

kevinthesun · April 22, 2019, 8:07pm

Have you tried to autotune?

zzw · April 28, 2019, 3:55am

I use cv2.rectangle and cv2.putText for display, but it is slower than the GluonCV function utils.viz.plot_bbox.

Do you have this problem? Or could you share your complete display code?

james · May 7, 2019, 3:36am

Sorry, I am a beginner, and even after trying many times I have not managed to understand autotuning.

I have exported the MobileNet SSD files, but without autotuning:

mobilenet_ssd_608_tvm.so
mobilenet_ssd_608_tvm.json
mobilenet_ssd_608_tvm.params

How do I pass the trained model as input to autotuning?
If possible, could you share working Python code?

james · May 21, 2019, 6:18pm

@kevinthesun

I am stuck on autotuning: it keeps running without saving any models.
Please reply.

eqy · May 21, 2019, 9:58pm

What is the output during the autotuning process?

kevinthesun · May 21, 2019, 10:47pm

Autotuning should be done before compiling and exporting the model. Did you have any issues following this tutorial? https://docs.tvm.ai/tutorials/autotvm/tune_relay_cuda.html#sphx-glr-tutorials-autotvm-tune-relay-cuda-py
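The ordering matters because an already-exported .so cannot be improved after the fact. In the tutorial, tuning never saves a model: it only appends measurement records to a log file (autotvm.callback.log_to_file), keeps the best record per task (autotvm.record.pick_best), and those records take effect only when the network is compiled inside `with autotvm.apply_history_best(log_file):`; the .so/.json/.params are then exported from that build. The plain-Python toy below only illustrates the "keep the fastest record per task" idea. TuningRecord, best_per_task, the task names and the latencies are invented for illustration and do not reflect TVM's log format or API.

```python
from dataclasses import dataclass


@dataclass
class TuningRecord:
    task: str          # hypothetical task name (e.g. one conv2d workload)
    config: str        # hypothetical schedule configuration label
    latency_ms: float  # measured run time for this configuration


def best_per_task(records):
    """Keep the lowest-latency record for each task (toy illustration only)."""
    best = {}
    for rec in records:
        current = best.get(rec.task)
        if current is None or rec.latency_ms < current.latency_ms:
            best[rec.task] = rec
    return best


# Made-up measurements: two configurations tried for each of two tasks.
records = [
    TuningRecord("conv2d_a", "cfg0", 3.1),
    TuningRecord("conv2d_a", "cfg1", 1.7),
    TuningRecord("conv2d_b", "cfg0", 0.9),
    TuningRecord("conv2d_b", "cfg1", 1.4),
]
for task, rec in sorted(best_per_task(records).items()):
    print(task, rec.config, rec.latency_ms)
```

In the real workflow the tuner fills the log while it runs, so progress is saved even before tuning finishes; the model only gets faster once it is rebuilt with that log and exported again.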