Hi, I’m using TVM on a NanoPC-T4 (RK3399) with a Mali T860 GPU, and I’ve got these benchmark results:

For float32, batch_size=1
| Network Name | Mean Inference Time (std dev) |
|---|---|
| resnet-18 | 791.45 ms (4.40 ms) |
| resnet-34 | 2288.98 ms (2.86 ms) |
| resnet-50 | 1552.84 ms (6.31 ms) |
| densenet-121 | 680.17 ms (15.91 ms) |
| inception_v3 | 884.71 ms (41.24 ms) |
| mobilenet | 75.31 ms (0.24 ms) |
| mobilenet_v2 | 70.10 ms (0.77 ms) |
| squeezenet_v1.0 | 153.94 ms (6.68 ms) |
| squeezenet_v1.1 | 130.76 ms (1.40 ms) |
| vgg-16 | 5671.09 ms (16.66 ms) |
For float16, batch_size=1
| Network Name | Mean Inference Time (std dev) |
|---|---|
| resnet-18 | 477.12 ms (1.40 ms) |
| resnet-34 | 1393.27 ms (3.28 ms) |
| resnet-50 | 930.57 ms (2.56 ms) |
| densenet-121 | 428.37 ms (1.35 ms) |
| inception_v3 | 893.79 ms (7.78 ms) |
| mobilenet | 59.01 ms (1.00 ms) |
| mobilenet_v2 | 55.64 ms (4.47 ms) |
| squeezenet_v1.0 | 228.77 ms (0.57 ms) |
| squeezenet_v1.1 | 83.81 ms (0.98 ms) |
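For reference, the timings above come from a loop like the one in TVM’s `apps/benchmark` scripts. A minimal sketch, assuming `module` is a graph runtime module already built for the Mali target and deployed to the board over RPC, and `ctx` is the remote OpenCL context:

```python
import numpy as np

# Time the "run" function of the deployed graph runtime module.
# Assumes `module` was built for target 'opencl -device=mali' and
# `ctx` is the remote cl(0) context obtained over RPC.
ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=30)
prof_res = np.array(ftimer().results) * 1000  # seconds -> milliseconds
print("Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(prof_res), np.std(prof_res)))
```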
I tried to use autotvm (tutorials/autotvm/tune_relay_mobile_gpu.py). After tuning all 47 tasks with the xgb tuner (n_trial=1000), I got a mean inference time of 1.7 s, and with the ga tuner (n_trial=1200) it was about 2.8 s. I also noticed something interesting: on the first tasks the best score is ~36 GFLOPS, but toward the end it always drops down to ~10 GFLOPS. Maybe I’m doing something wrong?
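My tuning options follow the tutorial defaults, apart from the tuner and n_trial values mentioned above. A sketch of the setup (the log filename, device key, host, and port are placeholders for my RPC tracker setup):

```python
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner, GATuner

# Tuning options as in tune_relay_mobile_gpu.py; only `tuner` and
# `n_trial` were changed between the two runs described above.
tuning_option = {
    "log_filename": "rk3399.mali.log",   # placeholder
    "tuner": "xgb",                      # or "ga" for the second run
    "n_trial": 1000,                     # 1200 for the ga run
    "early_stopping": 450,
    "measure_option": autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default"),
        runner=autotvm.RPCRunner(
            "rk3399",                    # device key registered with the tracker
            host="0.0.0.0", port=9190,   # placeholder tracker address
            number=10, timeout=5,
        ),
    ),
}

# Per-task tuner construction, as in the tutorial:
#   XGBTuner(task, loss_type="rank") for "xgb"
#   GATuner(task, pop_size=50)       for "ga"
```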