As an initial bring-up, I managed to deploy the TVM runtime and a module compiled by the NNVM compiler to an ARM device.
On the device I ran an inference test, with a test app written in C++ (sketched below the device info), against resnet18_v1 and squeezenet1.1.
ARM (Exynos 5433) device info:
CPU : 1.9 GHz quad-core (Cortex®-A57) + 1.3 GHz quad-core (Cortex®-A53)
GPU : Mali™-T760 MP6
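For reference, the load-and-run path of such a C++ test app looks roughly like the sketch below. It follows the usual TVM graph-runtime deploy pattern; the file names (`deploy_lib.so`, `deploy_graph.json`, `deploy_param.params`), the input name `"data"`, and the 1x3x224x224 input shape are illustrative assumptions, not exact values from my app.

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

#include <fstream>
#include <iterator>
#include <string>

int main() {
  // Operator library exported by nnvm.compiler.build + export_library.
  tvm::runtime::Module mod_dylib =
      tvm::runtime::Module::LoadFromFile("deploy_lib.so");

  // Graph JSON and serialized parameters produced at compile time.
  std::ifstream json_in("deploy_graph.json");
  std::string json_data((std::istreambuf_iterator<char>(json_in)),
                        std::istreambuf_iterator<char>());
  std::ifstream params_in("deploy_param.params", std::ios::binary);
  std::string params_data((std::istreambuf_iterator<char>(params_in)),
                          std::istreambuf_iterator<char>());
  TVMByteArray params_arr{params_data.data(), params_data.size()};

  // kDLCPU for the Cortex-A cores, kDLOpenCL for the Mali GPU build.
  int device_type = kDLCPU;
  int device_id = 0;

  // Instantiate the graph runtime around the loaded module.
  tvm::runtime::Module mod =
      (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(
          json_data, mod_dylib, device_type, device_id);
  mod.GetFunction("load_params")(params_arr);

  // 1x3x224x224 float32 NCHW input, as used by resnet18_v1.
  DLTensor* input;
  int64_t in_shape[4] = {1, 3, 224, 224};
  TVMArrayAlloc(in_shape, 4, kDLFloat, 32, 1, device_type, device_id, &input);
  mod.GetFunction("set_input")("data", input);

  mod.GetFunction("run")();

  // 1x1000 class scores for an ImageNet model.
  DLTensor* output;
  int64_t out_shape[2] = {1, 1000};
  TVMArrayAlloc(out_shape, 2, kDLFloat, 32, 1, device_type, device_id, &output);
  mod.GetFunction("get_output")(0, output);

  TVMArrayFree(input);
  TVMArrayFree(output);
  return 0;
}
```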
The measured performance on the ARM device is as follows.
For MXNet-based resnet18_v1 inference:
CPU : about 280 ms, GPU : about 36 ms
For MXNet-based squeezenet1.1 inference:
CPU : about 133 ms, GPU : about 4.7 ms
For accurate measurement, I set the CPU governor to performance mode.
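The timing itself is a simple loop around the graph runtime's `run()` function, roughly as sketched below (the warm-up and iteration counts here are illustrative, not my exact settings). One detail worth keeping in mind: on asynchronous devices such as OpenCL, `run()` can return before the kernels have finished, so the device has to be flushed before stopping the clock.

```cpp
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/packed_func.h>

#include <chrono>
#include <cstdio>

// `run` is the graph runtime's "run" PackedFunc from the sketch above.
void benchmark(tvm::runtime::PackedFunc run, int device_type, int device_id) {
  // Warm-up iterations, excluded from the measurement.
  for (int i = 0; i < 3; ++i) run();
  TVMSynchronize(device_type, device_id, nullptr);

  const int iters = 10;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) run();
  // Flush asynchronous devices (e.g. OpenCL) so queued kernels are
  // included in the measurement; on CPU this is effectively a no-op.
  TVMSynchronize(device_type, device_id, nullptr);
  auto end = std::chrono::steady_clock::now();

  double ms = std::chrono::duration<double, std::milli>(end - start).count();
  std::printf("average inference time: %.2f ms\n", ms / iters);
}
```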
As for the GPU performance, the result is surprisingly fast to me, even though the output does give the correct label.
By the way, have you tested inference on a real device with C++ code, rather than over RPC with Python code?
As I mentioned above, the output gives the correct label and the response time is much faster than on the CPU. Anyway, I will test other models as well to make sure.