I’ve been optimizing a model on Android and found that the inference time reported through the RPC server is consistently about 2x faster (for example, 1.5 seconds vs. 3 seconds) than when I run the model directly on the phone, using either apps/android_deploy or native C++ in the style of apps/howto_deploy/cpp_deploy.cc. Does anyone know why the RPC server is so much faster than the other ways of running the model?
Just posting the solution here in case anyone else runs into a similar issue.
In general, squeezing the most performance out of an Android application is tricky, because the OS understandably uses every means it can to reduce power consumption. In this case, the RPC app runs at full tilt by ensuring that the process responsible for kernel execution is also the one that owns the current on-screen Activity. This is why you see the RPC app periodically launch a new Activity: it exists to contain the current kernel execution.
We have observed that running kernels in off-screen or background processes in Android apps significantly reduces performance. If you want the lowest possible inference time, the best method we know of is to make sure kernel execution happens in the process of the current Activity.