@giuseros, Thank you for the detailed explanation. It got clear on the usage of the flags USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME and USE_ARM_COMPUTE_LIB while compiling TVM
One thing still unclear is, how to use ARM compute library for android deployment?
Please share me some example or document on the same?
I could see the flag USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME is controlled only in the cmake, how can we use it for cross compilation or for Android RPC compilation?
How these two use cases in TVM can use ARM compute library.
And instead of USE_OPENCL you should set USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME to the Arm Compute Library path. Hopefully, this will statically link against the library, so that your RPC server on the phone will have all you need.
The idea is to follow bundle_static and statically link everything together (runtime+library), so that you can copy a single binary to your phone and execute the network.
This option is easier, but it won’t let you auto-tune the network (for that you need to send different workloads to the board)
@giuseros @lhutton1
I got one issue with ACL runtime. It not supporting Depthwise convolution right, mentioned ACL NEON convolution only support group=1.
But it seems ACN NEON have Depthwise conv source code, does its just matter of adding the support or there is some known issue in supporting this?
You’re correct, the integration doesn’t currently support depthwise convolution.
We did run into some complexity in that when we convert the layout of the weights with the convert layout pass it needs to be different to what we use for a normal convolution. However, the convert layout pass only allows us to define a layout for nn.conv2d for the weights which covers both normal and depthwise convolutions. I did have an implementation mostly working for this, although didn’t get chance to post it before I left Arm to return to university. @giuseros and @dmitriy-arm may be able to comment further.