How to use Arm compute library with android RPC?

joyalbin · November 10, 2020, 11:14am

Dear All,

This is a continuation from the issue https://github.com/apache/incubator-tvm/issues/6857

@giuseros, Thank you for the detailed explanation. It got clear on the usage of the flags USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME and USE_ARM_COMPUTE_LIB while compiling TVM

One thing still unclear is, how to use ARM compute library for android deployment? Please share me some example or document on the same?

I could see the flag USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME is controlled only in the cmake, how can we use it for cross compilation or for Android RPC compilation?

How these two use cases in TVM can use ARM compute library.

Please someone help me in understanding…

Thank You

giuseros · November 10, 2020, 2:36pm

Hi @joyalbin,

Unfortunately I never tried this scenario, and I am not Android expert. I think you have two possibilities:

Running the RPC server on Android
Statically compiling every together, copy the binary on Android and run

RPC on android

In theory it should be similar to the way you use the RPC server on a linux board (like the Raspi).

You should follow those instructions:

And instead of USE_OPENCL you should set USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME to the Arm Compute Library path. Hopefully, this will statically link against the library, so that your RPC server on the phone will have all you need.

Statically linked binary

If that doesn’t work, another solution would be to deploy the model as a stand-alone. This is the guide: https://github.com/apache/incubator-tvm/tree/main/apps/bundle_deploy

The idea is to follow bundle_static and statically link everything together (runtime+library), so that you can copy a single binary to your phone and execute the network.

This option is easier, but it won’t let you auto-tune the network (for that you need to send different workloads to the board)

Hope this helps

joyalbin · November 11, 2020, 4:32am

@giuseros this was helpful… I got some idea to move forward.

Dear All, please share any performance comparison between TVM kernel and ACL kernel on any ARM devices (preferably android)?

joyalbin · December 4, 2020, 1:20pm

@giuseros
@lhutton1 I got one issue with ACL runtime. It not supporting Depthwise convolution right, mentioned ACL NEON convolution only support group=1.

github.com

apache/tvm/blob/main/src/runtime/contrib/arm_compute_lib/acl_runtime.cc#L225


 * \param mm The ACL conv2d layer can request auxiliary memory from TVM.
 */
void CreateConvolution2DLayer(CachedLayer* layer, const JSONGraphNode& node,
                              const std::shared_ptr<arm_compute::MemoryManagerOnDemand>& mm) {
  std::vector<std::string> padding = node.GetAttr<std::vector<std::string>>("padding");
  std::vector<std::string> strides = node.GetAttr<std::vector<std::string>>("strides");
  std::vector<std::string> dilation = node.GetAttr<std::vector<std::string>>("dilation");
  arm_compute::PadStrideInfo pad_stride_info = MakeACLPadStride(padding, strides);

  int groups = std::stoi(node.GetAttr<std::vector<std::string>>("groups")[0]);
  ICHECK(groups == 1) << "Arm Compute Library NEON convolution only supports group size of 1.";

  arm_compute::ActivationLayerInfo act_info;
  if (node.HasAttr("activation_type")) {
    std::string activation_type = node.GetAttr<std::vector<std::string>>("activation_type")[0];
    if (activation_type == "relu") {
      act_info = arm_compute::ActivationLayerInfo(
          arm_compute::ActivationLayerInfo::ActivationFunction::RELU);
    } else {
      LOG(FATAL) << "Unsupported activation function";
    }

But it seems ACN NEON have Depthwise conv source code, does its just matter of adding the support or there is some known issue in supporting this?

github.com

ARM-software/ComputeLibrary/blob/master/src/runtime/NEON/functions/NEDepthwiseConvolutionLayer.cpp

/*
 * Copyright (c) 2017-2020 Arm Limited.
 *
 * SPDX-License-Identifier: MIT
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
 * deal in the Software without restriction, including without limitation the
 * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
 * sell copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,

This file has been truncated. show original

Thanks Albin

lhutton1 · December 4, 2020, 1:40pm

Hi @albin,

You’re correct, the integration doesn’t currently support depthwise convolution.

We did run into some complexity in that when we convert the layout of the weights with the convert layout pass it needs to be different to what we use for a normal convolution. However, the convert layout pass only allows us to define a layout for nn.conv2d for the weights which covers both normal and depthwise convolutions. I did have an implementation mostly working for this, although didn’t get chance to post it before I left Arm to return to university. @giuseros and @dmitriy-arm may be able to comment further.

lhutton1 · January 14, 2021, 10:48am

@albin just a quick update, depthwise support has been added https://github.com/apache/tvm/pull/7206