[RFC] Composite Target

This is a follow-up RFC from [RFC] TVM Target Specification

This RFC is discussed with @junrushao1994.

In this RFC, we propose a way to represent a composite target. The motivation is that we now have multiple codegens contributed by community members via BYOC, targeting specialized hardware and libraries such as ARM ACL, ARM Ethos-N, TensorRT, and Vitis-AI. The main difference between these codegens and other TVM backends (e.g., LLVM, CUDA, OpenCL) is that the specialized codegens may not be able to execute an entire Relay graph, so we need to partition the graph and offload only the supported subgraphs to the device, while keeping the rest on LLVM or CUDA.

Currently, we require users to manually run a list of Relay passes to partition the graph so that the compile engine can dispatch each Relay function to the corresponding codegen. In general, we should encapsulate the required build pipeline, including graph partitioning, in the target semantics.
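For context, the manual flow described above can be sketched as follows. The pass names mirror the ones in tvm.relay.transform (AnnotateTarget, MergeCompilerRegions, PartitionGraph), but the classes below are stand-in mocks so the sketch is self-contained; this is an illustration of the pass ordering, not TVM's actual implementation:

```python
# Illustrative sketch of the manual BYOC partitioning flow users run today.
# MockPass records its application on a mock "module" (a plain list).
class MockPass:
    def __init__(self, name):
        self.name = name

    def __call__(self, mod):
        mod.append(self.name)  # record that this pass ran on the module
        return mod

def partition_for_byoc(mod, codegen_name):
    """Run the typical annotate -> merge -> partition sequence."""
    for p in (MockPass(f"AnnotateTarget[{codegen_name}]"),
              MockPass("MergeCompilerRegions"),
              MockPass("PartitionGraph")):
        mod = p(mod)
    return mod

mod = partition_for_byoc([], "arm_acl")
```

Encapsulating this sequence in the target semantics would let the compile engine invoke it automatically instead of pushing it onto every user.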

We have come up with two proposals for representing a composite target, so that we can make the corresponding changes in the compile engine to invoke the required passes.

P1: Add an attribute accelerators (or whatever name) to all TVM backend targets.

Example:

TVM_REGISTER_TARGET_KIND("llvm")
   .add_attr_option<Array<String>>("keys")
   .add_attr_option<Array<Target>>("accelerators") // Accelerator targets in order.
   .add_attr_option<Array<String>>("libs")
   .add_attr_option<String>("mcpu")
   .add_attr_option<Array<String>>("mattr")
   .add_attr_option<String>("mtriple")
   .add_attr_option<String>("mfloat-abi")
   .set_default_keys({"cpu"})
   .set_device_type(kDLCPU);
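For illustration, a resolved P1-style target could look like the following plain-Python rendering: the host backend owns an accelerators attribute listing the specialized targets in priority order. The field names follow the registration above; the concrete values are made up:

```python
# Hypothetical P1-style target: the "llvm" host target carries an
# "accelerators" attribute with the specialized targets in order.
p1_target = {
    "kind": "llvm",
    "mtriple": "aarch64-linux-gnu",  # made-up example value
    "accelerators": [
        {"kind": "arm_acl"},         # tried first
        {"kind": "ethosn"},          # tried next
    ],
}
```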

P2: Create a separate composite target.

Example:

TVM_REGISTER_TARGET_KIND("composite")
   .add_attr_option<Target>("target_host")
   .add_attr_option<Array<Target>>("accelerators") // Accelerator targets in order.

For both proposals, we will also do two things for each new device target:

  1. Create a new accelerator target. Note that accelerator targets are not allowed to be used directly; they must appear in another target's accelerators attribute.
TVM_REGISTER_ACCEL_TARGET_KIND("arm_acl")
  .add_attr_option<...>... // arm_acl specific attributes.
  2. Create an alias (e.g., acl in the above example) so that users do not have to know anything about the target system. In this example, the user can write target="acl" in place of a composite target (taking the second proposal as an example):
{
  kind: composite,
  target_host: { ... },
  accelerators: [{kind: arm_acl, ...}]
}
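To make the alias concrete, here is a minimal sketch of how such an alias table could be resolved before compilation. The TARGET_ALIASES table and resolve_target helper are hypothetical names invented for this sketch, not part of TVM:

```python
# Hypothetical alias table mapping user-facing names to composite targets.
TARGET_ALIASES = {
    "acl": {
        "kind": "composite",
        "target_host": {"kind": "llvm"},
        "accelerators": [{"kind": "arm_acl"}],
    },
}

def resolve_target(target):
    """Expand a string alias into its full composite-target dict."""
    if isinstance(target, str) and target in TARGET_ALIASES:
        return TARGET_ALIASES[target]
    return target
```

With this, target="acl" behaves as if the user had written out the composite target above.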

Comments and suggestions are welcome.

cc @zhiics, @anijain2305, @masahi, @tqchen, @kparzysz @ramana-arm, @matt-arm, @jtuyls

I would prefer P2, which provides a clear separation of the host (or fallback) target, as well as composability that keeps the invariant when more targets are added.

I agree P2 is better. However, we need to be mindful that the composite can go beyond single-accelerator settings. For example, we might also want to compose arm_acl and opencl on an ARM GPU.

Glad to see this proposed, since we have wanted to do it for a while. I also agree that P2 is better. Another use case is heterogeneous execution, where we can have llvm and cuda targets in it.

Since accelerators is an array, we can specify [{kind: arm_acl}, {kind: opencl}] to represent this semantic. However, we need to integrate the CPU/GPU heterogeneous execution with BYOC to make it work. Currently, we can only support multiple BYOC backends (e.g., [{kind: arm_acl}, {kind: ethosn}]), since they share the same graph partitioning mechanism.

I see. I am just debating whether accelerators is the right name. Perhaps devices?

Thanks for the suggestion. “devices” sounds good to me.

“devices” sounds much better than “accelerators”

I also prefer P2 as it allows the representation of more complex targets in a simpler and more natural way. Great proposal and I think it’s going to be very useful for representing heterogeneous targets and accelerators!

Summary

Thanks everyone for the valuable discussion. Here is a summary of all the comments, and we will follow it to file PRs.

  • We will create a separate composite target macro as follows:
TVM_REGISTER_TARGET_KIND("composite")
   .add_attr_option<Target>("target_host")
   .add_attr_option<Array<Target>>("devices") // Accelerator targets in order.
  • Accordingly, here are the steps of creating a new composite target with a backend that can only execute a Relay subgraph:
  1. Create a new target for the backend.
// Note that this macro indicates that the codegen/runtime of this target
// can only handle a part of Relay graph and must be used in composite targets.
TVM_REGISTER_ACCEL_TARGET_KIND("arm_acl")
  .add_attr_option<...>... // arm_acl specific attributes.
  2. Create a target alias. Note that the alias name and the API are tentative.
'acl': {
  'kind': 'composite',
  'target_host': 'llvm',
  'devices': [{'kind': 'arm_acl'}]
}
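One way to enforce the rule from step 1, that accelerator kinds registered via TVM_REGISTER_ACCEL_TARGET_KIND cannot be used as standalone build targets, is a validation check on the target before building. The registry set and helper below are an illustrative sketch, not TVM's actual implementation:

```python
# Hypothetical registry: kinds registered with TVM_REGISTER_ACCEL_TARGET_KIND
# are flagged as accelerator-only and rejected as top-level build targets.
ACCEL_ONLY_KINDS = {"arm_acl", "ethosn"}

def validate_build_target(target):
    """Reject accelerator-only kinds used as a standalone build target."""
    if target["kind"] in ACCEL_ONLY_KINDS:
        raise ValueError(
            f"'{target['kind']}' must appear in a composite target's "
            "'devices' list, not as a top-level target")
    return target
```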

Remaining Issues

Ideally, we expect to have a composite target like the following

'hetero-target': {
  'kind': 'composite',
  'target_host': 'llvm',
  'devices': [{'kind': 'ethosn'}, {'kind': 'opencl'}]
}

The above composite target indicates that we want to offload as much of the Relay graph as we can to Ethos-N for the best performance. The part that cannot be offloaded to Ethos-N should go through the OpenCL flow so that it can be executed on an ARM GPU, for example.
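This "offload in priority order" semantic can be sketched as a greedy assignment: each operator goes to the first device in the list that supports it, and anything left over falls back to the host. The supports predicate, the SUPPORT table, and the operator names are all invented for illustration:

```python
def assign_devices(ops, devices, supports, host="llvm"):
    """Greedily map each op to the first device that supports it."""
    placement = {}
    for op in ops:
        for dev in devices:
            if supports(dev, op):
                placement[op] = dev
                break
        else:
            placement[op] = host  # no device claimed the op: host fallback
    return placement

# Made-up support table: Ethos-N handles conv2d; OpenCL handles everything.
SUPPORT = {"ethosn": {"conv2d"}, "opencl": {"conv2d", "softmax", "argsort"}}

placement = assign_devices(
    ["conv2d", "softmax", "argsort"],
    devices=["ethosn", "opencl"],
    supports=lambda dev, op: op in SUPPORT.get(dev, set()),
)
```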

The issue is that currently we have two partition mechanisms:

  1. Manually annotated partitioning for heterogeneous execution (e.g., CPU/GPU).
  2. BYOC partitioning (e.g., Ethos-N, ARM ACL, DNNL, Vitis-AI, CoreML, etc.).

Since the two mechanisms currently work independently, we need to unify them to achieve the above goal. We will first work on the composite target in this RFC and emit an error message for the above case.
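The interim error could be implemented as a guard that rejects a devices list mixing BYOC and non-BYOC kinds until the two partitioners are unified. The BYOC_KINDS set and check_devices helper are hypothetical; in reality this information would come from how each target kind was registered:

```python
# Hypothetical set of BYOC-partitioned kinds; non-members (llvm, cuda,
# opencl, ...) go through the heterogeneous-execution partitioner instead.
BYOC_KINDS = {"arm_acl", "ethosn", "tensorrt", "vitis_ai"}

def check_devices(devices):
    """Temporary guard: until the two partitioning mechanisms are unified,
    a composite target may not mix BYOC and non-BYOC device kinds."""
    kinds = [d["kind"] for d in devices]
    byoc = [k for k in kinds if k in BYOC_KINDS]
    if byoc and len(byoc) != len(kinds):
        raise NotImplementedError(
            "mixing BYOC and non-BYOC devices in one composite target "
            "is not supported yet: " + ", ".join(kinds))
    return devices
```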
