Thanks for the discussions. To begin with, I am not that attached to the particular choice of name. We could, for example, introduce another target kind (“hetero-target”, “myawesome-target”, “platform”, “CompilationOption”) whose attr fields match exactly those of CompilationOption.
I think our discussion boils down to the following question:
What can be called a “Target” in TVM?
Intuitively, to many users, target refers to the “target platform” or environment that they want to run the program on. In a typical clang target triple, the following elements can be part of a target:
- ISA (x86, arm, riscv)
- runtime library (musl, libc)
- operation system env (windows, linux)
- vendor
Of course, in most of these settings a target refers to a single device, usually with a single codegen path. These are targets at the leaf level.
However, as we start to build compilers for ML, the “target” in users’ minds is different.
For example, I want to run my program as fast as possible on aws/c4.4xlarge, or on nvidia/jetson-nano.
Some of these “targets” already involve multiple codegen paths (host code and device code).
When we start to involve the graph executor or VM as the high-level program driver, the vm/graph/aot choice becomes another codegen path on the driving path of the program.
As the field evolves, the concept of “target” can change further. Right now we are talking about a single SoC with multiple devices. What if we develop an interest in deploying onto the following distributed environment?
- machine0:
  - host: x86
  - vdevice0: cuda
- machine1:
  - host: arm
  - vdevice0: vulkan
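As a concrete sketch, the description above could be represented as nested data. The field names here are hypothetical, not an existing TVM schema:

```python
# Hypothetical sketch of the distributed deployment above as nested data;
# the field names are illustrative only, not an existing TVM schema.
distributed_spec = {
    "kind": "distributed",
    "machines": [
        {"name": "machine0", "host": "x86", "vdevice0": "cuda"},
        {"name": "machine1", "host": "arm", "vdevice0": "vulkan"},
    ],
}

# Each entry in "machines" roughly plays the role of one CompilationOption.
hosts = [m["host"] for m in distributed_spec["machines"]]
```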
We might also be interested in the following BYOC customization, where we offload part of the computation to a byoc-myawesome-cuda strategy, which needs a self-contained specification of host and library targets that makes use of the cuda-graph runtime. We want to embed it in a VM runtime that invokes byoc-myawesome-cuda as an opaque function.
- host: x86
- vdevice0: byoc-myawesome-cuda
  - host: x86
  - runtime: cuda-graph
  - vdevice0: cuda
    - library: tensor-rt
  - vdevice1: cuda
- runtime: vm
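To make the recursive structure concrete, here is a minimal sketch (hypothetical field names, not a current TVM API) in which a vdevice is itself a fully specified target carrying its own runtime, so the nesting goes beyond two levels:

```python
# Hypothetical sketch: a target whose vdevice is itself a composite
# target with its own runtime. Field names are illustrative only.
byoc_target = {
    "kind": "packaged",
    "host": "x86",
    "vdevice0": {
        "kind": "byoc-myawesome-cuda",
        "host": "x86",
        "runtime": "cuda-graph",
        "vdevice0": {"kind": "cuda", "library": "tensor-rt"},
        "vdevice1": "cuda",
    },
    "runtime": "vm",
}

def depth(spec):
    """Count nesting levels of target specifications."""
    if not isinstance(spec, dict):
        return 1
    children = [v for v in spec.values() if isinstance(v, dict)]
    return 1 + max(map(depth, children), default=0)
```

Here `depth(byoc_target)` is 3: the composition cannot be flattened into a two-level target-then-compilation-option structure.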
Can we call the above descriptions “targets”? From a UX perspective they certainly can be, since from the user’s perspective each is a specification of the “target environment”. In the context of machine learning they usually go beyond a single codegen path.
Another thing to note is that some of these examples require a level of compositionality that goes beyond two levels (target, then compilation option). In the multi-machine setting, the per-machine setting roughly maps to the CompilationOption being used here. Similarly, in the case of byoc-myawesome-cuda, vdevice0 itself would benefit from its own runtime specification. Yet another concept (another target kind) would then have to be introduced in order to support the top-level composition.
UX Benefit of a Target – Tagging
Besides compositionality, one major UX benefit of a target is the ability to tag. It can be really complicated to manually specify a compositional compilation option.
In most cases, we want users to directly leverage pre-built tags: for example, build for nvidia/jetson-nano:cuda, build for aws/c4.4xlarge, or build for arm/soc-name:aot (which directly implies unpacked_api). These tags create shorthands for setting up the compositional configurations.
The ability to let the build function take in tags that quickly map to codegen, runtime, and library configurations would greatly improve the overall user experience. Making CompilationOption (or whatever we decide to call it) a Target would allow us to reuse this feature effectively and recursively.
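A sketch of how such a tag registry might expand a shorthand into a full compositional configuration. The registry and its entries are hypothetical (the tag names come from the examples above; the expanded fields are illustrative, not a real TVM schema):

```python
# Hypothetical tag registry mapping shorthand tags to full compositional
# configurations. Entries are illustrative only, not an existing TVM schema.
TAG_REGISTRY = {
    "nvidia/jetson-nano:cuda": {
        "host": "arm", "vdevice0": "cuda", "runtime": "vm",
    },
    "arm/soc-name:aot": {
        "host": "arm", "runtime": "aot", "unpacked_api": True,
    },
}

def target_from_tag(tag):
    """Expand a pre-built tag into its full configuration."""
    if tag not in TAG_REGISTRY:
        raise ValueError(f"unknown target tag: {tag}")
    return dict(TAG_REGISTRY[tag])

cfg = target_from_tag("arm/soc-name:aot")
```

With this, `cfg["unpacked_api"]` is already set: the tag directly implies the option, so the user never spells out the full configuration by hand.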
Discussions
The main discussion point here is the scope of a target. As we can see:
- A0: On one hand, we can say that the configuration strictly follows a two-level structure, where the target sits at the leaf and specifies a single codegen path, while we use a separate name for the top-level composition.
- A1: On the other hand, we can see the need for:
  - More than two levels of composition
  - The UX need to reuse the tagging mechanism and simplify users’ input to the compiler
From a two-level compositional view, I personally think reusing Target for CompilationOption is not strictly more complicated, modulo the right kind naming, while the needs in ML can certainly go beyond two levels. This makes me think going for target compositionality is not a bad idea.