We have introduced quite a few target-related concepts as we add more flexibility to code generation, AutoTVM, and tensorization. One particular example is the recently introduced “compiler” concept. While it is useful to add concepts as we grow the features, it is also important to group the concepts around a few key data structures, so that developers get first-class customization as a result of the infrastructure design.
In particular, this RFC proposes to revisit the target namespace and discusses how customizations can be built around it.
Target Data Structure
A target object represents the collection of information needed to customize compilation for a specific device (or a collection of devices when we deal with a heterogeneous environment).
- R0: Each target should have a TargetKey (e.g. cuda, llvm, vulkan, dnnl)
- The target key can be used to index target-specific behaviors (e.g. which code to call for codegen)
- R1: A primitive target needs to have a string representation that users can point to (e.g. “cuda”)
- R2: A target needs to have a list of attributes (e.g. llvm -mcpu=core-avx2)
- R3: We need a specific attribute about the hardware type, so that it can be used by AutoTVM for indexing.
- R4: For most device targets, we also need a target_host to represent how to compile the host driver part of the program (which calculates the device launch parameters)
- R5: We will need to provide a list of targets and a target_host for heterogeneous compilation (which could bring the possibility of a CompositeTarget (pending name)).
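To make R0 through R4 concrete, here is a minimal sketch in plain Python. The `Target` class and `parse_target` helper are hypothetical names used only for illustration; they are not the existing TVM API, and the parser assumes every flag has the form `-name=value`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Target:
    """Hypothetical container for R0-R4; not the actual TVM class."""
    key: str                                             # R0: target key, e.g. "cuda"
    attrs: Dict[str, str] = field(default_factory=dict)  # R2: list of attributes
    target_host: Optional["Target"] = None               # R4: how to compile host code

    @property
    def hardware(self) -> Optional[str]:
        # R3: a dedicated attribute that AutoTVM could use for indexing.
        return self.attrs.get("hardware")

    def __str__(self) -> str:
        # R1: string representation of the form "key -name=value ...".
        return " ".join([self.key] + [f"-{k}={v}" for k, v in self.attrs.items()])


def parse_target(text: str, target_host: Optional[Target] = None) -> Target:
    """Parse "key -name=value ..." (simplified: every flag must carry a value)."""
    key, *flags = text.split()
    attrs = {}
    for flag in flags:
        name, _, value = flag.lstrip("-").partition("=")
        attrs[name] = value
    return Target(key, attrs, target_host)
```

A device target would then be created as `parse_target("cuda", target_host=parse_target("llvm"))`. R5 could be layered on top as a record holding a list of such targets plus one shared host.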
Hardware is a unique string that identifies the target device. A hardware string can imply a list of targets and target hosts. It is important to keep a simple, concise string format for hardware, so that our users can directly select from a built-in list when possible. We can also use the built-in names for benchmarking purposes.
Some example hardware strings:
- rasp4b: implies llvm -mcpu=cortex-a72 -hardware=rasp4b
- rk3399/gpu: use the gpu on rk3399 board
- rk3399/bigcpu: use the big cores
There are many ways to name a piece of hardware, and some of them are hierarchical. For example, two phones could have different names but correspond to the same SoC. Our current approach is to canonicalize the names to an agreed-upon name (e.g. the SoC name), and use that as a key to AutoTVM.
Importantly, a hardware string is not a target key itself; it can imply a composite collection of targets that are needed to perform the compilation. One way to do so is to allow the target creation to take in
[hardware-str] [additional-attributes], and we manually maintain the default configuration in a file.
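The expansion from a hardware string to concrete targets could look like the sketch below. Everything here is an assumption made for illustration: the `TargetSpec` record, the `create_from_hardware` function, and the default target strings in the table (in particular the rk3399 entries are placeholders, not agreed-upon defaults).

```python
from typing import Dict, List, NamedTuple, Optional


class TargetSpec(NamedTuple):
    targets: List[str]          # device target strings
    target_host: Optional[str]  # host target string, if a separate host is needed


# Hypothetical defaults; the proposal is to maintain such a table by hand in a file.
HARDWARE_DEFAULTS: Dict[str, TargetSpec] = {
    "rasp4b": TargetSpec(["llvm -mcpu=cortex-a72"], None),
    "rk3399/gpu": TargetSpec(["opencl -device=mali"], "llvm -mtriple=aarch64-linux-gnu"),
}


def create_from_hardware(text: str) -> TargetSpec:
    """Expand "[hardware-str] [additional-attributes]" into target strings."""
    hardware, *extra = text.split()
    default = HARDWARE_DEFAULTS[hardware]  # raises KeyError for unknown hardware
    # Record the hardware name on every target, then append the user's attributes.
    suffix = [f"-hardware={hardware}"] + extra
    targets = [" ".join([target] + suffix) for target in default.targets]
    return TargetSpec(targets, default.target_host)
```

With this shape, `create_from_hardware("rasp4b -mattr=+neon")` yields a single llvm target that carries both the default flags and the extra attribute, while `create_from_hardware("rk3399/gpu")` yields a device target together with its host target.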
Strawman proposal for hardware
- S0: Introduce target/hardware.py that maintains the mapping (hardware→target) and hierarchy (e.g. rasp4b→soc-name→arm-board)
- S1: Rename -model to -hardware in the target string.
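The hierarchy part of S0 can be sketched as a table of parent links. The names below are hypothetical: `bcm2711` stands in for the "soc-name" level, and falling back to the nearest ancestor that has a tuning log is one possible policy, not something this RFC has settled on.

```python
from typing import Dict, List

# Hypothetical parent links: each name points to a more general name.
HARDWARE_PARENT: Dict[str, str] = {
    "rasp4b": "bcm2711",     # board -> SoC (assumed entry)
    "bcm2711": "arm-board",  # SoC -> generic class
}


def hardware_chain(name: str) -> List[str]:
    """Return the name followed by its ancestors, most specific first."""
    chain = [name]
    while chain[-1] in HARDWARE_PARENT:
        chain.append(HARDWARE_PARENT[chain[-1]])
    return chain


def lookup_tuning_log(name: str, logs: Dict[str, str]) -> str:
    """Pick the log of the most specific hardware level that has one."""
    for level in hardware_chain(name):
        if level in logs:
            return logs[level]
    raise KeyError(f"no tuning log for {name!r} or its ancestors")
```

Keying the logs by the canonical SoC name means two boards that share an SoC resolve to the same entry, which matches the canonicalization approach described above.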
In order to consolidate all the target-aware customization into the target, we will need to introduce target-specific attributes. Here is a list of possible attributes that a target can have:
- A0: Intrinsic lowering rules for ops
- A1: Ability annotation pass(for relay annotation to suggest supported features)
- A2: Rewriting passes for the specific target(relay or TIR level)
- A3: runtime::Module generation function for relay or TIR(bring your own codegen)
- A4: Memory hierarchy information (alignments for special registers in accelerators)
For example, to implement the bring your own codegen DNNL example, we will need to introduce a dnnl target, and register A1, A2, A3.
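As a rough illustration of what registering A1, A2 and A3 for a dnnl target might involve, the sketch below uses a toy registry in plain Python. The `register_target_attr` decorator, the attribute names, the list-of-op-names stand-in for a program, and the set of supported ops are all assumptions; none of this is the existing DNNL integration.

```python
from typing import Callable, Dict, List

# attribute name -> target key -> registered function (all names hypothetical)
TARGET_ATTRS: Dict[str, Dict[str, Callable]] = {}


def register_target_attr(target_key: str, attr_name: str):
    """Decorator that stores a function as an attribute of a target."""
    def decorator(func: Callable) -> Callable:
        TARGET_ATTRS.setdefault(attr_name, {})[target_key] = func
        return func
    return decorator


@register_target_attr("dnnl", "annotate")  # A1: which ops the target supports
def dnnl_annotate(ops: List[str]) -> List[str]:
    supported = {"conv2d", "dense", "relu"}  # assumed set, for illustration only
    return [op for op in ops if op in supported]


@register_target_attr("dnnl", "rewrite")  # A2: target-specific rewriting
def dnnl_rewrite(ops: List[str]) -> List[str]:
    return [f"dnnl.{op}" for op in ops]


@register_target_attr("dnnl", "codegen")  # A3: module generation (here: text)
def dnnl_codegen(ops: List[str]) -> str:
    return "\n".join(f"call {op}" for op in ops)


def compile_for(target_key: str, ops: List[str]) -> str:
    """Run A1 -> A2 -> A3 using whatever the target registered."""
    annotate = TARGET_ATTRS["annotate"][target_key]
    rewrite = TARGET_ATTRS["rewrite"][target_key]
    codegen = TARGET_ATTRS["codegen"][target_key]
    return codegen(rewrite(annotate(ops)))
```

The point of the sketch is that the generic driver (`compile_for`) never mentions dnnl; all target-specific behavior is found through the target key.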
There are a few ways to achieve the target attribute registration.
- B0: register via a specific PackedFunc callback
- B1: register via a columnar attribute table (as in Op)
- B2: register via a row-wise table
Both B1 and B2 require us to introduce a target registry similar to the op registry (there is quite some code that can be re-purposed).
B2 will require us to have a TargetInfo data structure that centralizes all the possible target attributes in typed form. B1 is more flexible in terms of growing the list of attributes, just like the op_attr_types.h file. Note that we will likely need to extend the target attributes as we add new specialized hardware targets.
Option B0 is slower to look up, but is still very useful when we try to dispatch against an Op and Target combination (e.g. an intrinsic lowering rule for exp under the cuda target).
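The three options can be contrasted in a small Python sketch. All names here (`COLUMNS`, `TargetInfo` fields, the `intrin.rule.<target>.<op>` naming scheme, the fallback to a default rule, and the example values) are assumptions chosen for illustration, not a proposed API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# B1: columnar table, one column per attribute. Adding a new attribute
# needs no change to any central type.
COLUMNS: Dict[str, Dict[str, Any]] = {}


def set_attr(target_key: str, attr_name: str, value: Any) -> None:
    COLUMNS.setdefault(attr_name, {})[target_key] = value


def get_attr(target_key: str, attr_name: str, default: Any = None) -> Any:
    return COLUMNS.get(attr_name, {}).get(target_key, default)


# B2: row-wise table, one typed record per target. The field list is
# fixed up front and must be extended when new attributes appear.
@dataclass
class TargetInfo:
    codegen: Optional[Callable[[str], str]] = None
    vector_alignment: Optional[int] = None  # hypothetical A4-style field


ROWS: Dict[str, TargetInfo] = {}

# B0: functions registered under a composed string name. Lookup goes
# through string formatting and hashing, hence the slower path.
GLOBAL_FUNCS: Dict[str, Callable[[str], str]] = {}


def lower_intrinsic(op_name: str, target_key: str, arg: str) -> str:
    """Dispatch on the (op, target) pair, falling back to a default rule."""
    for name in (f"intrin.rule.{target_key}.{op_name}",
                 f"intrin.rule.default.{op_name}"):
        if name in GLOBAL_FUNCS:
            return GLOBAL_FUNCS[name](arg)
    raise KeyError(f"no lowering rule for {op_name!r} on {target_key!r}")


set_attr("dnnl", "vector_alignment", 64)
ROWS["dnnl"] = TargetInfo(vector_alignment=64)
GLOBAL_FUNCS["intrin.rule.cuda.exp"] = lambda x: f"__expf({x})"
GLOBAL_FUNCS["intrin.rule.default.exp"] = lambda x: f"exp({x})"
```

In this sketch B1 and B2 answer "what does this target provide", while B0 answers "what should this op do on this target", which is why B0 may remain useful alongside either table layout.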
Please share your thoughts. In particular, it would be helpful to discuss:
- What would be a good string format for target? Is the current format good enough?
- Do we need to introduce a CompositeTarget for heterogeneous cases?
- Hardware choices with respect to S0, S1
- Whether to introduce target attributes, B0 vs B1 vs B2