@Mousius thank you for raising this RFC and thanks for great discussions everyone.
For the most part I support the originally-proposed RFC.
I fully support A1a here. While it is tempting to try to define Target as a structure which models an arbitrary runtime environment, in practice, the range of runtime environments supported by TVM will change as TVM’s tuning capabilities grow. Additionally, Target currently plays a foundational part in the present AutoTVM design: it describes all of the compiler configuration which could affect a given autotuning measurement, and is therefore used as a key to describe the workload in autotuning logs.
Further, at present, there are things inside Target which do not impact autotuning:
- `--link-params`
- `--executor`
- `--runtime`
Because of this, right now users can get into the undesirable experience of tuning a schedule without one of these parameters, then compiling for deployment with the parameters included, and seeing untuned implementations. Now, I bear some of the blame for this because I started this pattern in Target. However, it’s something we need to get rid of now that we have more tunable schedules landing in microTVM.
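To make that failure mode concrete, here is a minimal, purely illustrative Python sketch (not TVM API) that assumes tuning records are keyed on the full serialized Target string; the flag names simply mirror the legacy Target attributes above:

```python
# Illustrative only: tuning records keyed by the full serialized Target string.
# The flag names below mirror the legacy Target attributes discussed above.

def tuning_key(target_str: str) -> str:
    """Key a tuning record by the exact Target string it was measured with."""
    return target_str

tuning_log = {
    # Record produced while tuning, before any deployment flags were added.
    tuning_key("c -mcpu=cortex-m7"): "best schedule found during tuning",
}

# At deployment time the user adds executor/runtime/link-params, none of which
# change the generated operator code, but the lookup key changes...
deploy_target = "c -mcpu=cortex-m7 -executor=aot -runtime=c -link-params=1"

# ...so the tuned schedule is silently missed and a fallback is used.
schedule = tuning_log.get(tuning_key(deploy_target), "untuned fallback")
print(schedule)  # -> "untuned fallback"
```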
The fix for this is to remove these parameters from whatever we use to key the tuning logs. Currently, that's `Target`. So in my book, that's also the definition of `Target` right now:
- the set of options which could influence autotuning on one `tvm::runtime::Device`.
While I do support the effort to gradually improve TVM’s ability to model an arbitrary heterogeneous system (e.g. even one with multiple executors spread across a set of independent machines), modeling this inside Target means that we need to simultaneously confront two questions whenever we want to broaden Target with additional configuration:
- does this configuration affect autotuning?
- who is consuming this configuration?
Adopting A1a allows us to just answer the second question up front by grouping compiler configuration into data structures according to the compiler component which consumes them. Broadly, we have these areas which may need to consume compiler config (a rough sketch follows the list below):
- Op-level code-generators (currently, this is the lowest common denominator describing what the Target options cover)
- Graph-level code-generators (e.g. AOT, Graph, VM)
- AutoTVM (e.g. parameters which may control scheduling)
- AutoScheduler (e.g. parameters which may affect TensorIR lowering)
- flow-level parameters (e.g. parameters which may be in PassConfig but which should potentially be captured into tuning logs, such as `tir.disable_vectorize`)
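As a rough illustration of that grouping, a composite configuration might look something like the sketch below. All class and field names here are placeholders of my own, not a proposed TVM API:

```python
# A rough sketch of the grouping described above; all class and field names
# are illustrative placeholders, not a proposed TVM API.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TargetConfig:
    """Options consumed by one op-level code-generator (and keyed for autotuning)."""
    kind: str                                             # e.g. "llvm", "c", "mali"
    attrs: Dict[str, str] = field(default_factory=dict)   # e.g. {"mcpu": "cortex-m0"}


@dataclass
class ExecutorConfig:
    """Options consumed by a graph-level code-generator (AOT, Graph, VM)."""
    executor: str                                          # e.g. "aot"
    runtime: str                                           # e.g. "c"
    targets: List[str] = field(default_factory=list)       # names of TargetConfigs used
    target_host: Optional[str] = None                      # name of one of those Targets


@dataclass
class FlowConfig:
    """Flow-level parameters (PassConfig-like knobs that may belong in tuning logs)."""
    link_params: bool = False
    disable_vectorize: bool = False


@dataclass
class CompilationConfig:
    """Composite top-level structure: one piece per consuming compiler component."""
    targets: Dict[str, TargetConfig] = field(default_factory=dict)
    executors: Dict[str, ExecutorConfig] = field(default_factory=dict)
    flow: FlowConfig = field(default_factory=FlowConfig)
```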
Organizationally, my position is that it’s better to keep parameters grouped alongside others which are consumed by the same logical component of the compiler. This recognizes that the questions of scoping autotuning and modeling an execution environment are larger than any one RFC and are questions which TVM as a community will continue to refine as new improvements such as AutoScheduler, AutoTIR, etc are introduced. Adopting a composite structure provides a framework to keep things organized as we incrementally improve the compiler rather than defining a single open-ended struct.
This approach then argues for the following:
- We adopt A1a, a composite top-level configuration structure which consists of pieces mapped to each compiler component
- We tighten the definition of Target to mean “configuration parameters for a single codegen which affect autotuning.”
- To accommodate the previous bullet, `target_host` is hoisted out of Target and becomes its own Target. See commentary in [RFC] Unified device/target/memory scope planning with regards to plans to add human-readable labels to Targets (e.g. `dsp-cpu`, `low-power-cpu`).
- Autotuning keys continue for the moment to be confined to the contents of the Targets.
My position on this discussion is that we should still keep the configuration pieces organized according to the consuming compiler sub-component and express any relations in a sibling top-level structure. Here is an example of that in a futuristic world where we support splitting a model across multiple top-level executors:
```json
{
  "targets": {
    "dsp-cpu": {
      "kind": "llvm",
      "mcpu": "cortex-a72"
    },
    "gpu": {
      "kind": "mali"
    },
    "low-power-cpu": {
      "kind": "llvm",
      "mcpu": "cortex-m0"
    }
  },
  "executors": {
    "dsp": {
      "targets": ["dsp-cpu", "gpu"],
      "target_host": ["dsp-cpu"],
      "executor": "vm",
      "runtime": "c++"
    },
    "low-power": {
      "targets": ["low-power-cpu"],
      "target_host": ["low-power-cpu"],
      "executor": "aot",
      "runtime": "c",
      "flow-config": {
        "link-params": true,
        "enable-byoc": ["cmsis-nn"]
      }
    }
  }
}
```
This is quite a forward-looking example. In practice, the effects of adopting A1a look to me at present like (a rough sketch follows the list):
- Defining `target_host` as merely one of the sub-Targets included in CompilationConfig
- Splitting the `executor`, `runtime`, and `link-params` keys out of Target
- Avoiding introducing any recursion, which means I think that we should not adopt that aspect of the Composite Target RFC.
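Reusing the illustrative placeholder classes from the sketch earlier in this post (again, not a proposed API), the practical shape of this might be:

```python
# target_host is just another named TargetConfig, and executor/runtime/link-params
# live outside Target entirely. Classes are the placeholders defined above.
config = CompilationConfig(
    targets={
        "low-power-cpu": TargetConfig(kind="llvm", attrs={"mcpu": "cortex-m0"}),
    },
    executors={
        "low-power": ExecutorConfig(
            executor="aot",
            runtime="c",
            targets=["low-power-cpu"],
            target_host="low-power-cpu",  # a reference to a sub-Target, not a nested Target
        ),
    },
    flow=FlowConfig(link_params=True),
)

# Only the contents of config.targets would feed the autotuning log key.
print(config.targets["low-power-cpu"])
```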