Thank you @Mousius for the RFC! It’s great to read about potential user experience issues of the current Target system, and I'm happy to discuss potential ways to improve it.
Proposed APIs in the RFC
CompilationConfig, as proposed in this RFC, aims to improve UX by wrapping a list of targets, runtime and execution information in an extra layer of abstraction.
The core API is demonstrated in the RFC as:
config = CompilationConfig(
    target_host=target,
    targets=[target],
    executor=Executor("aot", {"interface-api": "c", "unpacked-api": True}),
    runtime=Runtime("crt", {"system-lib": True}),
)
To improve the developer experience, a few other APIs are proposed along with the data structure:
CompilationConfigNode::GetExecutor();
CompilationConfigNode::ShouldLinkParams();
The compilation workflow changes from building with Target to building with CompilationConfig, as demonstrated below:
// The current API
void Build(IRModule mod, const Target& target, ...);
// The proposed API
void Build(IRModule mod, const CompilationConfig& config, ...);
Existing Work
As proposed in the target specification and composite target RFCs, existing efforts have converged on the following points.
First, host is folded into the Target object, and the target_host parameter in existing build APIs is, in fact, kept only for backward compatibility. The CheckAndUpdateHostConsistency API, developed by @zxybazh, is likewise used only for backward compatibility. Right now, the canonical way to specify a target with a customized host is as simple as:
target = tvm.target.Target("cuda", host="llvm")
Second, in terms of multi-target and heterogeneous support, composite target is the current approach. Comparing composite target, which is a target host plus a list of targets, with the proposed CompilationConfig, which is also a target host plus a list of targets, the two seem to follow very much the same idea, except that CompilationConfig is an extra layer of abstraction.
Third, the canonical form of a Target is a JSON object, not a plain string. The target implementation already supports hierarchical parsing, e.g. a target inside a target inside an array. To support executors and runtimes with attributes, we could extend the parser to convert a JSON sub-object into an Executor/Runtime object, which is very much doable.
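To make the parser extension concrete, here is a minimal pure-Python sketch of the idea. It does not use TVM: the Executor and Runtime dataclasses, the "executor" and "runtime" keys, and parse_target are all hypothetical stand-ins chosen for illustration, not the existing target parser.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


# Hypothetical stand-ins; these are NOT TVM's Executor/Runtime classes.
@dataclass
class Executor:
    name: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Runtime:
    name: str
    attrs: Dict[str, Any] = field(default_factory=dict)


# Assumed keys that an extended parser might recognise as typed sub-objects.
_SUB_OBJECT_TYPES = {"executor": Executor, "runtime": Runtime}


def parse_target(config: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively convert recognised JSON sub-objects into typed objects."""
    parsed: Dict[str, Any] = {}
    for key, value in config.items():
        if key in _SUB_OBJECT_TYPES and isinstance(value, dict):
            # Everything except "name" is treated as an attribute.
            attrs = {k: v for k, v in value.items() if k != "name"}
            parsed[key] = _SUB_OBJECT_TYPES[key](value["name"], attrs)
        elif key == "host" and isinstance(value, dict):
            parsed[key] = parse_target(value)  # target inside target
        else:
            parsed[key] = value
    return parsed


target = parse_target({
    "kind": "cuda",
    "host": {"kind": "llvm"},
    "executor": {"name": "aot", "interface-api": "c", "unpacked-api": True},
    "runtime": {"name": "crt", "system-lib": True},
})
```

In this sketch the executor and runtime travel inside the target's JSON form, so no extra wrapper object is needed to carry them.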
Discussion on the RFC
Overall, the RFC brings a dramatic change to the compilation infrastructure. This effort enforces a new assumption that we only have a single executor and a single runtime. However, I can see cleaner alternatives that are more expressive, require less effort and introduce no breaking changes, while achieving the same goal.
First, under our unified IR efforts, compilation in TVM is heading towards an IRModule to runtime::Module abstraction. The executor, to the best of my understanding, is a runtime object that executes the artifacts that some BaseFuncs are lowered to. For example, the VM executor interprets VM bytecode, while the AOT executor may run the binary directly. Right now there are some leaky abstractions, but our goal should be to address those leaks rather than introduce more.
Second, the proposed APIs could seemingly be implemented as straightforward helper functions under the current abstraction. To give a line-by-line example:
ir_mod->GetConfig() -> CompilationConfig; // proposed in the RFC
GetTarget(ir_mod) -> Target; // alternative
ir_mod->GetExecutor() -> Executor; // proposed in the RFC
GetExecutor(ir_mod) -> Executor; // alternative
ir_mod->GetConfig()->ShouldLinkParams() -> bool; // proposed in the RFC
ShouldLinkParams(ir_mod) -> bool; // alternative
In short, the accessor pattern brings no real benefit here and can be replaced by simple helper functions.
Third, the RFC text doesn’t mention how it would improve the UX of the TVMC command line. However, I would argue that the UX could be improved simply with target tags. For example, on CUDA GPUs, our target tag system supports creating CUDA targets with a single short string:
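The helper-function alternative can be sketched in plain Python as follows. The module is modelled as a nested dict, and the "attrs", "target", "executor" and "link-params" keys are assumptions made for illustration, not the layout TVM actually uses.

```python
from typing import Any, Dict

# Hypothetical stand-in for an IRModule: a dict carrying module attributes.
IRModuleLike = Dict[str, Any]


def get_target(ir_mod: IRModuleLike) -> Dict[str, Any]:
    """Free function replacing ir_mod->GetConfig()."""
    return ir_mod["attrs"]["target"]


def get_executor(ir_mod: IRModuleLike) -> Dict[str, Any]:
    """Free function replacing ir_mod->GetExecutor()."""
    return get_target(ir_mod)["executor"]


def should_link_params(ir_mod: IRModuleLike) -> bool:
    """Free function replacing ir_mod->GetConfig()->ShouldLinkParams()."""
    # "link-params" is an assumed attribute name for this sketch.
    return bool(get_executor(ir_mod).get("link-params", False))


ir_mod: IRModuleLike = {
    "attrs": {
        "target": {
            "kind": "llvm",
            "executor": {"name": "aot", "link-params": True},
        }
    }
}
```

Each helper only reads what is already attached to the module, so no new accessor methods or wrapper class are required.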
target = tvm.target.Target("nvidia/geforce-rtx-3070")
The tag carries all the information needed for a device, as long as the device is registered in our system, including compute version, shared memory size, local memory size, etc. This could solve the UX issue in TVMC simply by allowing target tags as arguments:
tvmc --target "nvidia/geforce-rtx-3070"
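How a command-line front end could resolve such a tag is sketched below in plain Python. The registry, register_tag and resolve_target are hypothetical names, and the registered attributes are a deliberately small illustrative subset rather than the contents of TVM's real tag registry.

```python
from typing import Any, Dict

# Hypothetical in-memory tag registry: tag name -> full target config.
_TAG_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register_tag(name: str, config: Dict[str, Any]) -> None:
    """Register a short tag that expands to a full target config."""
    _TAG_REGISTRY[name] = dict(config)


def resolve_target(spec: str) -> Dict[str, Any]:
    """Expand a registered tag; otherwise treat the string as a target kind."""
    if spec in _TAG_REGISTRY:
        return dict(_TAG_REGISTRY[spec])  # copy so callers cannot mutate the registry
    return {"kind": spec}


# Illustrative entry only; a real registration would carry more attributes.
register_tag("nvidia/geforce-rtx-3070", {"kind": "cuda", "arch": "sm_86"})

tagged = resolve_target("nvidia/geforce-rtx-3070")
plain = resolve_target("llvm")
```

With a lookup like this, the user-facing argument stays a single short string while the expanded config carries the device details.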
Last, there are cases where multiple executors work together. For example, if we want to offload some fragments to a TensorRT executor and some to CUDA graph, while keeping the rest in the VM, then the Relay function could potentially be partitioned into 3 Relay functions that target different executors. With composite target, we are able to attach different executors to the Target object in a more explicit way.
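As a rough illustration of that three-way split, the sketch below groups partitioned functions by the executor they are assigned to. The function names and executor labels are hypothetical, and the plain dict stands in for whatever annotation a real partitioning pass would attach.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical partition result: function name -> executor it is assigned to.
partitioned: Dict[str, str] = {
    "fragment_0": "tensorrt",
    "fragment_1": "cuda-graph",
    "main": "vm",
}


def group_by_executor(funcs: Dict[str, str]) -> Dict[str, List[str]]:
    """Collect function names under the executor that should run them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for func_name, executor in funcs.items():
        groups[executor].append(func_name)
    return dict(groups)


groups = group_by_executor(partitioned)
```

A single-executor configuration cannot represent this grouping, which is why keeping the executor attached per target matters here.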
Conclusion
The Target spec was designed with the intent that Target be synonymous with CompilationConfig. I may not have all the context here and my understanding could be limited, but as someone heavily involved in the Target design, from my PoV the benefit of the RFC currently seems limited to issues that Target can already handle. Happy to chat more!