To elaborate on C2: while it is desirable and recommended to consolidate the runtime and executor choices when possible, there are naturally cases that require a bit of generalization. The multi-machine case is one example.
There are also other examples that can appear on a single SoC. Consider the following scenario, where an accelerator comes with a CPU-like co-processor as its controller:
- host: arm
- runtime: vm
- vdevice0: accelerator-with-coprocessor
  - host: risc-v
  - runtime: graph
  - device: my-accelerator
In this case, the host is an ARM chip that drives the overall computation (say through the VM). The co-processor, however, comes with its own controller that is able to execute a sub-graph of the computation, which in turn dispatches to my-accelerator. As a result, we need to compile a TVM runtime (that may differ from the host one) and use it to drive the graph computation on the co-processor.
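Below is a minimal sketch of how such a nested specification could look, written as plain JSON-style dicts. The composite kind names and attribute keys here (e.g. `vm-exec`, `vdevice0`, the mtriples) are illustrative assumptions for this scenario, not existing TVM target kinds.

```python
# Hypothetical recursive target spec for the scenario above.
accel_with_coprocessor = {
    "kind": "accelerator-with-coprocessor",   # hypothetical composite kind
    "host": {"kind": "llvm", "mtriple": "riscv64-unknown-elf"},
    "runtime": "graph",                        # graph executor runs on the co-processor
    "device": {"kind": "my-accelerator"},      # leaf accelerator target
}

top_level = {
    "kind": "vm-exec",                         # hypothetical: VM drives the overall program
    "host": {"kind": "llvm", "mtriple": "aarch64-linux-gnu"},
    "vdevice0": accel_with_coprocessor,        # recursive target embedded as an attribute
}
```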
To expand on the BYOC case, note that for BYOC that involves a sub-graph, the specification of the BYOC "target" is in nature a "CompilationConfig"-level structure, because we need to specify both the leaf-level target (cuda) and the graph runtime (TensorRT or cuda-graph). This brings another need: being able to embed a "CompilationConfig"-level structure inside a target itself.
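As a rough illustration of that nesting, the sketch below embeds a config-level structure inside a BYOC entry. The attribute names (`leaf_target`, `graph_runtime`, `byoc`) are hypothetical and are not the actual TensorRT BYOC options; they only show the shape of the structure being argued for.

```python
# Hypothetical BYOC target spec: the "tensorrt" entry itself carries a
# CompilationConfig-level structure (leaf target plus runtime choice).
tensorrt_byoc = {
    "kind": "tensorrt",                       # BYOC codegen for the offloaded sub-graph
    "leaf_target": {"kind": "cuda", "arch": "sm_80"},
    "graph_runtime": "tensorrt",              # could also be "cuda-graph"
}

full_target = {
    "kind": "cuda",
    "arch": "sm_80",
    "byoc": [tensorrt_byoc],                  # config-level structure nested inside a target
}
```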
Back to the compilation path. I agree that it is important to build a standard pipeline. I would also note that we need to design for compatibility with emerging needs. Allowing target specifications to be recursive, while validating them, would help the ecosystem develop these capabilities. Some of these needs already appear today: for example, we could see a need for a more flexible VM runtime that drives GPU computation while offloading sub-graphs to cuda-graph (more efficient but less flexible). It may not be possible to consolidate every compilation path from the beginning, depending on the use case (just as we initially did not have unified single-device and multi-device executors). Having a common config API (target) would be a solid step toward unification as the community works on these cases. It also provides a standard way for the community to extend things composably, without inventing mechanisms that are incompatible with each other.
In reality, different target kinds may have (slightly) different compilation paths, although they can share a lot in common. In the case of a compositional target like multi-device execution, the compilation pipeline of the multi-device executor needs to divide the work, offload it to the compilation pipelines of the specific target kinds, and then link the results together (in our case PackedFunc is our ABI).
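A minimal Python sketch of that divide/offload/link flow is shown below. `KIND_PIPELINES`, the placeholder compile functions, and `compile_hetero_exec` are hypothetical stand-ins rather than existing TVM APIs; the boundary between the linked artifacts is where the PackedFunc ABI would sit.

```python
def compile_cuda(submod):
    return f"compiled[cuda]({submod})"        # placeholder for the real cuda pipeline

def compile_x86(submod):
    return f"compiled[x86]({submod})"         # placeholder for the real llvm/x86 pipeline

KIND_PIPELINES = {"cuda": compile_cuda, "x86": compile_x86}

def compile_hetero_exec(mod, sub_targets):
    # 1. Divide: partition the module by which sub-target each piece runs on.
    partitions = {t["kind"]: f"{mod}::{t['kind']}" for t in sub_targets}
    # 2. Offload: run each partition through the pipeline of its target kind.
    artifacts = [KIND_PIPELINES[kind](submod) for kind, submod in partitions.items()]
    # 3. Link: combine the per-kind artifacts behind a PackedFunc-style boundary.
    return {"linked": artifacts}

print(compile_hetero_exec("mymod", [{"kind": "cuda"}, {"kind": "x86"}]))
```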
Finally, to build on @Mousius's point: allowing targets to be recursive does not preclude structure or naming. Targets have kinds, and schemas attached to each kind. Further validation can also be done throughout the process. So instead of
```
(CompilationConfig)
-> (Target-CUDA), (Target-X86)
-> (Executor)
-> (Runtime)
```
we would get

```
(Target-Kind=Hetero-Exec)
-> (Target-Kind=CUDA), (Target-Kind=X86)
-> (Executor)
-> (Runtime)
```
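To illustrate how kind-attached schemas could validate such a recursive target, here is a small sketch using a hypothetical schema table (something TVM's `TargetKind` registry already does for flat targets); the kind names and attribute keys are assumptions for this example.

```python
# Hypothetical per-kind schemas: attribute name -> expected type.
SCHEMAS = {
    "hetero-exec": {"devices": list, "executor": str, "runtime": str},
    "cuda": {"arch": str},
    "x86": {"mcpu": str},
}

def validate(target):
    schema = SCHEMAS[target["kind"]]          # unknown kinds raise KeyError
    for key, expected in schema.items():
        assert isinstance(target[key], expected), f"{key} must be {expected.__name__}"
    # Recurse into nested targets so the whole tree is checked.
    for sub in target.get("devices", []):
        validate(sub)

validate({
    "kind": "hetero-exec",
    "devices": [{"kind": "cuda", "arch": "sm_80"}, {"kind": "x86", "mcpu": "skylake"}],
    "executor": "graph",
    "runtime": "cpp",
})
```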
From a UX point of view, we do not need to force users to pass in such compositional targets (which are complicated) if they only care about single-device execution; we can canonicalize internally.
As a matter of fact, the majority of the use cases we face right now are still single-device scenarios, and we want to keep those cases simple for the user. CompilationConfig as it stands is a union of two kinds of targets:
- Single-device target, where only a host and one target are involved
- Multi-device target, where multiple devices are involved
Being able to clearly differentiate the two, and to offer a simpler UX for the common single-device scenario, would be a plus for users.
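One possible internal canonicalization is sketched below, assuming a hypothetical `hetero-exec` wrapper kind: the simple single-device form is normalized into the compositional one, so users never have to write the latter by hand.

```python
def canonicalize_target(target, host=None):
    # Already in compositional form: return as-is.
    if isinstance(target, dict) and target.get("kind") == "hetero-exec":
        return target
    # Wrap the single-device spec into the compositional structure.
    return {
        "kind": "hetero-exec",
        "devices": [target if isinstance(target, dict) else {"kind": target}],
        "host": host or {"kind": "llvm"},
    }

# Both user-facing spellings canonicalize to the same internal structure.
print(canonicalize_target("cuda"))
print(canonicalize_target({"kind": "cuda", "arch": "sm_80"}, host={"kind": "llvm"}))
```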
Regardless of the use case, users will be able to leverage the tagging feature at different levels, so they can simply pass in:

```python
build(mod, target="my-hetero-exec-platform0")
```
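For illustration, tag resolution could look like the sketch below, which uses a hypothetical tag table; TVM's existing target tags (see `tvm.target.list_tags()`) already perform this kind of expansion for flat targets, and the tag name and composite kind here are assumptions.

```python
# Hypothetical registry mapping a tag to a full compositional target.
TARGET_TAGS = {
    "my-hetero-exec-platform0": {
        "kind": "hetero-exec",
        "devices": [{"kind": "cuda", "arch": "sm_80"}, {"kind": "x86", "mcpu": "skylake"}],
        "executor": "graph",
        "runtime": "cpp",
    },
}

def resolve_target(target):
    # A plain string is first looked up as a tag, otherwise treated as a target kind.
    if isinstance(target, str):
        return TARGET_TAGS.get(target, {"kind": target})
    return target

print(resolve_target("my-hetero-exec-platform0"))
```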