Hi all, though our Collage work (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md) is pretty self-contained, I have found that some changes to Target make both the control of available ‘backends’ and the book-keeping for candidate partitions much easier. Since those are global changes, and given the discussion on CompilationConfig is still ongoing ([pre-RFC] Compilation Configuration Representation - #54 by areusch), I figured it’s best to check in here to make sure I’m heading in the right direction.
Roughly, Collage needs:
- A way to convey which BYOC backends are available for implementing partitions.
- A way to associate a BYOC backend with a candidate partition.
The approach I’ve taken in the prototype (https://github.com/mbs-octoml/mbs-tvm/tree/mbs-collage-sketch) is:
- Allow all TargetKinds to have a “compiler” String attribute. (I include it in the TVM_REGISTER_TARGET_KIND macro.) With that the user can express, say:
  Target("cuda -arch=sm_80 -compiler=cutlass")
  which is distinct from:
  Target("cuda -arch=sm_80 -compiler=tensorrt")
  and both are considered specializations of:
  Target("cuda -arch=sm_80")
  Collage itself can then just record the Target for each candidate partition, since that object is now sufficient to determine all downstream processing.
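To make the “specialization” relation concrete, here is a purely illustrative sketch (not the actual TVM Target API) that models a target as a kind plus attributes; a target refines another when it has the same kind and agrees on all of the other’s attributes:

```python
# Hypothetical model: a Target is a dict with a "kind" plus attributes.
# Target A is a specialization of target B when A has the same kind and
# at least all of B's attributes (A may add more, e.g. "compiler").

def refines(specialized, general):
    """True if `specialized` agrees with `general` on kind and on every
    attribute `general` carries."""
    if specialized["kind"] != general["kind"]:
        return False
    return all(specialized.get(k) == v
               for k, v in general.items() if k != "kind")

plain    = {"kind": "cuda", "arch": "sm_80"}
cutlass  = {"kind": "cuda", "arch": "sm_80", "compiler": "cutlass"}
tensorrt = {"kind": "cuda", "arch": "sm_80", "compiler": "tensorrt"}

assert refines(cutlass, plain) and refines(tensorrt, plain)
assert cutlass != tensorrt            # distinct BYOC backends
assert not refines(plain, cutlass)    # the plain target is less specialized
```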
- Allow the ‘target’ argument to the various build entry points to also be a list (in addition to a dict for the legacy heterogeneous case, or a single target for the homogeneous case).
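A minimal sketch of the intended argument handling, assuming targets are modeled as simple values (the real change would live in the build entry points, with `canonicalize_targets` as a hypothetical helper name):

```python
# Accept any of the three 'target' argument shapes and flatten to a list.

def canonicalize_targets(target):
    """Single target, legacy device-type dict, or list -> flat list."""
    if isinstance(target, dict):          # legacy heterogeneous form
        return list(target.values())
    if isinstance(target, (list, tuple)): # new list form
        return list(target)
    return [target]                       # homogeneous form

assert canonicalize_targets("llvm") == ["llvm"]
assert canonicalize_targets({"cpu": "llvm", "gpu": "cuda"}) == ["llvm", "cuda"]
assert canonicalize_targets(["llvm", "cuda"]) == ["llvm", "cuda"]
```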
- Centralize all target & target_host handling in the existing CompilationConfig class, using Array<Target> as the generic representation of the ‘bag-o-targets’, which the CompilationConfig class is responsible for validating and canonicalizing.
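As one example of the kind of validation such a class could perform, here is an illustrative sketch (the actual checks in the prototype may differ; targets are again modeled as plain dicts): require every compiler-specialized target to have an unspecialized target of the same kind in the bag, since, as noted below, it does not make sense to try tensorrt without a cuda target.

```python
# Illustrative validation over the 'bag-o-targets'.

def validate_targets(targets):
    """Require every 'compiler'-specialized target to share its kind with
    some unspecialized target in the bag (e.g. tensorrt implies cuda)."""
    plain_kinds = {t["kind"] for t in targets if "compiler" not in t}
    for t in targets:
        if "compiler" in t and t["kind"] not in plain_kinds:
            raise ValueError(f"no unspecialized {t['kind']} target for {t}")

validate_targets([
    {"kind": "cuda", "arch": "sm_80"},
    {"kind": "cuda", "arch": "sm_80", "compiler": "tensorrt"},
])  # ok: the tensorrt target refines the plain cuda one
```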
- When PlanDevices needs to know the Target to associate with a particular DLDeviceType, it defers to the CompilationConfig. That class finds the least-specialized available Target. So in the above example, kDLCUDA would map to Target("cuda -arch=sm_80").
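The lookup can be sketched as follows; this is a hypothetical model (targets as dicts, an illustrative device-type table, and attribute count as a stand-in for the specialization order), not the prototype’s actual code:

```python
# Map a DLDeviceType to the least-specialized matching target in the bag.

DEVICE_KIND = {"kDLCUDA": "cuda", "kDLCPU": "llvm"}  # illustrative mapping

def target_for_device(device_type, targets):
    kind = DEVICE_KIND[device_type]
    candidates = [t for t in targets if t["kind"] == kind]
    # Fewer attributes == less specialized (e.g. no "compiler" attribute).
    return min(candidates, key=len)

targets = [
    {"kind": "cuda", "arch": "sm_80"},
    {"kind": "cuda", "arch": "sm_80", "compiler": "cutlass"},
    {"kind": "cuda", "arch": "sm_80", "compiler": "tensorrt"},
]
assert target_for_device("kDLCUDA", targets) == {"kind": "cuda", "arch": "sm_80"}
```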
- Some cleanup of the Python target handling code then falls out naturally.
An alternative design is to layer the Collage notion of ‘backend’ on top of targets, and introduce some new entry point or convention by which the user can convey that. However, I went with the above approach because it seemed a graceful extension to the existing heterogeneous target handling, and it elegantly ties targets and BYOC backends together. After all, it does not make sense to try tensorrt on a non-cuda target, and so on.
Let me know what you think. I can peel out a PR from the prototype if that would help, but honestly I don’t think the actual code changes will be very informative.