[pre-RFC] Compilation Configuration Representation

Inspired by the work of @mbs-octoml, I give you a new RFC for CompilationConfig!

Summary

This RFC supersedes Migrating IRModules to Attributes by replacing the various attributes associated with the Runtime and Executor with a single CompilationConfig object that encapsulates the configuration of the compiler for a given IRModule. By collecting this together, it introduces an object which can be coupled to the IRModule and used throughout compilation with guarantees about the properties of the configuration.

Motivation

Argument Duplication

When implementing Migrating IRModules to Attributes, it became clear that the arguments were getting duplicated in many places across the codebase and never collated, as in the following tvmc CLI:

tvmc --target=c --executor=aot --runtime=crt

This populates the executor, runtime and target arguments all the way into the compilation flow, rather than collating them and passing a pre-processed representation. This introduces places where we can make errors by missing one of the three, and means the signature has to be replicated everywhere.
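
As a rough sketch of the intent (the entry point name and argument handling are illustrative, not the actual tvmc code), the three flags could be collated once at the boundary and passed around as a single object:

# Hypothetical entry point; argument names mirror the CLI flags above.
def build_from_cli_args(ir_mod, args):
    target = Target(args.target)            # --target=c
    config = CompilationConfig(
        target_host=target,
        targets=[target],
        executor=Executor(args.executor),   # --executor=aot
        runtime=Runtime(args.runtime),      # --runtime=crt
    )
    return relay.build(ir_mod, config)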

Single Point of Setup

Further to this, many areas of the code require non-optional access to the compiler configuration. This introduces uncertainty in later passes, which should be able to guarantee the compiler has been configured - making the dynamic attributes less ideal for this use-case:

// Hopefully this is set!
ir_mod->GetAttr<Executor>(kExecutor).value()->name;

As can be seen from the addition of SEScope, there’s also a need for a single place to collect and define configuration rather than duplicating effort with multiple calls to set up various parts of the configuration, such as the Target environment, which should be known when the compilation flow begins:

CheckAndUpdateHostConsistency(&target, &target_host);

By moving these to a single property of the IRModule, which is required to exist before entry into the compiler flow, this both reduces the cognitive overhead of accessing the configuration and provides a place to encapsulate the setup logic early in the compilation flow.

This provides a fixed set of configuration alongside the existing PassContext::Global(), solidifying the configuration requirements of the TVM Compiler.

Guide-level explanation

User API

A user creates and passes a configuration to relay.build(mod, config), leading to something similar to:

ir_mod = IRModule.from_expr(function)
config = CompilationConfig(
    target_host=target,
    targets=[target],
    executor=Executor("aot", {"interface-api": "c", "unpacked-api": True}),
    runtime=Runtime("crt", {"system-lib": True})
)
relay.build(ir_mod, config)

To easily create a default CompilationConfig from a single Target, the from_target option is provided - this can be used to add syntactic ease in relay.build:

def relay.build(ir_mod, config, ...):
  if isinstance(config, Target):
    config = CompilationConfig.from_target(config)
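
A minimal sketch of what from_target might look like; the default executor ("graph") and runtime ("cpp") below are assumptions matching TVM's current defaults and are not fixed by this RFC:

# Illustrative only; the chosen defaults are assumptions, not part of this RFC.
@staticmethod
def from_target(target, target_host=None):
    target = Target(target) if isinstance(target, str) else target
    return CompilationConfig(
        target_host=target_host if target_host else target,
        targets=[target],
        executor=Executor("graph"),
        runtime=Runtime("cpp"),
    )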

The C++ API is then amended to require configuration:

/*!
   * \brief Build relay IRModule
   *
   * \param mod Relay IRModule
   * \param config Compilation Config for this compiler run
   * \param mod_name Name of the module
   */
  void Build(IRModule mod, const CompilationConfig& config, const String mod_name)

This means a user can continue to easily create and pass an IRModule to relay.build as an opaque object whilst providing the configuration alongside.

Developer API

The developer facing API for the IRModule changes for internal passes to easily access the configuration:

ir_mod->GetConfig()->GetExecutor()

And functionality related to inspecting this configuration can be added to CompilationConfig which can see all available configuration properties:

ir_mod->GetConfig()->ShouldLinkParams()

Importantly, for ad-hoc information passed alongside with the IRModule, attributes continue to be available:

WithAttr<String>(ir_mod, "woof", "woof");

Reference-level explanation

IRModule Configuration

To incorporate this into the IRModule, a property of type CompilationConfig will be added and exposed via a function GetConfig(), in line with the Google C++ guidelines except for being a public property as is usual for TVM's internal node structures:

class IRModuleNode {
  CompilationConfig config;

  const CompilationConfig& GetConfig() {
    return config;
  }
};

The actual CompilationConfig class will represent the current and future shape of configuration, such as Executor and Runtime, alongside Target and Target host (the example is illustrative, not carved in stone):

class CompilationConfigNode {
  Target target_host;
  Array<Target> targets;
  Runtime runtime;
  Executor executor;
};

From Python the module will be passed into C++ with the CompilationConfig to start the compilation flow:

def relay.build(ir_mod, config: Union[CompilationConfig, Target], ...):
    if isinstance(config, Target):
        config = CompilationConfig.from_target(config)
    mod["build"](ir_mod, config)

Which is then connected within C++:

void Build(IRModule mod, const CompilationConfig& config, const String mod_name) {
    mod->config = config;
    ...
}

When creating IRModule → IRModule passes within the compiler, this property is now guaranteed to exist in the main Relay flow.
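
For example, a module pass could rely on this guarantee without an Optional check (the Python-side get_config() accessor is assumed here to mirror the C++ GetConfig() and is not spelled out in this RFC):

@tvm.transform.module_pass(opt_level=0)
def check_config(mod, ctx):
    # get_config() is an assumed Python mirror of the C++ GetConfig();
    # the entry contract guarantees the config exists, so no Optional check.
    assert mod.get_config().executor is not None
    return mod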

Non-Relay Flows

There are a number of ways of accessing the internals of TVM which have been exposed to the user; for these to continue to function, the CompilationConfig will be settable from Python such that:

ir_mod = ir_mod.with_configuration(config)

The above will attach the configuration within the alternative pathways; it is the opinion of the author that these should eventually be deprecated in favour of a single standard compilation path.
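
A hedged sketch of how such a pathway might attach the configuration before continuing (names are as proposed above, and the Target used is purely illustrative):

# Build a default config from a Target and attach it to the module so that any
# later pass can read it, as in the main Relay flow.
config = CompilationConfig.from_target(tvm.target.Target("llvm"))
ir_mod = ir_mod.with_configuration(config)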

Compilation Configuration Creators

Within python/tvm/target/target.py there are a number of functions which return Targets with configuration built in, such as micro().

These are user-facing configurations, and will be ported to tvm/target/config.py (to match include/tvm/target/compilation_config.h), providing the same logic but returning a CompilationConfig:

def micro(model="unknown", options=None, executor="graph"):
  opts = _merge_opts(
      MICRO_SUPPORTED_MODELS[model] + ["-runtime=c", f"-model={model}"],
      options,
  )
  target = Target(" ".join(["c"] + opts))
  return CompilationConfig(
    target_host=target,
    targets=[target],
    executor=Executor(executor),
    runtime=Runtime("crt", {
      "system-lib": executor == "graph"
    })
  )
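
Usage of the ported helper might then look like the following (the model name is illustrative and assumes an entry in MICRO_SUPPORTED_MODELS):

# Hypothetical usage of the helper from tvm/target/config.py.
config = micro("stm32f746xx", executor="aot")
relay.build(ir_mod, config)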

Drawbacks

  • This is more information on the IRModule alongside existing attributes and pass information
  • The entry contract for relay.build and the TVM Compiler changes to require configuration of the initial IRModule

Rationale and alternatives

Continue to use PassContext::Global() and general purpose attributes for fixed compilation configuration. Using the general purpose options here hides the details of the compilation configuration and makes it harder for us to provide a robust representation of non-Optional configuration. If all things are considered Optional then they have to be treated as non-configured, requiring multiple passes of reconfiguration to provide safety to the users.

Prior art

Unresolved questions

  • Do we want to fully deprecate passing a target to relay.build? The assumption is made that we don’t want to fully deprecate it and a sugar has been added above for populating a CompilationConfig straight from a Target
  • Similarly, do we want to provide a similar helper for target and target_host:
def relay.build(ir_mod, config, target_host=None):
  if isinstance(config, Target):
    config = CompilationConfig.from_target(config, target_host=target_host)
  • What is the future for IRModule attributes and PassContext - this RFC aims to provide one piece of concrete configuration which we know can be fixed at the start of compilation, but there should be a default choice for ad-hoc configuration within the compilation flow.

Future possibilities

By codifying the properties of the compilation flow in CompilationConfig and providing IRModule attributes we may be able to deprecate PassContext::Global() and only use the passed state in TVM.

@areusch @manupa-arm @jroesch @tqchen


cc @junrushao @zxybazh @comaniac

Thumbs up from me, I’d like to see this proceed to PRs.

  • Agree with adding config as a first-class field to IRModule.
  • The build API backwards compat should be straightforward by extension of the existing if isinstance checks. We can emit deprecation warnings for a release or two.
  • I think you want your tvmc cmdline flags to be interpreted as rewrites on CompilationConfig in some uniform way, right?
  • We’ll need to figure out some verbiage for what should go into the config top level vs target etc. It’s a significant risk the config turns into the union of misc fields across targets, passes etc. Many compilers end up in this world by accident and it’s hard to unwind. But some principles and discipline up front should work.
  • Not sure about how to reconcile this with the implicit contexts for Target and the PassConfigManager. Personally I’m not a fan of implicit context stacks but it is an established TVM pattern.

Thanks for the RFC. Generally I am in support of the change.

  • Moving the config to IRModule could better avoid overloading the target.
  • The design of the targets list and target host seems really similar to a composite target; can we actually use a single composite target (where each target can be a composite target) with executor and runtime fields as the compilation config?
  • CheckAndUpdateHostConsistency is designed to make sure target.host and target host are consistent along the way. If we have a better mechanism to do that, since we have target host as a field in the compilation configuration, we may further reconsider the usage of target inside of the compilation flow.
  • On the target side, it’s a breaking change but I would be optimistic on deprecation after a couple releases.
  • Nit: personally I prefer to have target host as a field of target, or of compilation config, instead of directly having an argument name for that. from_target can also retrieve the target host from the given target.

Thanks for the discussions. I think it is a good opportunity to discuss how we can flow target information through the compilation. Putting down some food for thought.

How to flow target information through compilation

One of the main design goals that we want to move towards is the ability to incrementally transform the code (some of the transformations may not be done in the official build pipeline). Take BYOC as an example: in the future we might invoke a custom pass that slices out a subgraph and generates a function that requires a specific target lowering (e.g. CUDA). The diagram below from the TensorIR blitz course shows one example of such a flow:

In summary, there can be two goals:

  • G0: Ability to configure a single standard compilation path.
  • G1: Ability to enable incremental customization (via python API), attach constraints (such as BYOC) and then send back to the build function for further lowering.

G0 is certainly sufficient for some of the use cases like tvmc. However, it is also important for us to take inspiration and think more about making G1 a first class citizen. A natural consequence of G1 is that we will need to preserve certain “target-constraint” information in the IRModule (so previous transformations’ decisions are self-contained), either as an attr of a function (e.g. this function has to be compiled for CUDA) or of the IRModule.
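
For instance, a rough sketch of G1 (sliced_func and the attribute key are hypothetical, not an existing convention):

# Illustrative only: a custom pass records its lowering decision on the function
# it sliced out, so the constraint travels with the IRModule back into build().
cuda_func = sliced_func.with_attr("target", tvm.target.Target("cuda"))
ir_mod = tvm.IRModule({"cuda_subgraph": cuda_func})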

It would be great for us to collectively think about how to standardize for G1 while still having the ability to support G0.

CompilationConfig and Composite Target

Back to the CompilationConfig itself. I agree with @zxybazh that it looks quite like a special case of composite target and it is useful to discuss whether or not we can simply merge it as a structured Target.

Coming back to the definition of target: if we look at LLVM’s target triple (the arch-subarch-os-vendor-env-object format), we can find that it also contains runtime choice information like the ABI for libc, the OS type and so on. So one could argue that choices like the tvm_runtime type and packed function API can be part of a composite target (although they do not need to be in the leaf “c”).

The advantage of having a CompilationOption class:

  • It is a structured class with explicit fields

The advantage of making CompilationOption a composite Target:

  • We still have structured fields with target configurations
  • We get the benefit of being able to tag, and record target
  • CompilationOption can appear as a field or sub-target of something else. Imagine that we need to offload a subgraph to another customized compilation, which may need its own specification of the heterogeneous “targets”.
  • Same API argument(Target) for both graph level compilation and operator level compilation.

Hi @tqchen and @zxybazh,

cc : @mbaret

What is a Composite Target ?

TVM being a multi-target compiler, it would be a bit confusing to use an Array of Targets as another Composite Target – I think it’s the terminology that is confusing here.

A composite target sounds like a target that codegens intimately in a single codegen path for different devices, rather than a structure that is used by TVM to trigger different codegen flows. I think we all agree we need a way to communicate these Options/Flags throughout the lowering, but I personally would not be in favor of attaching this to a (Composite) target – that results in overloading the term “Target”.

I believe we can still do it as the target is part of the CompilationConfig

CompilationConfig is supposed to contain attributes that are not specific to a target. Thus, they would still be accessible in the IRModule, won’t they?

This could also be said with respect to CompilationConfig being available for both graph level compilation and operator level compilation – just that “targets” are part of CompilationConfig.

In summary, what we need to discuss is:

  • What is the best term for the structure that holds information/flags that are target independent and holds the set of targets at the same time?
  • Moreover, it would be great to reserve the term “Composite” target for a target that intimately codegens to multiple devices without divergence in the compilation pathway.

Thanks for the discussions. To begin with, I am not that attached to the particular choice of name. We can, for example, decide to introduce another target kind (“hetero-target”, “myawesome-target”, “platform”, “CompilationOption”) whose attr fields match exactly those of CompilationOption.

I think our discussion boils down to the following question

What can be called a “Target” in tvm

Intuitively, to many users, target refers to the “target platform” or environment that they want to run the program on. In a typical clang target triple, the following elements can be part of a target:

  • ISA (x86, arm, riscv)
  • runtime library (musl, libc)
  • operation system env (windows, linux)
  • vendor

Of course in most of the settings here target refers to a single device, usually with a single codegen path. These are targets at the leaf level.

However, as we start to build compilers for ML, the “target” in users’ minds is different. For example, I want to run my program as fast as possible on aws/c4.4xlarge, or nvidia/jetson-nano. Some of these “targets” already involve multiple codegen paths (host code and device code). When we start to involve graph or vm for the high level program driver, the vm/graph/aot choice is another codegen path on the driving path of the program.

As the field evolves, the concept of “target” can change further. Right now we are talking about a single SoC with multiple devices. What if we develop an interest in deploying onto the following distributed environment.

- machine0:
   - host: x86
   - vdevice0: cuda
- machine1:
   - host: arm
   - vdevice0: vulkan

We might also be interested in the following byoc customization, where we offload part of the computation to a byoc-myawesome-cuda strategy, which needs a self-contained specification of host and library targets that makes use of the cuda-graph runtime. We want to embed it in a vm runtime that invokes the byoc-myawesome-cuda as an opaque function.

- host: x86
- vdevice0: byoc-myawesome-cuda
    - host: x86
    - runtime: cuda-graph
    - vdevice0: cuda
    - library: tensor-rt
- vdevice1: cuda
- runtime: vm

Can we call the above descriptions “targets”? From a UX perspective they certainly can be called targets, since from the user’s perspective they are specifications of the “target environment”. In the context of machine learning they can usually go beyond a single codegen path.

Another thing to note here is that some of these examples require a level of compositionality that goes beyond two levels (target then compilation-option). In the multi-machine setting, the per-machine setting roughly maps to the CompilationOption being used here. Similarly, in the case of byoc-myawesome-cuda, vdevice0 itself would benefit from its own runtime specification. Another concept (another target kind) would be needed to support the top-level composition.

UX Benefit of a Target – Tagging

Besides the benefit of compositionality, one major UX benefit of target is the ability to tag. It can be really complicated to manually specify a compositional compilation option. In most cases, we want users to directly leverage pre-built tags. For example, build for nvidia/jetson-nano:cuda, build for aws/c4.4xlarge, build for arm/soc-name:aot (which directly implies unpacked_api). These tags create shorthands for us to set up the compositional configurations.

The ability to let the build function take in tags that quickly map to codegen, runtime, and library configurations would greatly improve the overall user experience. Making CompilationOption (or whatever we decide to call it) a Target would allow us to reuse this feature effectively and recursively.
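
For example, assuming such a tag were registered (the tag name is taken from the examples above and is an assumption, not an existing registration):

# A single registered tag expands into the full compositional configuration:
# host/device targets, runtime, executor and libraries.
relay.build(ir_mod, target="aws/c4.4xlarge")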

Discussions

The main discussion point here is the scope of target. As we can see:

  • A0: On one hand, we can say that the configuration strictly follows a two-level structure, where target is at the leaf and specifies a single codegen path, while we use a separate name for the top-level compositions.
  • A1: On the other hand, we can see the need for:
    • More than two levels of composition
    • The UX need to reuse the tagging mechanism and simplify users’ inputs to the compiler.

From a two-level compositional view, personally I think reusing Target for CompilationOption is not strictly more complicated, modulo the right kind naming, while the needs in ML can certainly go beyond that. This makes me think going for target compositionality is not a bad idea.

I agree with @tqchen that improving composite targets could be more beneficial and general. We (with @junrushao and @zhiics) previously attempted to improve the target system to allow more flexible attributes, such as a pass sequence / runtime / etc specifically for the target, which is very similar to what TQ illustrated and what this RFC proposed, but found that it’s not an easy task due to the current target system implementation.

Meanwhile, the concept of compilation configuration has been used for some BYOC backends already, but they are currently relying on PassContext. For example, TensorRT codegen takes the configuration from PassContext during relay.build:

mod, config = partition_for_tensorrt(mod, params)
target = "cuda"
with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': config}):
    lib = relay.build(mod, target=target, params=params)

Although the config here is generated internally, I think this could still be a good driving example to see how we could make a composite target that incorporates the backend-specific config.
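
To make the idea concrete, a sketch of the information such a composite target could carry, with the TensorRT options folded into the target itself rather than living in PassContext (a plain dict sketch; this schema does not exist today and the option keys are illustrative):

# Backend-specific options move from PassContext into the target description.
composite_target = {
    "kind": "composite",
    "host": "llvm",
    "devices": ["cuda"],
    "tensorrt": {"use_implicit_batch": True, "max_workspace_size": 1 << 30},
}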

Thank you @Mousius for the RFC! It’s great to read about potential user experience issues of the current Target system, and I'm happy to discuss potential ways to improve it.

Proposed APIs in the RFC

CompilationConfig, as proposed in this RFC, aims to improve UX by wrapping a list of targets, runtime and execution information in an extra layer of abstraction.

The core API is demonstrated in the RFC as:

config = CompilationConfig(
    target_host=target,
    targets=[target],
    executor=Executor("aot", {"interface-api": "c", "unpacked-api": True}),
    runtime=Runtime("crt", {"system-lib": True})
)

To improve the developer experience, a few other APIs are proposed along with the data structure:

CompilationConfigNode::GetExecutor();
CompilationConfigNode::ShouldLinkParams();

The compilation workflow changes from building with Target to building with CompilationConfig, as demonstrated below:

// The current API
void Build(IRModule mod, const Target& target, ...);
// The proposed API
void Build(IRModule mod, const CompilationConfig& config, ...);

Existing Work

As proposed in the target specification and composite target RFCs, the existing effort converges to the following items.

First, host is folded into the Target object, and the target_host parameter in existing build APIs is, in fact, only left for backward compatibility. The CheckAndUpdateHostConsistency API developed by @zxybazh is only used for backward compatibility reasons. Right now, the canonical way to specify targets with a customized host is as easy as:

target = tvm.target.Target("cuda", host="llvm")

Second, in terms of multi-target and heterogeneous support, composite target is adopted as the current approach. Comparing composite target, which is target host plus a list of targets, with the proposed CompilationConfig, which is also target host plus a list of targets, it seems very much to follow the same idea, while CompilationConfig adds an extra layer of abstraction.

Third, the canonical form of a Target is a JSON object, not a plain string. The target implementation already supports hierarchical parsing, e.g. target inside target inside array, etc. To support executor and runtime with attributes, we could extend the parser to convert a JSON sub-object to an Executor/Runtime object, which is very much doable.
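
As an illustration of that direction (the nested executor/runtime form below is not supported by the parser today; it is a sketch of the suggested extension and the keys are assumptions):

# Target already accepts a JSON-style dict form today:
target = tvm.target.Target({"kind": "cuda", "host": "llvm"})

# The suggestion is to extend the same parser so executor/runtime could be
# expressed as sub-objects too, e.g.:
# {"kind": "cuda", "host": "llvm",
#  "executor": {"kind": "aot", "interface-api": "c"},
#  "runtime": {"kind": "crt", "system-lib": True}}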

Discussion on the RFC

Overall, the RFC brings a dramatic change to the compilation infrastructure. This effort enforces a new assumption that we only have a single executor and a single runtime. However, I can see clean alternatives with more expressiveness, less effort required and no breaking changes that achieve the same goal.

First, under our unified IR efforts, the compilation in TVM is heading towards an IRModule to runtime::Module abstraction. The executor, to the best of my understanding, is a runtime object that executes the artifacts that some BaseFuncs lower to. For example, the VM executor interprets VM bytecode, while the AOT executor may run the binary directly. Right now, there are some leaky abstractions, but our goal should be aligned in the direction of addressing those leaks instead of bringing in more.

Second, the proposed APIs seem possible to implement with straightforward helper functions under the current abstraction. To give a line-by-line example:

ir_mod->GetConfig() -> CompilationConfig; // proposed in the RFC
GetTarget(ir_mod) -> Target; // alternative

ir_mod->GetExecutor() -> Executor; // proposed in the RFC
GetExecutor(ir_mod) -> Executor; // alternative

ir_mod->GetConfig()->ShouldLinkParams() -> bool; // proposed in the RFC
ShouldLinkParams(ir_mod) -> bool; // alternative

In short, the accessor pattern here doesn’t bring actual benefits, and can be replaced by simple helper functions.

Third, the RFC text doesn’t mention how it could improve the UX of the TVMC command line. However, I would argue that the UX could be improved simply with target tags. For example, on CUDA GPUs, our target tag system supports creating CUDA targets with a single short string:

target = tvm.target.Target("nvidia/geforce-rtx-3070")

This carries all the information needed for a device, as long as we register them into our system, including compute version, shared memory size, local memory size, etc. This could perfectly solve the UX issue in TVMC by simply allowing target tags as arguments:

tvmc --target "nvidia/geforce-rtx-3070"

Last, there are cases where multiple executors work together. For example, if we want to offload some fragments to the TensorRT executor and some to CUDA graph, while keeping the rest in the VM, then the Relay function could potentially be partitioned into 3 Relay functions that target different executors. With a composite target, we are able to attach different executors in the Target object in a more explicit way.

Conclusion

When designing the Target spec, it was intended to be considered as a synonym for CompilationConfig. I may not have all the context here and my understanding could be limited, but as someone heavily involved in the Target design, from my PoV the benefit of the RFC for now seems limited to certain issues Target is already able to address. Happy to chat more!

Thanks for the interesting discussion.

@tqchen @junrushao ,

In terms of the definition of the target, I see two categories of arguments presented here :

C1: The executor and runtime should belong to the target – even if it means duplication.

C2: The targets should be hierarchical and recursive

For C1, I would rather use this argument to make runtime and executor attributes of the target than to support calling an Array of Targets another target. I can see this being true in the following scenario (as pointed out by @tqchen), if it’s a scenario we want to target.

The following scenario is motivated by the fact that it is economical to run a single model inference across multiple machines, considering the data transfer costs of the model’s intermediary tensors. I just want to make sure this is something the community considers a compilation scenario that TVM should aim for.

For C2,

The examples presented so far do not seem to go beyond a mostly flat Array of Targets. Maybe in the multiple machine scenario, an alternative could have been an Array of CompilationConfig (or whatever we decide to call it). However, this would not be viable if we have recursive targets (where the recursion depth > 1).

Do you see a likely scenario in which we will have a Composite Target that is composed of Composite Targets of Composite Targets? (i.e. where we can’t express the target we want to compile to as an Array of Targets coupled with a host target – I believe the host target differs only in the multiple machine scenario).

If that is the case, how would TVM establish codegen path divergence (partitioning at different levels of IR) for such a hierarchical target?

Thanks @manupa-arm , building on what you said.

  • I do not think there is a strong contention on C1; the main point is that the target can be recursive. So a target like the following is totally OK:
- kind: hetro-exec
- runtime : crt
- executor: vm
- devices: [ se_scope0, se_scope1 ]

So the argument is not about where or how these fields should be structured in a recursive data structure. Something that looks like a CompilationOption is OK from my pov. But the suggestion is that we make that one kind of (recursive) target, as from a UX pov it can be seen that way.

  • I want to add C3: The ability to leverage tagging in target and improve the overall user experience is a very important factor.

I am going to discuss C2 in a separate post since it warrants more examples.

Oh wow, I’ve been away for a few days and really appreciate the amount of discussion that’s arrived :smile_cat: Thanks @mbs-octoml, @zxybazh, @tqchen, @comaniac, @junrushao and @manupa-arm!

Firstly let’s address a few specifics which may help narrow the discussion slightly:

There’s an unfortunate overloading of the terms Executor and Runtime, which is the inherent risk with a diverse heterogeneous compiler :smile_cat:. In this RFC, let’s define the Executor and Runtime as specific to TVM’s Executor and Runtime rather than the implementation of a Target. How a Target gets generated and linked is outside the immediate scope of the TVM Runtime and Executor, as they’re designed to invoke the generated Target code.

Thanks @mbs-octoml, I missed some Prior Art here! In tvmc, we have the concept of a configuration as defined in Command Line Configuration Files. CompilationConfig would allow this to be a standard way of defining such a configuration with Targets within it - this meets the needs of the Target tagging which @junrushao and @tqchen are discussing by instead wrapping them into a CompilationConfig that represents the system. The Command Line Configuration Files RFC defines the <TYPE> and indicates the use of --config for cloud instances. The terminology would shift from a tagged Target to a CompilationConfig here to represent that they exist at two different levels of the hierarchy?

As defined in Migrating Target Attributes to IRModule, splitting the TVM concepts of Target, Runtime and Executor means we can more clearly see what is most relevant to a specific Target, which means that call-site annotations for Target are limited to only options that are relevant to a specific Target rather than to an IRModule. Working on that RFC's implementation (which we should still land as agreed) illustrates how we can better manage the representation of this configuration internally to TVM.

One reason not to motivate this as purely a tvmc concern is that tvmc is the CLI interface to TVM; if a user starts with tvmc and then moves to a Python script they should not have to re-learn the interface to TVM.

This sounds sensible; a feature of CompilationConfig is the ability to specify the complete picture of the system which TVM is being built for, including all Targets which can be used by all Passes. Specific annotations of storage and execution make sense to be defined at call-sites within the IR rather than at the top level of the IRModule - what CompilationConfig provides is a frame of reference to make those annotations and pick from the variety of Targets and Devices which the IRModule is constrained to. As we continue with Target registered compiler flow customisation, annotating a call-site with a Target will become standardised with the BYOC flow, whether partitioned or otherwise, to match the expectation you described with partitioned Targets.

This doesn’t rule out the possibility of using a composite Target as a Target in the targets list as we’re not redefining how that works here - rather defining a bounding box for system level configuration within TVM.

The end state for this configuration update would be to run a single pass over the CompilationConfig early on, using CheckAndUpdateHostConsistency to ensure the internal state is correct, which guarantees that subsequent Passes such as device or memory planning are safe in making assumptions about the state of the used Targets. Hopefully that clarifies that it’s less of a replacement and more of a consolidation of the logic early in the compilation flow, if these checks are still required :smile_cat: We’d still need to have Target annotations within the IR and those Targets will therefore have to be stable during compilation.

Where we’re at

Going over this thread a few times, the discussion revolves around:

M0. Split the CompilationConfig from Target

(CompilationConfig)
-> (Target), (Target), (Target)
-> (Executor)
-> (Runtime)

M1. Recursively allowing Target to represent any system

(Tagged Target)
-> (Target), (Target), (Target)
-> (Executor)
-> (Runtime)

It is my opinion, and the motivation behind this RFC, that better defining the CompilationConfig would relieve cognitive load on the user and provide definitions which can be bounded easily. By continuing with M1 the term Target becomes increasingly overloaded and difficult for both developers of TVM and, more importantly, users of TVM. This hierarchical terminology has prior art in large scale cloud frameworks such as Kubernetes, which uses different terminology for Cluster, Deployment, Service, Pod and Container, all different levels of granularity of computing resources; the decision there is both a UX decision and a practical separation of concerns for both users and developers of Kubernetes.

To elaborate on C2: while it is desirable and recommended to have a consolidated runtime and executor choice when possible, there are naturally cases that would require a bit of generalization. The multi-machine case is one example.

There are also other examples that can appear on a single SoC. Consider the following scenario, where there is an accelerator that comes with a CPU-like co-processor as controller.

- host: arm
- runtime: vm
- vdevice0: accelerator-with-coprocessor
    - host: risc-v
    - runtime: graph
    - device: my-accelerator

In this case, the host is an ARM chip that drives the overall computation (say through the VM). The co-processor, however, also comes with its own controller that is able to execute a sub-graph of computation, which in turn dispatches to my-accelerator. As a result, we will need to compile a tvm runtime (that may be different from the host one), and use that to drive the graph computation on the co-processor.

To expand on the BYOC case, note that for BYOC that involves a sub-graph, the specification for the BYOC “target” is in nature a “CompilationConfig”-level structure, because we would need to specify the leaf level target (cuda) as well as the graph runtime (TensorRT or cuda-graph). This brings another need: to be able to embed a “CompilationConfig”-level structure in a “CompilationConfig”-level target.

Back to the compilation path: I agree that it is important to build a standard pipeline. I would also like to note that we need to design to be compatible with emerging needs. Allowing target specifications to be recursive, while validating them, would help the ecosystem develop these capabilities. Additionally, some of the needs can appear now; for example, we could see a need to have a more flexible VM runtime that drives GPU computation while offloading a subgraph to cuda-graph (more efficient and less flexible). While it may not be possible to consolidate every compilation path in the beginning, depending on the use case we talk about (just as initially we did not have unified single device and multi-device exec), having a common config API (target) would be a solid step toward unification as the community works on these cases. It also provides a standard way for the community to do extensions in a composable way, without inventing other things that are not compatible with each other.

In reality, different target kinds may have (slightly) different compilation paths, although they can share a lot in common. In the case of a compositional target like multi-device execution, the compilation pipeline of the multi-device exec needs to divide and then offload to the compilation pipelines of the specific target kinds, then link them together (in our case PackedFunc is our ABI).

Finally, to build on @Mousius's point: allowing target to be recursive does not preclude structure or naming. Targets have kinds and schemas attached to each kind. Further validation can also be done throughout the process. So instead of

(CompilationConfig)
-> (Target-CUDA), (Target-X86)
-> (Executor)
-> (Runtime)

We would get

(Target-Kind=Hetro-Exec)
-> (Target-Kind=CUDA), (Target-Kind=X86)
-> (Executor)
-> (Runtime)

From a UX pov, we do not need to force users to pass in such compositional ones (that is complicated) if they only care about single device execution (and can canonicalize internally).

As a matter of fact, the majority of the use cases we face right now are still single device scenarios, and we want to make these cases simple for the user. CompilationConfig as it is right now is a union class of two kinds of targets:

  • Single device target where only a host and target is involved
  • Multi-device target where multiple devices are involved.

Being able to clearly differentiate the two and allow a simpler UX for the common single device scenario can be a plus for the users.

Regardless of the use case, you will be able to leverage the tagging features at different levels, so users can just pass in

build(mod, target="my-hetro-exec-platform0")

Hi @tqchen, I can understand that a recursive Target could be the solution to a multitude of problems, but it also introduces an over-arching ambiguity for both users and developers of TVM. It also creates the maintenance overhead of managing an increasingly diverse definition of Target rather than a set of simple component definitions for use in the TVM compiler.

Coming back to this, the LLVM Target provides a set of constructs specific to a single output, which constrains it and makes it easy to interpret. TVM as a heterogeneous compiler encapsulates many Targets, of which we can have a multitude. TVM Targets can be defined at the same conceptual level as other compilers. By taking similar concepts and mapping them appropriately we create not only a good user experience but also a good developer experience where terms are mapped to a single role in the compiler. In this case Configuration represents the entire TVM configuration, and Targets map to the same layer of the hierarchy as the actual backends themselves.

This is a great example of where the Target means something different as you recurse through different levels of Target. To motivate this further we can extend the example (using the Deployment conceptual example from Kubernetes):

M0

(CompilationConfig)
-> (Deployment)
    -> (Target LLVM), (Target OpenCL)
    -> (Executor VM)
    -> (Runtime CPP)
-> (Deployment)
    -> (Target LLVM)
    -> (Executor Graph)
    -> (Runtime RPC)

M1

(Target)
-> (Target)
    -> (Target LLVM), (Target OpenCL)
    -> (Executor VM)
    -> (Runtime CPP)
-> (Target)
    -> (Target LLVM)
    -> (Executor Graph)
    -> (Runtime CPP)

M1 introduces increasing ambiguity whereas M0 provides clear terminology and statically available information. We may choose to introduce the additional level of Deployment or similar in future, given the use-cases @tqchen describes around cloud platforms (or not, as necessary, as the compiler evolves). Notably, in M0 the concept of Target is still the same concept as in the other compilers we use as generators.

The Executor and Runtime represented in the CompilationConfig are the TVM Executor and Runtime; Target-specific implementations are kept within the Target itself. This maintains the connection between the Target and the backend in use, whereas the Configuration encapsulates TVM’s collective view of the world.

Thus for the above case it’d simply be:

(CompilationConfig)
-> (Target LLVM), (Target CUDA)
-> (Executor)
-> (Runtime)

Taking CUDA as a BYOC Target with a graph partitioner, this would be pre-marked as part of the overall IRModule for the given nodes. This is exactly how BYOC operates today and this RFC does not aim to change this behaviour.

Agree that consolidating all of the paths is going to take time and effort, and dealing with emerging requirements is a long standing need for any software project. As this RFC aims to supersede a previous RFC, future RFCs should aim to further iterate on this concept.

The distinction proposed in this RFC is that Target can continue to prevail for simple use cases where you target a single backend and be wrapped by TVM configuration (however that is defined) internally. The Configuration is the container for the actual complete internal representation for TVM. This can be achieved by checking the incoming type and creating the default wrappers where appropriate, but they’re at different conceptual levels from each other.

Being able to quickly and easily articulate the usage of both Configuration and Target creates a simpler and more approachable project for both developers and users. A further general motivation is the engineering practice to model and define core constructs within the architecture and provide separation of concerns, single responsibility and a clear hierarchy of components.

Thanks @Mousius. Some clarifications: in the case of BYOC, there needs to be a nested level (Target BYOC):

(Target CompilationConfig)
-> (Target LLVM), (Target CUDA)
-> (Target BYOC CompilationConfig)
    -> Runtime = cuda-graph
    -> Target = cuda
-> (Executor)
-> (Runtime)

To build on what you said: I think we all agree that structure is useful. In the case of target, the structure is represented as a specific kind of target on that layer, e.g. we can have a target kind that follows the same terminology you came up with. For example, we can have a target kind called Deployment and another target kind called CompilationConfig (or a better name), with the additional benefit of being able to use the tagging mechanism.

Hi @tqchen, could you explain why this is necessary? As we integrate Target registered compiler flow customisation, doesn’t this just become a Target("cuda-graph") which has the relevant BYOC infrastructure registered to it and Target attributes for configuration?

Given one of the motivations here is to simplify the compiler flow and user experience by creating pre-defined structures rather than introducing more dynamic behaviour, I’d suggest it’d be better to keep Executor and Runtime separated as agreed in Migrating Target Attributes to IRModule, which leaves Targets represented at the correct level of the hierarchy and does not create further confusion as to the definition of a Target. Though it’d be good to hear if others have strong opinions one way or the other.

Hi @tqchen, could you explain why this is necessary

In this case, cuda-graph corresponds to the implementation of the graph executor (a correction: cuda-graph in this case corresponds to the executor) in that BYOC module, and does not correspond to the leaf-level target (CUDA). The BYOC still needs information to specify how to generate the kernels that are fed into the cuda-graph based executor, which is cuda. Additionally, there can be other fields such as libraries (e.g. TensorRT or cuDNN).

In short, some of the BYOC happens at the graph level, which means it can benefit from CompilationConfig style compositional configurations.

It’d be better to keep Executor and Runtime separated

The particular RFC proposes to move executor and runtime away from the leaf-level target (LLVM, C) that generates operator kernels. I agree with that logic – a “c” target does not have to contain an “executor” and “runtime” field because those really specify the components of the “graph driver” side of the program.

Translated to the target structure, it would mean that the schema of target-kind=Deployment (or another name) would include “executor” and “runtime” fields, but validation would reject a “c” or “llvm” target that comes with such a field.

The remaining difference is whether a structured class is necessary. I wonder if it can be addressed by having some of the structured objects subclass Target and provide helper functions to access these fields, if that is really necessary.

The last thing that I would like to bring up again is the ability to tag, which can be quite useful to our users. Specifically, using a tag (e.g. “my-hetro-exec-platform0”) to refer to the entire compilation configuration or some of its sub-components, which includes runtime/executor and host/device target specifications. One of the motivations for making things one kind of target is to have that capability in some form.

Trying to summarize and dissect:

  • A0: We all agree that the “executor” and “runtime” fields should be moved to something that is not a leaf level target
  • A1: There is a discussion about whether or not to make the second level compositional config a (subclass of) Target
    • A1a: Create a separate struct, with the benefit of explicit fields
    • A1b: Create a separate struct that subclasses Target
    • A1c: Create a specific target kind whose schema corresponds to the second level class.
  • A2: There is a discussion about the need for recursive support in some of the second level config in the case of BYOC, cross device, and multi-node
  • A3: The capability of tagging a composed (target) config can be useful from a UX point of view.

Thanks for the breakdown @tqchen; there’s some confusion here as to what the current behaviour is, which I’ll try to clarify based on your points.

Migrating Target Attributes to IRModule agrees on removing the TVM Executor and Runtime from the Target altogether and attaching them to the IRModule as attributes, as they are more related to the compilation than to the Target. This separates the concept of a Target executor/runtime from a TVM executor/runtime.

BYOC modules currently pass configuration via PassContext::Global() and should be able to use the same Target attributes as Targets when Target registered compiler flow customisation has been fully implemented. Currently, BYOC is registered at the same level as Target:

(CompilationConfig)
-> (Target LLVM), (Target CUDA), (BYOC MyCUDA)
-> (Executor)
-> (Runtime)

In future, BYOC should be a Target proper. In both cases of BYOC or Target, there is no need to add hierarchy to Target here as the Graph will be partitioned for MyCUDA before the CUDA target.

In the Command Line Configuration Files RFC, Configurations themselves can be referenced by name with flexibility as to how they are defined in tvmc. Thus you can either give a Target configuration a tag for a single Target or name a complete Configuration, both with defined terminology and placement in the overall TVM hierarchy.

With the above series of interdependent works, I believe that A1a is the simplest and most natural for both users (who can reference a complete configuration or a Target tag if desired) and developers (who can easily ascertain the attributes of the TVM compilation from the structure). Both users and developers will benefit from the consistent and straightforward definitions of terms within the hierarchy, which we can include in documentation to explain how they are composed.