[pre-RFC] Compilation Configuration Representation

Hi @tqchen, could you explain why this is necessary

In this case, cuda-graph corresponds to the implementation of the graph executor in that BYOC module (a correction: cuda-graph here corresponds to the executor), and does not correspond to the leaf-level target (CUDA). The BYOC still needs information to specify how to generate the kernels that are fed into the cuda-graph based executor, which is cuda. Additionally, there can be other fields such as libraries (e.g. TensorRT or cuDNN).

In short, some of the BYOC work happens at the graph level, which means it can benefit from CompilationConfig-style compositional configurations.

It’d be better to keep Executor and Runtime separated

The particular RFC proposes to move executor and runtime away from the leaf-level targets (LLVM, C) that generate operator kernels. I agree with that logic – a “c” target does not have to contain an “executor” and “runtime” field, because those fields really specify the components of the “graph driver” side of the program.

Translated to the target structure, this would mean that the schema of target-kind=Deployment (or another name) includes “executor” and “runtime” fields, while validation would reject a “c” or “llvm” target that comes with such a field.
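
For concreteness, here is a rough sketch of what such a schema could look like (the kind name, fields and values are purely illustrative, not an agreed design):

# Hypothetical sketch only: a second-level "deployment"-style kind that owns the
# graph-driver choices, while leaf targets ("c", "llvm", ...) stay kernel-only.
deployment = {
    "kind": "deployment",                           # illustrative kind name
    "executor": "aot",                              # graph-driver side
    "runtime": "crt",                               # graph-driver side
    "target": {"kind": "c", "mcpu": "cortex-m55"},  # leaf target: no executor/runtime
}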

The remaining difference is whether a structured class is necessary. I wonder if it can be addressed by having some of the structured objects subclass Target and provide helper functions to access these fields, if that is really necessary.

The last thing that I would like to bring up again is the ability of tagging, which can be quite useful to our users. Specifically, using a tag (e.g. “my-hetro-exec-platform0”) to refer to the entire compilation configuration or some of its sub-components, including the runtime/executor and host/device target specifications. One of the motivations for making things one kind of target is to have that capability in some form.

Trying to summarize and dissect:

  • A0: We all agree that the “executor” and “runtime” fields should be moved to something that is not a leaf-level target
  • A1: There is a discussion about whether or not to make the second-level compositional config a (subclass of) Target
    • A1a: Create a separate struct, with the benefit of explicit fields
    • A1b: Create a separate struct that subclasses Target
    • A1c: Create a specific target kind whose schema corresponds to the second-level class.
  • A2: There is a discussion about the need for recursive support in some of the second-level config in the case of BYOC, cross-device, and multi-node
  • A3: The capability of tagging a composed (target) config can be useful from a UX point of view.

Thanks for the breakdown @tqchen, there’s some confusion here as to what the current behaviour is, which I’ll try to clarify based on your points.

Migrating Target Attributes to IRModule agrees with removing the TVM Executor and Runtime from the Target altogether and attaching them to the IRModule as attributes, as they are more related to the compilation than to the Target. This separates the concept of a Target executor/runtime from a TVM executor/runtime.

BYOC modules currently pass configuration via PassContext::Global() and should be able to use the same Target attributes as Targets once Target-registered compiler flow customisation has been fully implemented. Currently, BYOC is registered at the same level as Target:

(CompilationConfig)
-> (Target LLVM), (Target CUDA), (BYOC MyCUDA)
-> (Executor)
-> (Runtime)

In future, BYOC should be a Target proper. In both the BYOC and Target cases, there is no need to add hierarchy to Target here, as the Graph will be partitioned for MyCUDA before the CUDA target.

In the Command Line Configuration Files RFC, configurations themselves can be referenced by name, with flexibility as to how they are defined in tvmc. Thus you can either give a Target configuration a tag for a single Target or name a complete Configuration, both with defined terminology and placement in the overall TVM hierarchy.

With the above series of interdependent works, I believe that A1a is the simplest and most natural for both users (who can reference a complete configuration or a Target tag if desired) and developers (who can easily ascertain the attributes of the TVM compilation from the structure). Both users and developers will benefit from the consistent and straight-forward definitions of terms within the hierarchy which we can include in documentation to explain how they are composed.

To further clarify the BYOC part: a BYOC module may need to contain both a graph-level property (of the executor that drives the MyCUDA graph) and a kernel-level property (of the code generator that outputs the kernels), so it is indeed hierarchical and, from a functionality point of view, close to the CompilationConfig.
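
As a purely illustrative sketch (none of these field names are an existing BYOC schema), such a hierarchical BYOC configuration could look like:

# Hypothetical sketch: a BYOC module whose configuration carries both a
# graph-level part (the executor that drives the generated graph) and a
# kernel-level part (how the individual kernels are generated).
byoc_config = {
    "kind": "my-cuda-byoc",             # illustrative BYOC kind
    "executor": "cuda-graph",           # graph-level: drives the partitioned graph
    "kernel_target": {"kind": "cuda"},  # kernel-level: codegen for the kernels
    "libs": ["cudnn", "tensorrt"],      # optional library dispatch
}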

Another related thing is the issue of serializing the configurations into logs. This is not a requirement right now (most tuning serializes only the per-device part of the target), but imagine we start to do a global search over a graph; in that case we need a way to serialize the config itself into the log (in this case there is a JSON format).

Of course, both the serialization and tagging capabilities can be duplicated at each level of the structure, but they can also benefit from some form of uniformity.

@Mousius thank you for raising this RFC and thanks for great discussions everyone.

For the most part I support the originally-proposed RFC.

I fully support A1a here. While it is tempting to try to define Target as a structure which models an arbitrary runtime environment, in practice, the range of runtime environments supported by TVM will change as TVM’s tuning capabilities grow. Additionally, Target currently plays a foundational part in the present AutoTVM design: it describes all of the compiler configuration which could affect a given autotuning measurement, and is therefore used as a key to describe the workload in autotuning logs.

Further, at present, there are things inside Target which do not impact autotuning:

  • --link-params
  • --executor
  • --runtime

Because of this, right now users can get into the undesirable experience of tuning a schedule without one of these parameters, then compiling for deployment with the parameters included, and seeing untuned implementations. Now, I bear some of the blame for this because I started this pattern in Target. However, it’s something we need to get rid of now that we have more tunable schedules landing in microTVM.

The fix for this is to remove these parameters from whatever we use to key the tuning logs. Currently, that’s Target.

So in my book, that’s also the definition of Target right now:

  • the set of options which could influence autotuning on one tvm::runtime::Device.
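
As a minimal sketch of what that fix could look like (the helper and key names are assumptions, not existing TVM API), the log key would simply drop the fields that cannot change the generated kernels:

import json

# Hypothetical sketch: derive an autotuning log key from a target-like dict by
# stripping the fields that do not influence codegen.
NON_TUNING_KEYS = {"executor", "runtime", "link-params"}

def tuning_log_key(target_config: dict) -> str:
    pruned = {k: v for k, v in target_config.items() if k not in NON_TUNING_KEYS}
    return json.dumps(pruned, sort_keys=True)

# tuning_log_key({"kind": "c", "mcpu": "cortex-m4", "executor": "aot"})
# == tuning_log_key({"kind": "c", "mcpu": "cortex-m4"})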

While I do support the effort to gradually improve TVM’s ability to model an arbitrary heterogeneous system (e.g. even those with multiple executors spread across a set of independent machines), modeling this inside Target means that we need to simultaneously confront two questions whenever we want to broaden Target with additional configuration:

  1. does this configuration affect autotuning?
  2. who is consuming this configuration?

Adopting A1a allows us to just answer the second question up front by grouping compiler configuration into data structures according to the compiler component which consumes them. Broadly, we have these areas which may need to consume compiler config:

  • Op-level code-generators (currently, this is the lowest common denominator describing what the Target options cover)
  • Graph-level code-generators (e.g. AOT, Graph, VM)
  • AutoTVM (e.g. parameters which may control scheduling)
  • AutoScheduler (e.g. parameters which may affect TensorIR lowering)
  • flow-level parameters (e.g. parameters which may be in PassConfig but which should potentially be captured into tuning logs such as tir.disable_vectorize)

Organizationally, my position is that it’s better to keep parameters grouped alongside others which are consumed by the same logical component of the compiler. This recognizes that the questions of scoping autotuning and modeling an execution environment are larger than any one RFC and are questions which TVM as a community will continue to refine as new improvements such as AutoScheduler, AutoTIR, etc are introduced. Adopting a composite structure provides a framework to keep things organized as we incrementally improve the compiler rather than defining a single open-ended struct.

This approach then argues for the following:

  • We adopt A1a, a composite top-level configuration structure which consists of pieces mapped to each compiler component
  • We tighten the definition of Target to mean “configuration parameters for a single codegen which affect autotuning.”
  • To accommodate the previous bullet, target_host is hoisted out of Target and becomes its own Target. See commentary in [RFC] Unified device/target/memory scope planning with regards to plans to add human-readable labels to Targets (e.g. dsp-cpu, low-power-cpu).
  • Autotuning keys continue for the moment to be confined to the contents of the Targets.
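
To make the shape of such a composite structure concrete, here is a rough sketch (the class name and fields are illustrative, not the proposed API), grouping options by the component that consumes them:

from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical sketch of an A1a-style composite configuration.
@dataclass
class CompilationConfigSketch:
    targets: List[str]                    # per-codegen options (the autotuning keys)
    target_host: Optional[str] = None     # host Target, hoisted out of each Target
    executor: str = "graph"               # graph-level code generator
    runtime: str = "cpp"                  # runtime library
    pass_config: Dict[str, object] = field(default_factory=dict)  # flow-level params, e.g. tir.disable_vectorize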

My position on this discussion is that we should still keep the configuration pieces organized according to the consuming compiler sub-component and express any relations in a sibling top-level structure. Here is an example of that in a futuristic world where we support splitting a model across multiple top-level executors:

{
    "targets": {
        "dsp-cpu": {
            "kind": "llvm",
            "mcpu": "cortex-a72"
        },
        "gpu": {
            "kind": "mali"
        },
        "low-power-cpu": {
            "kind": "llvm",
            "mcpu": "cortex-m0"
        }
    },
    "executors": {
        "dsp": {
            "targets": ["dsp-cpu", "gpu"],
            "target_host": ["dsp-cpu"],
            "executor": "vm",
            "runtime": "c++"
        },
        "low-power": {
            "targets": ["low-power-cpu"],
            "target_host": ["low-power-cpu"],
            "executor": "aot",
            "runtime": "c",
            "flow-config": {
                 "link-params": true,
                 "enable-byoc": ["cmsis-nn"]
            }
        }
    }
}

This is quite a forward-looking example. In practice, the effects of adopting A1a look to me at present like:

  1. Defining target_host as merely one of the sub-Targets included in CompilationConfig
  2. Splitting out the executor, runtime, and link-params keys from Target
  3. Avoiding introducing any recursion, which means I think that we should not adopt that aspect of the Composite Target RFC.

Great discussions so far. I think we have a good picture of what the choices are in terms of the data structures(the As), and we have different preferences in terms of choices.

Before we jump into the particular preferences, it is helpful to look at the different scenarios in which we use the data structure and objectively analyze them from the following angles:

  • The UX interface
  • The feasibility of each kind of solution under those needs
  • Possible pros and cons

Notably, the final preferences usually do not stem from disagreements on the objective analysis. For example, I think we all agree that a recursive structure is more expressive, and that an explicitly typed config is slightly more convenient than a specific target kind with the same schema for the particular use-cases that involve a two-level structure.

Usually our preference is a result of how we weigh the different needs and pros and cons. Additionally, we may have a specific need (use case) in mind. To make a good choice, we need to look at a broad class of needs. The bottom line is that hopefully we can agree on the objective needs and analysis, then use them as the basis to talk about the choice (which involves preference).

It is also very helpful for us to review the previous RFCs that led to the current suggested design of Target and Composite Target.

N0: Common use case, single device with host

A lot of the motivation for the config comes from heterogeneous devices, which is important; however, the most common use case we have right now is still the single-device scenario. Of course, as with CUDA, a single device usually still needs a host driver. So one key need is to make this type of usage as streamlined as possible.

From the user’s point of view, the program itself is as plain as “CUDA”. However, there are two different states of functions during the phases of transformation:

  • E0: A mixed host-device program
fn () {
   // cuda part
   b = alloc("global", size)
   launch cuda kernel 1 {
   }
   launch  cuda kernel 2 { 
   }
}
  • E1: A device program
   launch cuda kernel 1 {
   }

Both E0 and E1 can appear in different phases of the transformations. From the users’ point of view, it is extremely helpful to be able to have attributes that specify the constraints on both kinds.

In the current convention, E0 is achieved via the host field in a Target, while E1 is simply a device program. Under the two-level config view, the host of E0 would instead be obtained from the surrounding Config (via the target_host field).

  • From the UX pov, directly passing in a Target with an optional host field presents a simple API for this particular use case.
  • Having host under Target makes the constraint more explicit at the function level and differentiates E0 from E1.
  • For the more complicated heterogeneous cases, having host under Target would cause duplication, in which case a consistency checker and updater is needed.
  • Having an explicit host in the Target can help the case where there are multiple host environments, although this is a rare case.
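
A small sketch contrasting the two conventions for this common case (the two-level dict shape is illustrative; Target("cuda", host="llvm") is the existing API used later in this thread):

from tvm.target import Target

# E0 convention today: the host travels with the Target itself.
mixed = Target("cuda", host="llvm")

# Two-level config view (illustrative shape only): the host comes from the
# surrounding config, and the device Target stays host-free (the E1 case).
config = {"targets": [{"kind": "cuda"}], "target_host": {"kind": "llvm"}}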

I will skip the personal preference comments for now.

N1: Embed into other systems

In a lot of cases we think about generating a program where TVM takes full control of the allocator, device management and so on, so there can be a temptation to enforce precise heterogeneous device info everywhere. On the other hand, at the PrimFunc level, we also need to be able to embed into other systems and take decisions from the calling env. For example, in most of the cuda op-level cases, we generate functions that work on any GPU and switch the context based on the device_id and type from the arguments.

For this particular need, we have to keep the target specification simple at the boundary level, involving only host and device information, while leaving some of the device-planning information to the driving part.

N2: Tagging and quick reference

The ability to tag and reference a configuration as a whole is one of the key designs of the Target system. From the user’s point of view, they do not necessarily care about codegen-level concepts. Instead, it is important to present the target environment as a whole. See the following example tags:

  • aws/c5: cloud instance name
  • arm/rasp4b: soc board name
  • nvidia/jetson-nano:cuda: soc board name

From the users’ pov, what they ultimately care about is what they want to deploy to. Being able to refer to the setting (or part of the setting) through tagging is important for that experience.
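
For example (the tag names here are the ones used in this thread, and not all of them may be registered today), a tag expands into the full target specification:

from tvm.target import Target

# A registered tag expands into the full target specification for that platform,
# so the user never has to spell out the individual fields.
tgt = Target("nvidia/jetson-nano")   # tag name from the examples above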

N3: Represent a complicated heterogeneous environment

One of the main motivations of the second-level Config is to represent a more complicated heterogeneous environment, different from N0. Under such cases, there is a desire to propagate some of the (virtual) device and memory scope information across functions.

For this particular use case, an explicit config offers a clear structure. A specific target kind with a schema that follows the config can also implement the same feature.

One possible choice is to model everything in this way, as complicated cases can cover simpler setups through another layer of wrapping. However, fitting simpler common scenarios into a two-level setting may bring additional complications in UX, especially if explicit construction is required.

N4: Ability to decompose

Throughout compilation and the transformations, in a lot of cases we decompose problems into smaller problems, and functions in an IRModule represent the pieces. For example, we decompose:

  • A multi-machine program into single-machine ones
  • A multi-device program into host driving functions that call into single-device functions, still invoked through PackedFunc (which contains a host part)
  • A single-device, host-driving program into device and host functions.

In the BYOC flow

  • A mixed-BYOC-strategy program into multiple functions, each with its own BYOC target
  • There can be a need for a downstream BYOC to further decompose that into a graph-level executor config and single-kernel code-gen settings.

Throughout the transformations we decompose, and likely also tag the functions with the constraints that each particular function must satisfy. Having a common base for the constraints of functions at different granularities is helpful, given that the nature of the framework is to support, and be future-compatible with, these decompositions.

N5: Automation needs

This ties back to N4. We need a common base config to indicate the constraints that the auto-tuning environment presents. Our most common case right now is the single-device-with-host setting; in such cases, the target itself is only needed as part of the log.

If we see the automation need as the need to search over transformations of a program, subject to certain “target constraints”, then naturally we will extend the scope to handle functions at different levels (related to N4). Graph-level tuning would be one such example.

Considering the need to unify the automation infrastructure, it is certainly very helpful to have a common data structure to represent “target constraints” at different levels (which can include executor configurations), so that there is one serialization format and a relatively streamlined mechanism to handle all transformation cases (the single-device program case and the executor/device mixing case).

Hi @tqchen, I like your point that we need to be able to a) handle a lot of different setups and b) be adroit at changing focus as we transition from the overall systems view (eg during device planning), to target/host view, to specific device view, and so on. (Oh and I’ve probably broken things in the CompilationConfig stopgap I implemented since it assumes every Target needed for lowering must have a host, which breaks the E1 case.) So I see why folks are keen on the general recursive representation. And I could see that we’d want to replace the ‘config’ accessible from the IRModule as we change focus, especially as we transition into per-Target compilation.

One counterpoint to that approach is the resulting fragility of the passes that depend on it. E.g. I could imagine we end up with a lot of ICHECKS and accessors scattered inside pass impls which may not be apparent from the outside. (It reminds me a bit of the Windows Registry – a wonderfully universal and centralized data structure with opaque dependencies – but that’s unfair!).

Perhaps we could take an intermediate step: Explicitly enumerate the family of ‘compilation configs’ we already have as distinct classes. I think so far that’s

  • just-a-Target for eg lowering without worrying about the host shim
  • HostAndTarget for your E0 case
  • MultiTarget, which is what I got myself tangled up with in device planning and needed the CompilationConfig to help centralize some logic. There’s going to be a runtime & executor in each of those. We’ll also see some semi-generic way to go from cmd-line settings and configs into those classes. But perhaps we just don’t worry about that duplication just yet in return for clarifying what we support today (and save me from breaking anything else).
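
A rough sketch of that enumerated family (class names as above; the fields are illustrative assumptions, not a worked-out design):

from dataclasses import dataclass
from typing import List
from tvm.target import Target

# Hypothetical sketch of the explicitly-enumerated "compilation config" family;
# the just-a-Target case is simply Target itself.
@dataclass
class HostAndTarget:          # the E0 case: one device target plus its host
    target: Target
    host: Target

@dataclass
class MultiTarget:            # the device-planning case
    targets: List[Target]
    host: Target
    executor: str
    runtime: str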

Then we could revisit with a more universal & recursive representation, particularly if we want to tackle the x-runtime/x-executor cases.

@mbs-octoml actually I am not that keen on arbitrary recursion (yet), since after all the specific target kind and its schema are going to restrict the possible levels of recursion. I actually want us to be able to explicitly enumerate, like you said, perhaps as part of a validator on the possible kinds of “config” or target, say a centralized ValidateTarget function.
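
A hypothetical sketch of such a centralized validator (the rules and names here are assumptions for illustration only):

# Hypothetical sketch: reject leaf targets that carry graph-driver fields and
# restrict how deep the config/target nesting may go.
LEAF_KINDS = {"llvm", "c", "cuda"}

def validate_target(config: dict, depth: int = 0) -> None:
    kind = config.get("kind", "")
    if kind in LEAF_KINDS and ("executor" in config or "runtime" in config):
        raise ValueError(f"leaf target '{kind}' must not carry executor/runtime")
    if depth > 1:
        raise ValueError("only a two-level target structure is allowed")
    for sub in ("target", "host"):
        if isinstance(config.get(sub), dict):
            validate_target(config[sub], depth + 1)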

From N3’s pov, having explicit classes and embedding them on the IRModule is perhaps fine. The main difference is likely going to be a compile-time-checked accessor vs a runtime-checked schema accessor (we are kind of using the latter already in some way via GetAttr).

Another middle ground could be to introduce the auxiliary typed data structures when we are building passes that need them, and reconstruct them from a runtime Target spec. This is mainly in consideration of the other needs listed.

In the meantime, there are other needs on the table that we need to think about besides N3, namely the ability to log those configs in a common way (the automation need N5) and to tag them for quick reference (N2) for better UX. So it is helpful to also consider these needs and how the decisions affect the code logic on those fronts.

Hi @tqchen,

Reading through the various needs there’s nothing which hasn’t already been covered by this RFC in combination with already accepted RFCs. Could you articulate the next steps?

All of the alternatives (A1a, A1b, A1c) should be able to cover the need that we initially brought up – around N3. Additionally, the Target system as it is now is already powerful enough to resolve the N3-related needs that were brought up, as in the alternatives @junrushao listed along the A1c direction.

In all cases, it is certainly possible to resolve the problems with extra layers of abstraction and indirection. As a matter of fact, they are all very similar, except for how the data structure itself is built up.

So the main thing that would be helpful here is to understand the tradeoffs under different contexts. Given our previous discussion was focused around N3, it is also helpful to look at things from the other needs.

To give some examples:

From N0’s pov, the ability to directly pass in a Target with a host field is a good default solution for this most common combo, so in API/UX design we might want to encourage this kind of usage without worrying about additional fields for heterogeneous setups in a config.

build(mod, Target("cuda", host="llvm"))

Additionally, the transition from E0 to E1 encourages a transition from a Target with a host field (indicating a mixed host program) to a device-only Target (without host).

From N2’s perspective, aws/c5 favors treating the deployment target as a holistic thing (i.e. at the config level).

build(mod, "aws/c5")

In the context of both config and target, we would need to be able to say that a tag can refer to either a config or a Target, which effectively complicates the tagging system and its explanation. Additionally, there would need to be a common mechanism to register tags for both target and config. Making them more uniform would make this perspective more streamlined.

From N4’s pov, we need to be able to represent the objects during decomposition, which means there is a need for smooth transitions of the related information at the function level. For example, consider a function with mixed host/device targets transitioning to a device-only one. If that entails a difference in the “target constraints” (e.g. functions with multiple targets start with a “config” attr, while functions with a single device carry a “target” attr), such a transition is not as uniform.

In the context of N5, there will be a need to log both a single-device target and a multi-target config as part of the autotuning logs in the same way. From the automation’s pov they are all “target constraints” of a function, or of a collection of functions. As in N4, this favors a single entity that captures the “target constraint” in a uniform way, or at least a unified serialization mechanism and perhaps repr printing that covers the targets involved.

Finally, we need to consider the overall UX perspective of how to articulate this to the user. On one hand we can certainly introduce a lot of concepts to the users in their most complete form, but the best APIs (e.g. keras is a great example) always aim to present to the users the simplest form for the most important use cases.

Then we would get to a point where a user would ask “what is the difference between a config of a function that can run on multiple devices and a target of a function that only runs on one device?” While we can certainly come up with an answer, from a UX point of view the tags aws/c4 (which can indicate a config that involves a runtime env) and nvidia/cuda12 (which indicates a single target) are so similar that a user might feel an artificial boundary here.

Importantly, the majority of users do not have to deal with a MultiTarget setting. It is also unlikely that they need to deal with explicitly setting the executor or runtime if we have a proper tag or good defaults. So our most common use case is the setting that contains a TargetWithHost. We want to be able to maximize the ease of use in this setting. Only asking the user to learn about a target that comes with a host field, plus the ability to tag, is the simplest way to tell the story, without introducing the extra concept of Config.

So the UX story is like a journey:

  • step0, useful for the most common use cases: “you can use a target to specify the deployment environment constraints that you have on a single device, and you have the ability to tag the specification”.
  • step1, generalizing the same story for heterogeneous use cases: “you can specify a MultiTarget, which is also a target with a specific schema to specify the heterogeneous execution case and fine-tune the runtime and executor settings; BTW you get the same ability to tag and log them in the same way as step0”.

And if a user does not want to bother hearing about the steps, there is a simpler story: "just pick a tag that closely matches the platform of your interest, for example aws/g4:gpu".

This was covered in the original post:

Ah, I understand, if we don’t pass a Target and instead just pass a tag then you have to figure out which one to go for. The approach taken in Command Line Configuration Files is to wrap the Target in the JSON configuration. Which leads me to believe we should default to a Config level tag which is the highest level available? If we add in-between layers then this still holds? For the same reason I wouldn’t want to use Target to define everything, I can see the error in trying to explain a Target tag and a Config tag and leading to confusion as to what a tag is - which I think we’re agreeing on?

This RFC doesn’t aim to address how you use the configuration so much as define the fact the configuration will be there for you to use and rely on. Unified device/target/memory scope planning stands out to me as an RFC which discusses how to correctly annotate a function for a specific use-case and other than providing a consistent view of the world the CompilationConfig does not impact this.

Are you suggesting something as simple as configuration.to_json() or configuration.serialize_targets() which would return the array of JSON represented Target ? Re-using the already defined schema for Target and providing some way to extract it seems to function here?
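
Something along these lines, as a rough sketch (the method names come from the question above; the class shape is an assumption, not an existing API):

import json

# Hypothetical sketch of a Configuration that re-uses the Target JSON schema
# for the pieces the automation logs care about.
class Configuration:
    def __init__(self, targets, executor, runtime):
        self.targets = targets      # list of Target-like dicts
        self.executor = executor
        self.runtime = runtime

    def serialize_targets(self):
        # Only the Targets go into the tuning logs; executor/runtime stay out.
        return [json.dumps(t, sort_keys=True) for t in self.targets]

    def to_json(self):
        # Full configuration, e.g. for reproducing a build.
        return json.dumps(
            {"targets": self.targets, "executor": self.executor, "runtime": self.runtime},
            sort_keys=True,
        )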

Thanks @Mousius, I am not suggesting a decision on solutions, but just want to broadly discuss the implications of the engineering solutions. For example, to build on what you said:

Which leads me to believe we should default to a Config level tag which is the highest level available? If we add in-between layers then this still holds? For the same reason I wouldn’t want to use Target to define everything, I can see the error in trying to explain a Target tag and a Config tag and leading to confusion as to what a tag is - which I think we’re agreeing on

This would result in two UX concepts, a target tag and a config tag, and in the case of system implementations, possibly two similar impls.

This RFC doesn’t aim to address how you use the configuration so much as define the fact the configuration will be there for you to use and rely on

I understand the original intention was to scope it as a “top-level config”. However, because the config itself is a data structure (just like target) that involves “constraint settings throughout compilation”, we would naturally ask the following questions:

  • Does the top-level config have to remain in its most general form, or can it be a Union[Target, MultiTarget], given the most common case remains TargetWithHost?
  • We might need to propagate some of the multi-target constraint info to the function level in the future; at that point, which data structure do we use (if there are both config and target)?
  • The consistency of a single-device function’s target attr and a multi-device function’s config attr.

Are you suggesting something as simple as configuration.to_json() or configuration.serialize_targets() which would return the array of JSON represented Target ? Re-using the already defined schema for Target and providing some way to extract it seems to function here?

Right, this would result in two concepts, target and config, both of which are really similar to each other and both of which can appear in the same automation logs. We might need to build two sets of mechanisms for them if they end up as drastically different data structures without a common base.

Independent of the engineering discussion, it would be useful to come back to the terminology and think about the UX consequence of presenting it to the user. Obviously this is subjective, but it is worth thinking about what can serve as a good story. I tried to search the terms “Target” and “compiler target” on the internet and here are some common interpretations:

  • Target platform that the output runs on.
  • Target program that we output.

“Target platform” roughly aligns with the first impression that comes up in my mind. A “platform” includes runtime libraries, hardware, or any env that might affect the transformation behavior. Looking from that angle, “aws/c5” is certainly also a “target platform”. A SoC with multiple chipsets, NPU and CPU can also certainly be viewed as a “target platform”. A set of distributed machines can also be viewed as a “target platform”.

So then it goes back to what stories we tell our users (this affects developers as well), and whether or not that story aligns with the most common-sense impressions.

First kind of story:

  • S0a: A target specifies the “target platform”, i.e. any deployment environment constraints that you are interested in.
  • S0b: If you are interested in multiple-device settings, you can use a MultiTarget kind that composes targets together and specifies the deployment env (which involves multiple devices).

Second kind of story:

  • S1: A target specifies the single-device deployment constraints; you will need to compose targets up to form a config, which also specifies the runtime and executor of your model.

S0 ties back to the common-sense stories with a progression (first start with only a target on a single device, simple and easily receptive, then generalize by reusing the same concept). S1 would require more understanding in differentiating the concepts, and resolving the confusion of why a SoC with multiple chipsets is not a “target platform”.

Which leads me to believe we should default to a Config level tag which is the highest level available?

It would remain in the Config form on the IRModule, which means you could have either easily?

Whichever is appropriate for the use-case, having standardised access to that information means you could access whichever is most useful to you. If you want to query the configuration for an appropriate Target and tag a function with it, that’s an implementation detail of another part of the compiler.

Serialising objects which don’t share a common base is pretty common in many projects, and it’s clear that Configuration encapsulates Target, so it can call the serialisation internally? There’s no need to complicate this by making everything a sub-class of Target. And I believe what @areusch was saying is that we didn’t want anything but Target in the logs as the rest has no effect? Therefore encapsulating that with some function for creating logs from many pieces of the configuration may be useful?

@areusch and I had a long discussion offline yesterday, and he helped me understand the concern from the UX perspective: if we fold executor into target, then it’s more difficult to separate the config coming from two parties, where one party implements the codegen and the other implements the executor.

On the other hand, my concern is the fragmentation of APIs. It has been a huge problem in the recent 1-2 years, and we do have alternatives that avoid it.

Here is my proposal:

  • Part 1. Add Executor/Runtime fields to TargetNode:
class TargetNode {
  ...
  Executor executor;
  Runtime runtime;
};

class Executor {
  FromJSON();
  AsJSON();
};

class Runtime {
  FromJSON();
  AsJSON();
};
  • Part 2. Add a helper API to merge Target, Executor and Runtime
Target MergeTarget(Target target_without_executor_runtime, Executor executor, Runtime runtime);
  • Part 3. Allow separate specification of target, target_host, executor, runtime in TVMC, and internally use the proposed API in Part 2 to merge, validate and normalize them into a single Target object
tvmc --target "llvm" --executor "..." --runtime "..."
  • Part 4. For heterogeneous case, annotate the target onto each PrimFunc/RelayFunc to specify the target/runtime/executor
@tvm.script.ir_module
class Module:

   @T.func
   def tir_func():
     T.func_attrs({"target": JSON-Repr-of-Target-Obj}) # with runtime&executor included
     ...

   @R.func
   def relay_func():
     T.func_attrs({"target": JSON-Repr-of-Target-Obj}) # with runtime&executor included
     ...

Could you elaborate on this? I believe this isn’t solely a UX issue but also a hygiene factor within the compiler and how we represent the data structures internally, so I would rather not overload Target with Executor and Runtime. This RFC is proposing a suitable home for information that’s relevant across the compilation, given we now have at least Executor and Runtime to include, but a side effect is bringing the tvmc view back into alignment with the internals of the compiler.

It’s also worth noting that, with the current RFC for Migrating Target Attributes to IRModule, tvmc can glue this together with the relevant pieces, so from a user's point of view they wouldn’t know how disparate the internals are, but it would be a headache to maintain.


Wow lots more discussion here! Thanks @junrushao for writing up our discussions. So one thing I’d like to point out is that the recursive Target approach is not more expressive than the approach proposed by this original RFC. Expressing a “contains” relation can be done equivalently well by

  • defining a recursion relationship inside the Target data structure
  • defining another structure which describes the contains relationship (akin to a join table in database theory)

The main reason I am interested in the join-table approach here is that it vastly simplifies MergeTarget as described by Junru above. And I’d like to point out that it’s not sufficient here to merely define a function which hides the complexity under the covers. Users need to be able to understand what this function is doing because they are writing the inputs (though we are providing a tag, Command Line Configuration Files contemplates an expansion of the role of tagging to include tagging a partial configuration, as discussed earlier). I’m not sure it will be generally simple to explain how MergeTarget works as Target grows if we adopt the general approach of trying to attach every piece of compiler config to some Target which “owns” it.
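
As a rough illustration of the second option above (a flat, join-table-style structure; the names and field layout are assumptions for illustration), the “contains”/“hosted by” relation is stored next to flat Target entries rather than by nesting one Target inside another:

# Hypothetical sketch: flat Target entries plus a separate relation table.
targets = {
    "dev":  {"kind": "cuda"},
    "host": {"kind": "llvm", "mcpu": "cortex-a72"},
}
relations = [
    ("dev", "hosted_by", "host"),   # the relation is a row, not a nested field
]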

The drawback of the flat configuration structure is that it could be more difficult to consume inside the compiler. We should discuss whether this is truly an issue and how to mitigate it.

Finally, while I do think it’s important to arrive at an expressive, understandable Target data structure as the compiler grows more complex, I think there is a tension between a Target structure which is clear to the user and a Target structure which naturally reflects the organization of the compiler (and therefore has the nice properties of clearly delineating where config should live and being easy to route in the compiler top-level). Hopefully, the organization of the compiler is also such that it’s logical to a power user interested in creating a complex config. However, here I think that UX sugar can help to compose the common target patterns such as “cuda” (which really means 1 CUDA device with an implied “llvm” host). We already do this today anyway, so I suspect it will continue to play a role in the future.

@Mousius I totally agree with making things hygienic, and believe folding things into Target is the correct and consistent approach.

First of all, the automation system relies solely on the target object to understand the code dispatching, hardware specs and runtime information. Without having the information in the Target object, the automation system won’t be aware of the full picture. For example, if we switch the executor from VM to TensorRT, the performance can be much different, and so if the executor is not inside Target, then the automation system will be confused and learn a wrong objective.

Second, in the direction we are moving towards, the Target object guides our IRModule-to-IRModule transformations in lowering, and the IRModule-to-Module transformation in compilation. Wrapping it with an extra layer seems to architecturally change our compilation pipeline, while alternatives do exist and both seem to be equivalently expressive.

Third, the practice of folding all compilation-related information into Target has been adopted consistently in TVM. For example, we may specify the libraries to dispatch to via cuda --libs=cudnn. Similarly, in LLVM, the target triple is designed in a consistent way, where we can specify libc and other environments.

Historically, fragmentation has accumulated in TVM across layers. For example, we have different scheduling and auto-scheduling systems, slightly-different-but-not-identical and error-prone APIs for different executors, diverging compilation workflows between Relay, Relay BYOC and TVM, etc. Adding a new top-level user-facing data structure, when an alternative exists with the same expressiveness and UX, would probably lead to more user confusion.

On the other hand, I totally agree and am aware that a graph-level compile involves the interaction of multiple parts, including device, host, runtime and executor. The main concern from me here is that we already have Target as a formal, canonical spec, which is already able to express this structure without hurting UX.

What about we define a new target kind:

{
  "kind": "packaged", # probably need a better name, please propose new ones
  "runtime": "crt",   # the "runtime" in the proposal
  "executor": {       # the codegen target for relay function
                      # i.e. the "executor" in the proposal
    "kind": "vm/aot",
    ...
  },
  "target": {
    "kind": "cuda",   # the target that TIR generates to
    "host": {
      "kind": "llvm", # the codegen target for the host-side driver code
       ...
    }
  },
}

We can provide helpers to sugar the construction of this recursive target:

def tvm.target.packaged(
  target="cuda",
  executor="aot",
  runtime="crt",
): ...

In the common case, users only need to feed “cuda”, because we can provide a good default. For advanced use cases, users could use the packaged API to specify their own specification for the package.


@Mousius Hello, where is this work at now?