[pre-RFC] Compilation Configuration Representation

Great discussion so far. I think we have a good picture of what the choices are in terms of data structures (the As), and we have different preferences among those choices.

Before we jump into particular preferences, it is helpful to look at the different scenarios in which we use the data structure and analyze them objectively from the following angles:

  • The UX interface
  • The feasibility of each kind of solution under those needs
  • Possible pros and cons

Notably, the final preferences usually are not disagreements about the objective analysis. For example, I think we all agree that a recursive structure is more expressive, and that an explicitly typed config is slightly more convenient than a specific target kind with the same schema for the particular use cases that involve a two-level structure.

Usually our preference is a result of how we weigh the different needs, pros, and cons. Additionally, we may have a specific need (use case) in mind. To make a good choice, we need to look at a broad class of needs. The bottom line is that hopefully we can agree on the objective needs and analysis, then use them as a basis to talk about the choice (which involves preference).

It is also very helpful to review the previous RFCs that led to the current suggested design of Target and Composite.

N0: Common use case, single device with host

While a lot of the motivation for config comes from heterogeneous devices, which is important, the most common use case we have right now is still the single-device scenario. Of course, as with CUDA, a single device usually still implies the need for a host driver. So one of the key needs is to make this type of usage as streamlined as possible.

From the user’s point of view, the program itself is as plain as “CUDA”. However, there are two different states of functions during the transformation phases:

  • E0: A mixed host-device program
fn () {
   // cuda part
   b = alloc("global", size)
   launch cuda kernel 1 {
   }
   launch cuda kernel 2 {
   }
}
  • E1: A device program
   launch cuda kernel 1 {
   }

Both E0 and E1 can appear in different phases of transformation. From the users’ point of view, it is extremely helpful to be able to have attributes that specify the constraints on both kinds.

In the current convention, E0 is achieved via the host field in a Target, while E1 is simply a device program. Under the two-level config view, the host of E0 can be obtained from the contextual Config (via the target_host field).

  • From the UX’s point of view, directly passing in a Target with an optional host field presents a simple API for this particular use case (see the sketch after this list).
  • Having host under Target makes the constraint more explicit at the function level and differentiates E0 from E1.
  • For more complicated heterogeneous cases, having host under Target would cause duplication, in which case a consistency checker and updater is needed.
  • Having an explicit host in the target can help when there are multiple host environments, although that is a rare case.
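
To make this concrete, a minimal sketch of the E0/E1 distinction using the existing Target API (the host placement follows the current convention; nothing beyond that is assumed):

import tvm

# E0: a mixed host-device function, expressed as a Target that carries a host.
mixed_target = tvm.target.Target("cuda", host="llvm")
assert mixed_target.host is not None

# E1: a device-only function, expressed as a Target without a host.
device_target = tvm.target.Target("cuda")
assert device_target.host is None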

I will skip the personal preference comments for now.

N1: Embed into other systems

In a lot of cases we think about generating a program where TVM takes full control of the allocator, device management, and so on. So there can be a temptation to enforce precise heterogeneous device info everywhere. On the other hand, at the PrimFunc level we also need to be able to embed into other systems and take decisions from the calling environment. For example, in most of the CUDA op-level cases, we generate functions that work on any GPU and switch the context based on the device_id and device type from the arguments.

For this particular need, we have to keep the target specification simple at the boundary level, involving only host and device information, while leaving some of the device-planning information to the driving part.

N2: Tagging and quick reference

The ability to tag and reference a configuration as a whole is one of the key design points of the Target system. From the user’s point of view, they do not necessarily care about codegen-level concepts. Instead, it is important to present the target environment as a whole. See the following example tags:

  • aws/c5: cloud instance name
  • arm/rasp4b: SoC board name
  • nvidia/jetson-nano:cuda: SoC board name (CUDA device)

From the users’ point of view, what they ultimately care about is “what I want to deploy to”. Being able to refer to the setting (or part of the setting) through a tag is important for that experience.
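
For example, a registered tag expands into a full specification that the user never has to spell out (a sketch; it assumes the tag shown is present in the tag registry):

import tvm

# The tag resolves to a concrete Target with kind and attributes filled in.
target = tvm.target.Target("nvidia/jetson-nano")
print(target.kind, target.attrs)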

N3: Represent a complicated heterogeneous environment

One of the main motivations for the second-level Config is to represent a more complicated heterogeneous environment, different from N0. In such cases, there is a desire to propagate some of the (virtual) device and memory scope information across functions.

For this particular use case, an explicit config offers a clear structure. A specific target kind with a schema that follows the config can also implement the same feature.

One possible choice is to model everything in this way, since complicated cases cover simpler setups through another layer of wrapping. However, fitting simple common scenarios into a two-level setting may bring additional UX complications, especially if explicit construction is required.

N4: Ability to decompose

Throughout compilation and transformation, in many cases we decompose problems into smaller ones, and a function in the IRModule can represent a piece at any of these granularities. For example, we decompose:

  • A multi-machine program into single-machine ones
  • A multi-device program into single-device, host-driving functions that are still invoked through PackedFunc (which contains a host part)
  • A single-device, host-driving program into separate device and host functions.

In the BYOC flow

  • A mixed-BYOC-strategy program into multiple functions, each with its own BYOC target
  • There can be a need for a downstream BYOC to further decompose that into a graph-level executor config and single-kernel codegen settings.

Throughout the transformations we decompose, and likely also tag the functions with constraints (that the particular function must satisfy). Having a common base for the constraints of functions at different granularities is helpful, given that the nature of the framework is to support, and remain forward compatible with, these decompositions.
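
As an illustration, a hedged sketch of tagging functions with their “target constraint” as we decompose; the attribute name "target" and the placeholder functions are assumptions, not a committed design:

import tvm
from tvm import relay

# Placeholder functions standing in for pieces produced by earlier passes.
mixed_fn = relay.Function([], relay.const(0))
device_fn = relay.Function([], relay.const(0))

# A mixed host/device function (E0) carries a Target with a host field...
mixed_fn = mixed_fn.with_attr("target", tvm.target.Target("cuda", host="llvm"))

# ...while a device-only function (E1) produced by splitting drops the host part.
device_fn = device_fn.with_attr("target", tvm.target.Target("cuda"))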

N5: Automation needs

This ties back to N4. We need a common base config to indicate the constraints that the auto-tuning environment presents. Our most common case right now is the single-device-with-host setting. In such cases, the target itself is only needed as part of the log.

If we see the automation need as the need to search over transformations of a program, subject to certain “target constraints”, then naturally we will extend the scope to handle functions at different levels (related to N4). Graph-level tuning would be one such example.

Considering the need to unify the automation infrastructure, it is certainly very helpful to have a common data structure that represents “target constraints” at different levels (which can include executor configurations), so that there is one serialization format and a relatively streamlined mechanism to handle all transformation cases (a single-device program, an executor/device mixing case, and so on).
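
To make the logging point concrete, a hedged sketch of a unified log record keyed by a serialized “target constraint” (the record layout is illustrative, not an existing AutoTVM/MetaSchedule format):

import json
import tvm

constraint = tvm.target.Target("cuda", host="llvm")

log_record = {
    "workload": "fused_conv2d_relu",        # illustrative workload key
    "target_constraint": str(constraint),   # one serialization path for any level
    "latency_ms": 0.42,                     # illustrative measurement
}
print(json.dumps(log_record))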

Hi @tqchen, I like your point that we need to be able to a) handle a lot of different setups and b) be adroit at changing focus as we transition from the overall systems view (eg during device planning), to target/host view, to specific device view, and so on. (Oh and I’ve probably broken things in the CompilationConfig stopgap I implemented since it assumes every Target needed for lowering must have a host, which breaks the E1 case.) So I see why folks are keen on the general recursive representation. And I could see that we’d want to replace the ‘config’ accessible from the IRModule as we change focus, especially as we transition into per-Target compilation.

One counterpoint to that approach is the resulting fragility of the passes that depend on it. E.g. I could imagine we end up with a lot of ICHECKS and accessors scattered inside pass impls which may not be apparent from the outside. (It reminds me a bit of the Windows Registry – a wonderfully universal and centralized data structure with opaque dependencies – but that’s unfair!).

Perhaps we could take an intermediate step: Explicitly enumerate the family of ‘compilation configs’ we already have as distinct classes. I think so far that’s

  • just-a-Target, for e.g. lowering without worrying about the host shim
  • HostAndTarget, for your E0 case
  • MultiTarget, which is what I got myself tangled up with in device planning and needed the CompilationConfig to help centralize some logic. There’s going to be a runtime & executor in each of those. We’ll also see some semi-generic way to go from cmd-line settings and configs into those classes. But perhaps we just don’t worry about that duplication just yet in return for clarifying what we support today (and save me from breaking anything else).

Then we could revisit with a more universal & recursive representation, particularly if we want to tackle the x-runtime/x-executor cases.

@mbs-octoml actually I am not that keen on arbitrary recursion (yet), since after all the specific target kind and its schema restrict what levels of recursion are possible. I actually want us to be able to explicitly enumerate, like you said, perhaps as part of a validator on the possible kinds of “config” or target, say a centralized ValidateTarget function.

From N3’s point of view, having explicit classes and embedding them in the IRModule is perhaps fine. The main difference is likely going to be a compile-time-checked accessor vs. a runtime-checked schema accessor (which we are kind of using already in some ways via GetAttr).

Another middle ground could be to introduce the auxiliary typed data structures when we are building passes that need them, and reconstruct them from a runtime Target spec (see the sketch below). This is mainly in consideration of the other needs listed.
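
A minimal sketch of such an auxiliary typed view, reconstructed on demand from a Target (the class and field names are illustrative only):

import tvm

class HostDeviceView:
    """Typed accessor reconstructed from a runtime Target spec."""
    def __init__(self, target: tvm.target.Target):
        self.device = target
        self.host = target.host  # None for a device-only function (E1)

view = HostDeviceView(tvm.target.Target("cuda", host="llvm"))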

In the meantime, there are other needs on the table besides N3, namely the ability to log those configs in a common way (the automation need N5) and to tag them for quick reference (N2) for better UX. So it is helpful to also consider these needs and how the decisions affect the code logic on those fronts.

Hi @tqchen,

Reading through the various needs there’s nothing which hasn’t already been covered by this RFC in combination with already accepted RFCs. Could you articulate the next steps?

All the alternatives (A1a, A1b, A1c) should be able to cover the need that we initially brought up around N3. Additionally, the Target system as it is now is already powerful enough to resolve the N3-related needs that were brought up, per the alternatives @junrushao listed along the A1c direction.

In all cases, it is certainly possible to resolve the problems with extra layers of abstraction and indirection. As a matter of fact, they are all very similar, except for how the data structure itself is built up.

So the main thing that would be helpful here is to understand the tradeoffs under different contexts. Given that our previous discussion focused on N3, it is also helpful to look at things from the other needs.

To give some examples:

From N0’s point of view, the ability to directly pass in a Target with a host field is a good default solution for this most common combo, so in API/UX design we might want to encourage this kind of usage without worrying about additional fields for heterogeneous setups in a config.

build(mod, Target("cuda", host="llvm"))

Additionally, the transition from E0 to E1 corresponds to a transition from a Target with a host field (indicating a mixed host program) to a device-only Target (without host).

From N2’s perspective, aws/c5 favors treating the deployment target as a holistic thing (i.e. at the config level).

build(mod, "aws/c5")

In a world with both config and target, we would need to say that a tag can refer to either a Config or a Target, which effectively complicates the tagging system and its explanation. Additionally, there would be a need for a common mechanism to register tags for both target and config. Making them more uniform would make this perspective more streamlined.
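
To illustrate the concern, a hedged sketch of the two tag-resolution paths side by side (the Config-level API is hypothetical, and the Target path assumes the tag is registered):

import tvm

# Today: a tag resolves to a Target.
tgt = tvm.target.Target("nvidia/jetson-nano")

# With a separate Config concept, the same UX needs a second resolution path,
# e.g. a hypothetical Config-level tag API (not an existing TVM API):
# cfg = CompilationConfig.from_tag("aws/c5")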

From N4’s point of view, we need to be able to represent the objects during decomposition, which means there must be a smooth transition of the related information at the function level, for example when a function with mixed host/device targets transitions to a device-only one. If that entails a change in the kind of “target constraint”, e.g. a multi-target function starts with a “config” attr while a single-device function then carries a “target” attr, the transition is not as uniform.

In the context of N5, there is a need to log both a single-device target and a multi-target config as part of the autotuning logs in the same way. From the automation’s point of view, they are all “target constraints” of a function or a collection of functions. As in N4, this favors a single entity that captures the “target constraint” in a uniform way, or at least a unified serialization mechanism and perhaps repr printing that covers the targets involved.

Finally, we need to consider the overall UX perspective of how to articulate this to the user. On one hand, we can certainly introduce a lot of concepts to users in their most complete form. But the best APIs (Keras is a great example) always aim to present their simplest form for the most important use cases.

Then we would get to a point where a user asks “what is the difference between the config of a function that can run on multiple devices and the target of a function that only runs on one device?” While we can certainly come up with an answer, from a UX point of view the tags aws/c4 (which can indicate a config that involves the runtime environment) and nvidia/cuda12 (which indicates a single target) are so similar that a user might feel there is an artificial boundary here.

Importantly, the majority of users do not have to deal with a MultiTarget setting. It is also unlikely that they need to explicitly set the executor or runtime if we have a proper tag or good defaults. So our most common use case is the setting that contains a TargetWithHost, and we want to maximize ease of use in this setting. Asking the user to learn only about a target that comes with a host field, plus the ability to tag, is the simplest way to tell the story, without introducing the extra concept of Config.

So the UX story is like a journey:

  • step 0, useful for the most common use cases: “you can use a target to specify the deployment environment constraints that you have on a single device, and you have the ability to tag the specification.”
  • step 1: generalizing the same story for heterogeneous use cases: “you can specify a MultiTarget, which is also a target with a specific schema, to specify the heterogeneous execution case and fine-tune the runtime and executor settings; by the way, you get the same ability to tag and log them as in step 0.”

And if a user does not want to bother with the steps, there is an even simpler story: "just pick a tag that closely matches the platform of your interest, for example aws/g4:gpu".

This was covered in the original post:

Ah, I understand, if we don’t pass a Target and instead just pass a tag then you have to figure out which one to go for. The approach taken in Command Line Configuration Files is to wrap the Target in the JSON configuration. Which leads me to believe we should default to a Config level tag which is the highest level available? If we add in-between layers then this still holds? For the same reason I wouldn’t want to use Target to define everything, I can see the error in trying to explain a Target tag and a Config tag and leading to confusion as to what a tag is - which I think we’re agreeing on?

This RFC doesn’t aim to address how you use the configuration so much as define the fact that the configuration will be there for you to use and rely on. Unified device/target/memory scope planning stands out to me as an RFC which discusses how to correctly annotate a function for a specific use-case, and other than providing a consistent view of the world, the CompilationConfig does not impact this.

Are you suggesting something as simple as configuration.to_json() or configuration.serialize_targets() which would return the array of JSON represented Target ? Re-using the already defined schema for Target and providing some way to extract it seems to function here?

Thanks @Mousius. I am not suggesting a decision on solutions, but just want to broadly discuss the implications of the engineering solutions. For example, to build on what you said:

Which leads me to believe we should default to a Config level tag which is the highest level available? If we add in-between layers then this still holds? For the same reason I wouldn’t want to use Target to define everything, I can see the error in trying to explain a Target tag and a Config tag and leading to confusion as to what a tag is - which I think we’re agreeing on

This would result in two UX concepts, a target tag and a config tag, and on the implementation side possibly two similar mechanisms.

This RFC doesn’t aim to address how you use the configuration so much as define the fact the configuration will be there for you to use and rely on

I understand the original intention was to scope it as a “top-level config”. However, because config itself is a data structure (just like target) that involves “constraint settings throughout compilation”, we naturally want to ask the following questions:

  • Does the top-level config have to remain in its most general form, or can it e.g. be a Union[Target, MultiTarget], given that the most common case remains TargetWithHost?
  • We might need to propagate some of the multi-target constraint info to the function level in the future; at that point, which data structure do we use (if there are both config and target)?
  • The consistency between a single-device function’s target attr and a multi-device function’s config attr.

Are you suggesting something as simple as configuration.to_json() or configuration.serialize_targets() which would return the array of JSON represented Target ? Re-using the already defined schema for Target and providing some way to extract it seems to function here?

Right, this would result in two concepts, target and config, which are really similar to each other and can both appear in the same automation logs. We might need to build two sets of mechanisms if they end up as drastically different data structures without a common base.

Independent of the engineering discussion, it would be useful to come back to the terminology and think about the UX consequences of presenting it to the user. Obviously this is subjective, but it is worth thinking about what can serve as a good story. I searched for the terms “Target” and “compiler target” on the internet, and here are some common interpretations:

  • Target platform that the output runs on.
  • Target program that we output.

“Target platform” roughly aligns with the first impression that comes to my mind. A “platform” includes runtime libraries, hardware, or any environment that might affect the transformation behavior. Looking from that angle, “aws/c5” is certainly also a “target platform”. A SoC with multiple chipsets, NPUs, and CPUs can certainly be viewed as a “target platform”. A set of distributed machines can also be viewed as a “target platform”.

So it goes back to what stories we tell our users (which affects developers as well), and whether or not that story aligns with the most common-sense impressions.

First kind of story:

  • S0a: A target specifies the “target platform”, i.e. any deployment environment constraints that you are interested in.
  • S0b: If you are interested in multi-device settings, you can use a MultiTarget kind that composes targets together and specifies the deployment environment (involving multiple devices).

Second kind of story:

  • S1: A target specifies the single-device deployment constraints; you need to compose targets into a config, which also specifies the runtime and executor of your model.

S0 ties back to the common-sense story with a progression (first start with only a target on a single device, simple and easily received, then generalize by reusing the same concept). S1 would require more effort in differentiating the concepts, and in resolving the confusion of why a SoC with multiple chipsets is not a “target platform”.

Which leads me to believe we should default to a Config level tag which is the highest level available?

It would remain in the Config form on the IRModule, which means you could have either easily?

Whichever is appropriate for the use-case, having standardised access to that information means you could access whichever is most useful to you. If you want to query the configuration for an appropriate Target and tag a function with it, that’s an implementation detail of another part of the compiler.

Serialising objects which don’t share a common base is pretty common in many projects, and it’s clear that Configuration encapsulates Target, so it can call the serialise internally? There’s no need to complicate this by making everything a sub-class of Target. And I believe what @areusch was saying is that we didn’t want anything but Target in the logs as it has no effect? Therefore encapsulating that with some function for creating logs from many pieces of the configuration may be useful?

@areusch and I had a long discussion offline yesterday, and he helped me understand the concern from the UX perspective: if we fold executor into target, then it’s more difficult to separate config coming from two parties, where one party implements the codegen and the other implements the executor.

On the other hand, my concern is the fragmentation of APIs. It has been a huge problem in the last 1-2 years, and we do have alternatives that avoid it.

Here is my proposal:

  • Part 1. Add Executor/Runtime fields to TargetNode:
class TargetNode {
  ...
  Executor executor;
  Runtime runtime;
};

class Executor {
  FromJSON();
  AsJSON();
};

class Runtime {
  FromJSON();
  AsJSON();
};
  • Part 2. Add a helper API to merge Target, Executor and Runtime
Target MergeTarget(Target target_without_executor_runtime, Executor executor, Runtime runtime);
  • Part 3. Allow separate specification of target, target_host, executor, runtime in TVMC, and internally use the proposed API in Part 2 to merge, validate and normalize them into a single Target object
tvmc --target "llvm" --executor "..." --runtime "..."
  • Part 4. For the heterogeneous case, annotate the target onto each PrimFunc/RelayFunc to specify the target/runtime/executor
@tvm.script.ir_module
class Module:

   @T.func
   def tir_func():
     T.func_attrs({"target": JSON-Repr-of-Target-Obj}) # with runtime&executor included
     ...

   @R.func
   def relay_func():
     T.func_attrs({"target": JSON-Repr-of-Target-Obj}) # with runtime&executor included
     ...

Could you elaborate on this? I believe this isn’t solely a UX issue but also a hygiene factor within the compiler and how we represent the data structures internally, so I would rather not overload Target with Executor and Runtime. This RFC proposes a suitable home for information that’s relevant across the compilation, given we now have at least Executor and Runtime to include; a side effect is bringing the tvmc view back into alignment with the internals of the compiler.

It’s also worth noting that, with the current RFC for Migrating Target Attributes to IRModule, tvmc can glue this together with the relevant pieces, so from a user’s point of view they wouldn’t know how disparate the internals are, but it would be a headache to maintain.


Wow lots more discussion here! Thanks @junrushao for writing up our discussions. So one thing I’d like to point out is that the recursive Target approach is not more expressive than the approach proposed by this original RFC. Expressing a “contains” relation can be done equivalently well by

  • defining a recursion relationship inside the Target data structure
  • defining another structure which describes the contains relationship (akin to a join table in database theory)

The main reason I am interested in the join-table approach here is that it vastly simplifies MergeTarget as described by Junru above. And I’d like to point out that it’s not sufficient here to merely define a function which hides the complexity under the covers. Users need to be able to understand what this function is doing because they are writing the inputs (though we are providing a tag, Command Line Configuration Files contemplates an expansion of the role of tagging to include tagging a partial configuration, as discussed earlier). I’m not sure it will be generally simple to explain how MergeTarget works as Target grows if we adopt the general approach of trying to attach every piece of compiler config to some Target which “owns” it.
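
For concreteness, a hedged sketch of the two shapes of the same “contains” relation, written as plain dicts (purely illustrative, not an existing TVM schema):

# Recursive: host/executor/runtime live inside the Target that "owns" them.
recursive = {
    "kind": "packaged",
    "executor": {"kind": "aot"},
    "runtime": {"kind": "crt"},
    "target": {"kind": "cuda", "host": {"kind": "llvm"}},
}

# Join-table style: a flat config that references plain Targets by name.
flat = {
    "targets": {"dev": {"kind": "cuda"}, "host": {"kind": "llvm"}},
    "relations": [{"device": "dev", "host": "host"}],
    "executor": {"kind": "aot"},
    "runtime": {"kind": "crt"},
}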

The drawback of the flat configuration structure is it could be more difficult to consume inside the compiler. We should discuss whether this is truly an issue and how to mitigate it.

Finally, while I do think it’s important to arrive at an expressive, understandable Target data structure, as the compiler grows more complex, I think there is a tension between a Target structure which is clear to the user and a Target structure which naturally reflects the organization of the compiler (and therefore has the nice properties of clearly delineating where config should live and being easy to route in the compiler top-level). Hopefully, the organization of the compiler is also such that it’s logical to a power user interested in creating a complex config. However, here I think that UX sugar can help to compose the common target patterns such as “cuda” (which really means 1 CUDA device with an implied “llvm” host). We already do this today anyway, so I suspect it will continue to play a role in the future.

@Mousius I totally agree on keeping things hygienic, and I believe folding things into Target is the correct and consistent approach.

First of all, the automation system relies solely on the target object to understand the code dispatching, hardware specs, and runtime information. Without that information in the Target object, the automation system won’t be aware of the full picture. For example, if we switch the executor from VM to TensorRT, the performance can be much different, so if the executor is not inside Target, the automation system will be confused and learn a wrong objective.

Second, in the direction we are moving towards, the Target object guides our IRModule-to-IRModule transformations in lowering, and IRModule-to-Module transformations in compilation. Wrapping it with an extra layer seems to architecturally change our compilation pipeline, while alternatives exist and both seem equivalently expressive.

Third, the practice of folding all compilation-related information into Target has been adopted consistently in TVM. For example, we may specify the libraries dispatched to via cuda --libs=cudnn. Similarly, in LLVM the target triple is designed in a consistent way, where we can specify libc and other environment details.

Historically, fragmentation has accumulated in TVM across layers. For example, we have different scheduling and auto-scheduling systems, similar-but-not-identical and error-prone APIs for different executors, different compilation workflows between Relay, Relay BYOC, and TVM, etc. Adding a new top-level user-facing data structure, when an alternative exists with the same expressiveness and UX, would probably lead to more user confusion.

On the other hand, I totally agree and am aware that a graph-level compilation involves the interaction of multiple parts, including device, host, runtime, and executor. My main concern here is that we already have Target as a formal canonical spec, which is already able to express this structure without hurting UX.

What if we define a new target kind:

{
  "kind": "packaged", # probably need a better name, please propose new ones
  "runtime": "crt",   # the "runtime" in the proposal
  "executor": {       # the codegen target for relay function
                      # i.e. the "executor" in the proposal
    "kind": "vm/aot",
    ...
  },
  "target": {
    "kind": "cuda",   # the target that TIR generates to
    "host": {
      "kind": "llvm", # the codegen target for the host-side driver code
       ...
    }
  },
}

We can provide helpers to sugar the construction of this recursive target:

def tvm.target.packaged(
  target="cuda",
  executor="aot",
  runtime="crt",
): ...

In the common case, the user only needs to pass “cuda”, because we can provide good defaults. For advanced use cases, users could use the packaged API to specify their own specification for the package.
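
Hypothetical usage of the proposed helper (packaged is not an existing API; this only illustrates the intended sugar):

# Common case: rely on defaults for executor and runtime.
target = tvm.target.packaged("cuda")

# Advanced case: spell everything out.
target = tvm.target.packaged(target="cuda", executor="aot", runtime="crt")
# build(mod, target)  # the rest of the flow is unchanged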


@Mousius Hello, where is this work at now?

@stoa this one stalled out last year in the midst of TVMCon preparation. We’d like to pick it back up now that we’re all back from vacation.

@junrushao based on your last comment, I’m still missing the justification for why we should stick with a recursive Target. Some specific responses:

Can’t the automation look instead at CompilationConfig?

It would be great if you could provide some more illustration here. I think it’s hard to argue this position in the abstract. As a community, we need to make decisions based on the merits we can all observe. Is there a design document you’re intending to propose here that illustrates a situation that would be more difficult in keeping with Target?

I’m not sure I quite see how CompilationConfig changes this aspect. The set of configuration is still bundled together, just not inside something called Target.

I think that part of ensuring a clean design is making conscious decisions about code architecture and layout such that developers feel that paging in each new layer of abstraction is “natural.” That is to say, as the level of detail increases, the concepts build on previously-used concepts at higher levels of abstraction.

CompilationConfig essentially proposes that we organize the user-facing configuration by grouping it according to the logical compiler component which consumes it. This organization allows us to allude to the internal compiler workings using the user-facing configuration data structure, and allows us to potentially reduce the set of configuration required to unit test a component of the compiler. It also allows engineers to quickly make decisions about where a piece of configuration belongs according to where it’s consumed in the compiler. I would argue that each of these properties allows us to scale the compiler without triggering as many community-wide discussions about config layout.

I think we’ve motivated already that the present Target, while expressive, doesn’t compose well from a user perspective, and that it doesn’t decompose well from an autotvm log perspective. We’re arguing for an improvement in those properties here by illustrating that our alternative using the present Target structure is essentially to define a Target-specific merge() function to compose user-facing Target configs and a Target-specific filtering function to whitelist specific properties in the Target for inclusion in an autotvm log. Both of these tasks are going to significantly increase unit test complexity and load, and if we don’t get those tests right, will equivalently cause user confusion (in the form of “why can’t I specify e.g. this memory layout in a platform configuration file?”).

If my understanding is right, the CompilationConfig will collect all attributes of a module build in a single data structure - this makes sense. It also makes sense to regroup compiler options from PassContext together with the CompilationConfig as well. There may be more:

  • Specific options. For example, the schedule can be chosen differently on the same target depending on whether data are available in cache or tightly-coupled memory vs external memory with low bandwidth or relatively long latency. Same target, different config.
  • User preferences. For example, the user disables data cache for whatever reason, or prefers smaller code/data footprint even if reducing performance, which may require different schedules.

Do you also plan for these kinds of “options” to be specified via the CompilationConfig?

@stoa I agree it probably makes sense to move attributes from PassContext if we do this. The tricky bit is that right now, Target (which predates CompilationConfig) is used to key autotvm tuning logs. Given this precedent, it’s reasonable to presume it would continue to key MetaScheduler and AutoTIR logs as well. However, not everything in CompilationConfig probably makes sense to use as an input key there; depending on the extent of the optimization strategy (e.g. autotvm is operator-specific), it probably makes sense to exclude some options (e.g. here we argue for excluding the executor and runtime from autotvm logs).

So before we move to add more things into CompilationConfig I’d like to resolve this issue.

I think it makes sense to include a model of the available memories in CompilationConfig. These would be used to determine where buffers could be placed in memory. I’m not sure we intend to support exactly a “disable data cache” option (this is pretty target-specific), but you could accomplish that by modifying the memory model provided to the compiler. And, target-specific wrappers could be written (similar to tvm.target.Target.micro(model)) to provide a more user-friendly disable_data_cache= option here. Would that accommodate your use case?
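
A hedged sketch of what such a wrapper and memory model might look like (the function, key names, and disable_data_cache option are illustrative only, not an existing TVM API):

import tvm

def micro_config(model, disable_data_cache=False):
    # Omitting the "dcache" entry approximates "disable data cache" by changing
    # the memory model rather than adding a dedicated compiler option.
    memory = {
        "flash": {"size-bytes": 2 * 1024 * 1024},
        "sram": {"size-bytes": 512 * 1024},
    }
    if not disable_data_cache:
        memory["dcache"] = {"size-bytes": 32 * 1024}
    return {"model": model, "target": tvm.target.Target("c"), "memory": memory}

cfg = micro_config("imaginary-mcu", disable_data_cache=True)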

Thanks everyone for the discussion so far. We have already gathered a lot of information about the goals and possible intentions of the design. One thing is pretty clear: the particular choice of data structure does have a decent impact in a few areas.

Before suggesting a concrete choice, I would like us to pop up a level and think about the hidden question behind this discussion: how should the TVM compilation pipeline, or compilation pipelines (assuming there are many kinds of them), be “configured”?

To help clarify the question, a generic flow in TVM can be roughly summarized as follows (diagram omitted):

  • We start with an IRModule (modA), possibly already optimized by the user or by previous passes.
  • We run a set of transformation passes on modA to get modB.
  • We then generate an rt.Module from modB, in order to get it running on the specific platform we have in mind (e.g. a specific board).

We can see that roughly three kinds of “config-like” options appear in this flow and can affect the final outcome.

  • A0: The specific options used in transformation (e.g. how aggressively we want to inline).
  • A1: The build “constraints” of the platform of interest; this can be the instruction set (x86 or ARM) or runtime constraints (crt, packed-api vs unpacked-api).
  • A2: Within the IRModule itself, there can be additional constraints on existing functions. Imagine that a previous pass/user decided to optimize my_func for an NVIDIA GPU and has already generated a call to my_func via the CUDA runtime API. Then follow-up optimizations need to respect that “constraint”.

To some extent, these As are inter-correlated. For example, if the final platform does not support a vector unit, then we need to disable vectorization.

Nevertheless there are still two very distinct types of configuration here:

  • C0: In the case of A0, we are mainly interested in “how”, i.e. procedurally what we do with the program. In many cases, regardless of the transformations (e.g. inlining), the final outcome can run on the platform of interest.
  • C1: In the case of A1 and A2, we are declaring “constraints” imposed by the final platform of interest (e.g. must have a vector unit, must use the unpacked ABI). This constraint information does not dictate “how” we run the optimization, but provides additional information for certain specializations.

The distinction between the two types is really important here. Coming back to the general goal of TVM: we want to enable composable optimization of programs. Sometimes this means that previous stages of program transformation are done by another developer and then fed into follow-up stages.

C1-type config is something that we usually want to preserve as part of the IR or the log. For example, if BYOC pre-decided that a certain function should be transformed to run on CUDA, then its caller must know that constraint and call it using the CUDA runtime API. Such constraints need to be reflected in the IR (aka the IRModule) itself, so that follow-up passes can respect and make use of that information.

C0-type config, however, does not need to appear in the IR (or the intermediate data structure of interest). Imagine that we choose to separately inline my_func before handing the module over to the current pipeline. Because the transformation is already done, follow-up transformations do not need to know this information, as the IRModule after transformation is already self-contained.

Some of the discussions here started from a single monolithic pipeline, where it can indeed be very tempting to consolidate everything into one configuration of interest. I would encourage us to look broadly at the composability perspective, since composability is key to encouraging collaboration without pinning down all the details of a pipeline. Some of the discussions also touch on this perspective. A very natural consequence of the reasoning here is that we need to distinguish C0- and C1-type configurations at the fundamental level (folding PassContext config into the whole config might go against this principle). Of course, this does not preclude creating a unified option interface (e.g. at the tvmc level); just at the level of compositional optimizations and the things we put into the IRModule, we need such a separation.
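
To make the C0/C1 distinction concrete, a minimal sketch using existing APIs (the function-level "target" attribute in the last line is an assumption, not a committed convention):

import tvm
from tvm import relay

# A tiny Relay module to keep the example self-contained.
x = relay.var("x", shape=(4,), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

# C0: "how" we transform, i.e. pass options that need not persist in the IR.
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lib = relay.build(mod, target="llvm")

# C1: a constraint that must travel with the IR so follow-up passes respect it,
# e.g. recording that main was specialized for CUDA with an llvm host.
main = mod["main"].with_attr("target", tvm.target.Target("cuda", host="llvm"))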

Another related topic is whether C1-type configurations can benefit from further dissection/clarification, or whether there is enough common ground to have a consistent branding.

@tqchen thanks for these contextual remarks.

I would like to point out that in targets where we emit something closer to machine code (e.g. edge targets, hexagon, etc), C1-type config can actually inform C0-type config. For example, we may want to run additional passes or modify pass configuration based on the platform chosen in order to apply platform-specific optimizations. So I am really not convinced they are fully separate.

Recording some discussion here between @mousius @manupa-arm @tqchen @junrushao and myself for posterity:

  • A concern with this proposal is that it may cause duplication of configuration on the IRModule. That is to say, if we add CompilationConfig as an IRModule attribute, there is still a need to identify for each function in an IRModule: what sub-Target shall it run under, and what other Targets may it invoke? These questions have bearing on the codegen (e.g. when considering how to implement tir.call_packed_lowered) and on the Executor (when considering the state that may need to be passed down into a function and any additional execution engines which may need to be configured in order to run a function).
  • Meanwhile, we have yet to see a clear motivating example of why we need a recursive definition of Target. @junrushao and @tqchen could provide some follow-up on this point.
  • There has been some suggestion that autotuning log keys could be defined at a high-level as “C1-type config.” I disagree with this suggestion, as I think it’s likely that both the autotuning method (e.g. AutoTVM, MetaScheduler, AutoTIR) plus the specific runtime/executor config play into this. I think each tuning method is going to need to define a way to cull the Target or CompilationConfig in order to define what goes into a tuning log. If there is agreement on this point, I would like us to focus discussion on this RFC thread around ensuring that whatever data structure we choose here makes it easy to accomplish this culling process independently of considerations of where to place configuration.
  • Finally, this RFC started by proposing an improvement in the user-facing configuration; however, it seems that the part of it which is causing most controversy is that it affects the compiler’s internal configuration state. It may help to have a more focused RFC to collect community feedback around how we should configure the compiler at the IRModule level. Meanwhile, to address the concern of duplicating state above, it would help to see a sketch of how this proposal might suggest we replace Targets at the IRModule and function level. Originally this was left open, but I think it would help to clarify a bit further to understand the impact of this RFC.