[pre-RFC] Compilation Configuration Representation

All the alternatives (A1a, A1b, A1c) should be able to cover the need that we initially brought up – around N3. Additionally, the Target system as it is now is already powerful enough to resolve the N3-related needs that were brought up, as the alternatives @junrushao listed along the A1c direction.

In all cases, it is certainly possible to resolve the problems with extra layers of abstraction and indirection. As a matter of fact, they are all very similar, except for how the data structure itself is built up.

So the main thing that would be helpful here is to understand the tradeoffs under different contexts. Given that our previous discussion was focused around N3, it is also helpful to look at things from the other needs.

To give some examples:

From N0’s pov, the ability to directly pass in a Target with a host field is a good default solution for this most common combo, so in the case of API/UX design, we might want to encourage this kind of usage without worrying about additional fields for heterogeneous setups in a config.

build(mod, Target("cuda", host="llvm"))

Additionally, the transition from E0 to E1 encourages a transition from a Target with a host field (which indicates a mixed host/device program) to a device-only Target (without host).
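For instance, a minimal sketch using the existing Target API (the E0/E1 split itself is only described here, not implemented):

import tvm

# E0: a mixed host+device program carries a Target with a host field
mixed = tvm.target.Target("cuda", host="llvm")
print(mixed.host)        # the "llvm" host target

# E1: once the host side has been split out, only the device Target remains
device_only = tvm.target.Target("cuda")
print(device_only.host)  # None, no host field needed any more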

From N2’s perspective, aws/c5 favors the deployment target as a holistic thing (aka at the config level).

build(mod, "aws/c5")

Under the context of config and target, we will need to be able to say that a tag can refer to either a config or a Target, which effectively complicates the tagging system and its explanation. Additionally, there will be a need for a common mechanism to register the tags for both target and config. Making them more uniform would make this perspective more streamlined.

From N4’s pov, we will need to be able to represent the objects during decompositions, which means there will be a need for smooth transitions of related information at the function level. For example, a function that involves mixed host/device execution transitions into a device-only one. If that entails a difference in the “target constraints”, e.g. a function with multiple targets starts with a “config” attr and then, once it targets a single device, gets a “target” attr instead, such a transition is not as uniform.

In the context of N5, there will be a need to log both a single-device target and a multi-target config as part of the autotuning logs in the same way. From the automation’s pov they are all “target constraints” of a function, or of a collection of functions. As in N4, this would favor a single entity that captures the “target constraint” in a uniform way, or at least a unified serialization mechanism and perhaps repr printing that covers the targets involved.
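As a rough sketch of what a unified serialization could look like (Target.export() is the existing helper that returns a dict; the multi-target constraint below is purely hypothetical):

import json
import tvm

single = tvm.target.Target("cuda", host="llvm")
log_entry_single = json.dumps(single.export())    # single-device target constraint

# hypothetical multi-target constraint, serialized through the same Target schema
log_entry_multi = json.dumps({
    "targets": [tvm.target.Target("cuda").export(),
                tvm.target.Target("llvm").export()],
})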

Finally, we need to consider the overall UX perspective of how to articulate this to the user. On one hand we can certainly introduce a lot of concepts to the users in their most complete form. But the best APIs (e.g. keras is a great example) always aim to present to the users the simplest form for the most important use cases.

Then we would get to a point where a user would ask “what is the difference between a config of a function that can run on multiple devices and a target of a function that only runs on one device?” While we can certainly come up with an answer, from a UX point of view the tags aws/c4 (which can indicate a config that involves the runtime env) and nvidia/cuda12 (which indicates a single target) are so similar that a user might feel an artificial boundary here.

Importantly, the majority of users do not have to deal with a MultiTarget setting. It is also unlikely that they need to deal with explicitly setting the executor or runtime if we have a proper tag or a good default. So our most common use case is the setting that contains a TargetWithHost, and we want to maximize the ease of use in this setting. Only asking the user to learn about a target that comes with a host field, plus the ability to tag, is the simplest way to tell the story, without introducing the extra concept of Config.

So the UX story is like a journey:

  • step0, useful for the most common use cases: “you can use a target to specify the deployment environment constraint that you have on a single device, and you have the ability to tag the specification”.
  • step1: generalizing the same story for heterogeneous use cases, “you can specify a MultiTarget, which is also a target with a specific schema, to specify the heterogeneous execution case and fine-tune the runtime and executor settings; BTW you get the same ability to tag and log them in the same way as step0”.

And if a user does not want to bother hearing about the steps, there is a simpler story: "just pick a tag that closely matches the platform of your interest, for example aws/g4:gpu".
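Put together, the journey might look like the sketch below. Only the step0 and tag lines use APIs that exist today; MultiTarget is the hypothetical generalization from step1, and the tag is assumed to be registered:

import tvm
from tvm import relay

x = relay.var("x", shape=(1, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

# step0: a single device plus host, the most common case
lib = relay.build(mod, target=tvm.target.Target("cuda", host="llvm"))

# step1 (hypothetical): the same story generalized to heterogeneous execution
# lib = relay.build(mod, target=MultiTarget({"gpu": "cuda", "cpu": "llvm"},
#                                           executor="aot", runtime="crt"))

# the simplest story: just pick a tag that matches the platform
lib = relay.build(mod, target=tvm.target.Target("nvidia/geforce-rtx-3070"))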

This was covered in the original post:

Ah, I understand, if we don’t pass a Target and instead just pass a tag then you have to figure out which one to go for. The approach taken in Command Line Configuration Files is to wrap the Target in the JSON configuration. Which leads me to believe we should default to a Config level tag which is the highest level available? If we add in-between layers then this still holds? For the same reason I wouldn’t want to use Target to define everything, I can see the error in trying to explain a Target tag and a Config tag and leading to confusion as to what a tag is - which I think we’re agreeing on?

This RFC doesn’t aim to address how you use the configuration so much as define the fact the configuration will be there for you to use and rely on. Unified device/target/memory scope planning stands out to me as an RFC which discusses how to correctly annotate a function for a specific use-case and other than providing a consistent view of the world the CompilationConfig does not impact this.

Are you suggesting something as simple as configuration.to_json() or configuration.serialize_targets() which would return the array of JSON represented Target ? Re-using the already defined schema for Target and providing some way to extract it seems to function here?

Thanks @Mousius, I am not suggesting a decision on solutions, but just want to broadly discuss the implications of the engineering solutions. For example, to build on what you said:

Which leads me to believe we should default to a Config level tag which is the highest level available? If we add in-between layers then this still holds? For the same reason I wouldn’t want to use Target to define everything, I can see the error in trying to explain a Target tag and a Config tag and leading to confusion as to what a tag is - which I think we’re agreeing on

This would result in two UX concepts, a target tag and a config tag, and in the case of system implementations, possibly two similar impls.

This RFC doesn’t aim to address how you use the configuration so much as define the fact the configuration will be there for you to use and rely on

I understand the original intention was to scope it as a “top-level config”. However, because config itself is a data structure (just like target) that involves “constraint settings throughout compilation”, we naturally would ask the following questions:

  • Does the top-level config have to remain in its most general form, or can it e.g. be a Union[Target, MultiTarget], since the most common case remains TargetWithHost?
  • We may need to propagate some of the multi-target constraint info to the function level in the future; at that point, which data structure do we use (if there is both config and target)?
  • The consistency of a single-device function’s target attr and a multi-device function’s config attr.

Are you suggesting something as simple as configuration.to_json() or configuration.serialize_targets() which would return the array of JSON represented Target ? Re-using the already defined schema for Target and providing some way to extract it seems to function here?

Right, this would result in two concepts, target and config, both of which are really similar to each other and both of which can appear in the same automation logs. We might need to build two sets of mechanisms if they end up as drastically different data structures without a common base.

Independent from the engineering discussion, it would be useful to come back to the terminology and think about the UX consequences of presenting it to the user. Obviously this is subjective, but it is worth thinking about what can serve as a good story. I tried to search the terms “Target” and “compiler target” on the internet and here are some common interpretations:

  • Target platform that the output runs on.
  • Target program that we output.

“Target platform” roughly aligns with the first impression that comes to my mind. A “platform” includes runtime libraries, hardware, or any env that might affect the transformation behavior. Looking from that angle, “aws/c5” is certainly also a “target platform”. A SoC with multiple chipsets, an NPU and a CPU can certainly be viewed as a “target platform”. A set of distributed machines can also be viewed as a “target platform”.

So then it goes back to what stories we tell our users (which affects developers as well), and whether or not that story aligns with the most common-sense impressions.

First kind of story:

  • S0a: A target specifies the “target platform”, any deployment environment constraints that you are interested in.
  • S0b: If you are interested in multiple-device settings, you can use a MultiTarget kind that composes targets together and specifies the deployment env (that involves multiple devices).

Second kind of story:

  • S1: A target specifies the single-device deployment constraints; you will need to compose targets to form a config, which also specifies the runtime and executor of your model.

S0 ties back to the common-sense stories with a progression (first start with only a target on a single device, which is simple and easily received, then generalize by reusing the same concept). S1 would require more effort in differentiating the concepts, and in resolving the confusion about why a SoC with multiple chipsets is not a “target platform”.

Which leads me to believe we should default to a Config level tag which is the highest level available?

It would remain in the Config form on the IRModule, which means you could have either easily?

Whichever is appropriate for the use-case, having standardised access to that information means you could access whichever is most useful to you. If you want to query the configuration for an appropriate Target and tag a function with it, that’s an implementation detail of another part of the compiler.

Serialising of objects which don’t share a common base is pretty common in many projects, and it’s clear that Configuration encapsulates Target so it can call the serialise internally? There’s no need to complicate this by making everything a sub-class of Target. And I believe what @areusch was saying is that we didn’t want anything but Target in the logs as the rest has no effect? Therefore encapsulating that with some function for creating logs from many pieces of the configuration may be useful?
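A minimal sketch of that idea, assuming a hypothetical CompilationConfig wrapper (only Target and its existing export() schema are real TVM APIs here):

import json
import tvm

class CompilationConfig:
    """Hypothetical wrapper: no shared base with Target, serialization is delegated."""
    def __init__(self, targets, executor=None, runtime=None):
        self.targets = targets            # list of tvm.target.Target
        self.executor = executor
        self.runtime = runtime

    def serialize_targets(self):
        # re-use the already defined Target schema; executor/runtime stay out of the logs
        return json.dumps([t.export() for t in self.targets])

config = CompilationConfig([tvm.target.Target("cuda", host="llvm")], executor="aot")
print(config.serialize_targets())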

@areusch and I had a long discussion yesterday offline, and he helped me understand the concern from the UX perspective: if we fold the executor into the target, then it’s more difficult to separate the config coming from two parties, where one party implements the codegen and the other implements the executor.

On the other hand, my concern is the fragmentation of APIs. It has been a huge problem in the recent 1-2 years, and we do have alternatives that avoid it.

Here is my proposal:

  • Part 1. Add Executor/Runtime fields to TargetNode:
class TargetNode {
  ...
  Executor executor;
  Runtime runtime;
};

class Executor {
 public:
  static Executor FromJSON(const String& json);
  String AsJSON() const;
};

class Runtime {
 public:
  static Runtime FromJSON(const String& json);
  String AsJSON() const;
};
  • Part 2. Add a helper API to merge Target, Executor and Runtime
Target MergeTarget(Target target_without_executor_runtime, Executor executor, Runtime runtime);
  • Part 3. Allow separate specification of target, target_host, executor, runtime in TVMC, and internally use the proposed API in Part 2 to merge, validate and normalize them into a single Target object
tvmc --target "llvm" --executor "..." --runtime "..."
  • Part 4. For heterogeneous case, annotate the target onto each PrimFunc/RelayFunc to specify the target/runtime/executor
@tvm.script.ir_module
class Module:

   @T.func
   def tir_func():
     T.func_attrs({"target": JSON-Repr-of-Target-Obj}) # with runtime&executor included
     ...

   @R.func
   def relay_func():
     T.func_attrs({"target": JSON-Repr-of-Target-Obj}) # with runtime&executor included
     ...
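A hedged, dict-based sketch of how Parts 2 and 3 might behave; the real MergeTarget would operate on Target objects, the helper below only mirrors the idea:

def merge_target(target_without_executor_runtime, executor, runtime):
    # Part 2 (sketch): fold executor/runtime settings into the target record
    merged = dict(target_without_executor_runtime)
    merged["executor"] = executor
    merged["runtime"] = runtime
    return merged

# Part 3: what `tvmc --target "llvm" --executor "aot" --runtime "crt"` might be
# normalized into internally, before being attached to each function (Part 4)
merged = merge_target({"kind": "llvm"},
                      executor={"kind": "aot"},
                      runtime={"kind": "crt"})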

Could you elaborate on this? I believe this isn’t solely a UX issue but also a hygiene factor within the compiler and how we represent the data structures internally, so I would rather not overload Target with Executor and Runtime. This RFC is proposing a suitable home for information that’s relevant across the compilation, given we now have at least Executor and Runtime to include, but a side effect is bringing the tvmc view back into alignment with the internals of the compiler.

It’s also worth noting that, with the current RFC for Migrating Target Attributes to IRModule, tvmc can glue this together with the relevant pieces, so from a user point of view they wouldn’t know how disparate the internals are, but it would be a headache to maintain.

Wow lots more discussion here! Thanks @junrushao for writing up our discussions. So one thing I’d like to point out is that the recursive Target approach is not more expressive than the approach proposed by this original RFC. Expressing a “contains” relation can be done equivalently well by

  • defining a recursion relationship inside the Target data structure
  • defining another structure which describes the contains relationship (akin to a join table in database theory)

The main reason I am interested in the join-table approach here is that it vastly simplifies MergeTarget as described by Junru above. And I’d like to point out that it’s not sufficient here to merely define a function which hides the complexity under the covers. Users need to be able to understand what this function is doing because they are writing the inputs (though we are providing a tag, Command Line Configuration Files contemplates an expansion of the role of tagging to include tagging a partial configuration, as discussed earlier). I’m not sure it will be generally simple to explain how MergeTarget works as Target grows, if we adopt the general approach of trying to attach every piece of compiler config to some Target which “owns” it.
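To make the two options concrete, here is an illustrative comparison; the field names are made up for the sketch and are not an existing schema:

# Illustrative shapes only, not an existing schema.
# (a) recursion inside the Target data structure: the device target "contains" its host
recursive_target = {
    "kind": "cuda",
    "host": {"kind": "llvm"},
    "executor": {"kind": "aot"},
}

# (b) join-table style: a separate flat structure describes the "contains" relationship
#     between otherwise independent entries, so merging is a plain record update
flat_config = {
    "targets": [
        {"id": "dev0", "kind": "cuda"},
        {"id": "host0", "kind": "llvm"},
    ],
    "relations": [{"device": "dev0", "host": "host0"}],
    "executor": {"kind": "aot"},
}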

The drawback of the flat configuration structure is that it could be more difficult to consume inside the compiler. We should discuss whether this is truly an issue and how to mitigate it.

Finally, while I do think it’s important to arrive at an expressive, understandable Target data structure as the compiler grows more complex, I think there is a tension between a Target structure which is clear to the user and a Target structure which naturally reflects the organization of the compiler (and therefore has the nice properties of clearly delineating where config should live and being easy to route in the compiler top-level). Hopefully, the organization of the compiler is also such that it’s logical to a power user interested in creating a complex config. However, here I think that UX sugar can help to compose the common target patterns such as “cuda” (which really means 1 CUDA device with an implied “llvm” host). We already do this today anyway, so I suspect it will continue to play a role in the future.

@Mousius I totally agree with keeping things hygienic, and I believe folding things into Target is the correct and consistent approach.

First of all, the automation system solely relies on the target object to understand the code dispatching, hardware specs and runtime information. Without having the information in the Target object, the automation system won’t be aware of the full picture. For example, if we switch executor from VM to TensorRT, the performance can be much different, and so if executor is not inside Target, then the automation system will be confused and learn a wrong objective.

Second, as the direction we are moving towards, the Target object is guiding our IRModule-to-IRModule transformation in lowering, and IRModule-to-Module in compilation. Wrapping with an extra layer seems to architecturally change our compilation pipeline, while alternatives do exist and both seem to be equivalently expressive.

Third, the practice of folding all compilation-related information into the target has been adopted consistently in TVM. For example, we may specify the libraries to dispatch to via cuda --libs=cudnn. Similarly in LLVM, the target triple is designed in a consistent way, where we can specify libc and other environment details.

Historically, fragmentation has accumulated in TVM across layers. For example, we have different scheduling and auto-scheduling systems, similar-but-not-identical and error-prone APIs for different executors, diverging compilation workflows between Relay, Relay BYOC and TVM, etc. Adding a new top-level user-facing data structure, when an alternative exists with the same expressiveness and UX, would probably lead to more user confusion.

On the other hand, I totally agree and am aware that a graph-level compile involves the interaction of multiple parts, including device, host, runtime and executor. My main concern here is that we already have Target as a canonical, formal spec, which is already able to express this structure without hurting UX.

What if we define a new target kind:

{
  "kind": "packaged", # probably need a better name, please propose new ones
  "runtime": "crt",   # the "runtime" in the proposal
  "executor": {       # the codegen target for relay function
                      # i.e. the "executor" in the proposal
    "kind": "vm/aot",
    ...
  },
  "target": {
    "kind": "cuda",   # the target that TIR generates to
    "host": {
      "kind": "llvm", # the codegen target for the host-side driver code
       ...
    }
  },
}

We can provide helpers to sugar the construction of this recursive target:

def tvm.target.packaged(
  target="cuda",
  executor="aot",
  runtime="crt",
): ...

In the common case, the user only needs to feed "cuda", because we could provide good defaults. For advanced use cases, users could use the packaged API to specify their own specification for the package.

@Mousius Hello, where is this work at now?

@stoa this one stalled out last year in the midst of TVMCon preparation. We’d like to pick it back up now that we’re all back from vacation.

@junrushao based on your last comment, I’m still missing the justification as to why we should stick with a recursive Target. some specific responses:

Can’t the automation look instead at CompilationConfig?

It would be great if you could provide some more illustration here. I think it’s hard to argue this position in the abstract. As a community, we need to make decisions based on the merits we can all observe. Is there a design document you’re intending to propose here that illustrates a situation that would be more difficult in keeping with Target?

I’m not sure I quite see how CompilationConfig changes this aspect. The set of configuration is still bundled together–just not inside something called Target.

I think that part of ensuring a clean design is making conscious decisions about code architecture and layout such that developers feel that paging in each new layer of abstraction is “natural.” That is to say, as the level of detail increases, the concepts build on previously-used concepts at higher levels of abstraction.

CompilationConfig essentially proposes that we organize the user-facing configuration by grouping it according to the logical compiler component which consumes it. This organization allows us to allude to the internal compiler workings using a user-facing configuration data structure, and allows us to potentially reduce the set of configuration required to unit test a component of the compiler. It also allows engineers to quickly make decisions about where a piece of configuration belongs according to where it’s consumed in the compiler. I would argue that each of these properties allows us to scale the compiler without triggering as many community-wide discussions about config layout.

I think we’ve motivated already that the present Target, while expressive, doesn’t compose well from a user perspective, and that it doesn’t decompose well from an autotvm log perspective. We’re arguing for an improvement in those properties here by illustrating that our alternative using the present Target structure is essentially to define a Target-specific merge() function to compose user-facing Target configs and a Target-specific filtering function to whitelist specific properties in the Target for inclusion in an autotvm log. Both of these tasks are going to significantly increase unit test complexity and load, and if we don’t get those tests right, will equivalently cause user confusion (in the form of “why can’t I specify e.g. this memory layout in a platform configuration file?”).

If my understanding is right, the CompilationConfig will collect all attributes of a module build in a single data structure - this makes sense. It also makes sense to regroup compiler options from PassContext together with the CompilationConfig as well. There may be more:

  • Specific options. For example, the schedule can be chosen differently on the same target depending on whether data are available in cache or tightly-coupled memory vs external memory with low bandwidth or relatively long latency. Same target, different config.
  • User preferences. For example, the user disables data cache for whatever reason, or prefers smaller code/data footprint even if reducing performance, which may require different schedules.

Do you also plan for this kind of “options” to be specified via the CompilationConfig?

@stoa I agree it probably makes sense to move attributes from PassContext if we do this. The tricky bit is that right now, Target (which is what predates CompilationConfig) is used to key autotvm tuning logs. Given this precedent, it’s reasonable to presume it would continue to key MetaScheduler and AutoTIR logs as well. However, not everything in CompilationConfig probably makes sense to use as an input key there–depending on the extent of the optimization strategy (e.g. autotvm is operator-specific), it probably makes sense to exclude some options (e.g. here we argue for excluding the executor and runtime from autotvm logs).
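For illustration, the kind of culling function each tuning method might define could look like the sketch below; the field names are illustrative, and only the idea of dropping executor/runtime from the log key comes from the discussion above:

import json

def tuning_log_key(target_constraint: dict, exclude=("executor", "runtime")) -> str:
    # illustrative: keep only the fields that affect operator-level performance
    kept = {k: v for k, v in target_constraint.items() if k not in exclude}
    return json.dumps(kept, sort_keys=True)

key = tuning_log_key({
    "kind": "cuda",
    "arch": "sm_86",
    "executor": {"kind": "aot"},
    "runtime": {"kind": "crt"},
})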

So before we move to add more things into CompilationConfig I’d like to resolve this issue.

I think it makes sense to include a model of the available memories in CompilationConfig. These would be used to determine where buffers could be placed in memory. I’m not sure we intend to support exactly a “disable data cache” option (this is pretty target-specific), but you could accomplish that by modifying the memory model provided to the compiler. And, target-specific wrappers could be written (similar to tvm.target.Target.micro(model)) to provide a more user-friendly disable_data_cache= option here. Would that accommodate your use case?

Thanks everyone for the discussion so far. We have already got a lot of information about the goals and possible intentions of the design. One thing is pretty clear: the particular choice of data structure does have a decent impact in a few areas.

Before suggesting a concrete choice, I would like us to pop up a level and think about the hidden question behind this discussion – how should the TVM compilation pipeline, or compilation pipelines (assuming there are many kinds of them), be “configured”?

To help clarify the question, a generic flow in TVM can be roughly summarized as follows (a minimal code sketch is given after the list):

  • We start with an IRModule (modA), possibly already optimized by the user or by some previous passes.
  • We run a set of transformation passes on modA to get modB.
  • We then generate an rt.Module from modB, in order to get it running on a specific platform (e.g. a specific board).
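A minimal code sketch of this flow using existing TVM APIs (the toy compute is only there to make it self-contained):

import tvm
from tvm import te

# modA: an IRModule, possibly already optimized upstream
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
modA = tvm.IRModule({"main": te.create_prim_func([A, B])})

# transformation passes: modA -> modB (A0-type options control "how")
with tvm.transform.PassContext(opt_level=3):
    modB = tvm.tir.transform.Simplify()(modA)

# codegen: modB -> rt.Module, under the platform constraints of interest (A1/A2-type)
rt_mod = tvm.build(modB, target=tvm.target.Target("llvm"))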

We can find that there are roughly three kinds of “config-like” options that appear in this flow and can affect the final outcome.

  • A0: The specific options used in transformation (e.g. how aggressively we want to inline).
  • A1: The build “constraints” of the platform of interest; this can be the instruction set (x86 or ARM), or runtime constraints (crt, packed-api vs unpacked-api).
  • A2: Within the IRModule itself, there can be additional constraints on existing functions. Imagine that a previous pass/user decided to optimize my_func for an NVIDIA GPU, and has already generated a call to my_func via the CUDA runtime API. Then follow-up optimizations will need to respect that “constraint”.

To some extent, these As are somewhat inter-correlated with each other. For example, if we have a final platform constraint that does not support a vector unit, then it means that we will need to disable vectorization.

Nevertheless, there are still two very distinct types of configuration here (see the sketch after the list):

  • C0: In the case of A0, we are mainly interested in “how”, aka procedurally what we do with the program. In many cases, regardless of the transformations (e.g. inlining), the final outcome can run on the platform of interest.
  • C1: In the cases of A1 and A2, we are declaring “constraints” imposed by the final platform of interest (e.g. must have a vector unit, must use unpacked ABI). This constraint information does not dictate “how” we run the optimization, but can provide additional information for certain specializations.
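A small sketch of the distinction in today’s TVM terms, treating a function-level target attribute as C1 and PassContext options as C0 (the attribute key is chosen for illustration):

import tvm
from tvm import relay

x = relay.var("x", shape=(4,), dtype="float32")
func = relay.Function([x], relay.nn.relu(x))

# C1: a constraint recorded on the IR itself, so later passes can see and respect it
# (attribute key chosen for illustration)
func = func.with_attr("target", tvm.target.Target("cuda", host="llvm"))
mod = tvm.IRModule.from_expr(func)

# C0: "how" to transform; once the passes have run, the IRModule is self-contained
with tvm.transform.PassContext(opt_level=3, config={"relay.FuseOps.max_depth": 10}):
    mod = relay.transform.FoldConstant()(mod)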

The distinction between the two types is really important here. Coming back to the general goal of TVM: we want to enable composable optimization of programs. Sometimes this can mean that some previous stages of program transformation are done by another developer, then fed into follow-up stages.

C1-type config is something that we usually want to preserve as part of the IR or log. For example, if BYOC pre-decided that a certain function should be transformed to run on CUDA, then its caller must know that constraint and call using the CUDA runtime API. Such constraints need to be reflected as part of the IR (aka the IRModule) itself, so that follow-up passes can respect and make use of such information.

C0-type config, however, does not need to appear in the IR (or intermediate data structure of interest). Imagine that we choose to separately inline my_func before handing over to the current pipeline. Because the transformation is already done, follow-up transformations do not need to know this information, as the IRModule itself after transformation is already self-contained.

Some of the discussions here started from a single monolithic pipeline; under that scenario alone, it indeed can be very tempting to consolidate everything into one configuration of interest. I would encourage us to look broadly at the composability perspective, since composability is the key to encouraging collaboration without putting too many restrictions on pinning down all the details of a pipeline. Some of the discussions also touch on this perspective. A very natural consequence of the reasoning here is that we need to distinguish C0- and C1-type configurations at the fundamental level (folding pass context config into the whole config might, as a result, go against this principle). Of course this does not preclude creating a unified option interface (e.g. at the tvmc level); just at the level of compositional optimizations and the things that we will put into the IRModule, we need such a separation.

Another related topic is whether or not C1 type configurations can benefit from future dissection/clarification, or if there is enough common ground here to have a consistent branding.

@tqchen thanks for these contextual remarks.

I would like to point out that in targets where we emit something closer to machine code (e.g. edge targets, hexagon, etc), C1-type config can actually inform C0-type config. For example, we may want to run additional passes or modify pass configuration based on the platform chosen in order to apply platform-specific optimizations. So I am really not convinced they are fully separate.

Recording some discussion here between @mousius @manupa-arm @tqchen @junrushao and myself for posterity:

  • A concern with this proposal is that it may cause duplication of configuration on the IRModule. That is to say, if we add CompilationConfig as an IRModule attribute, there is still a need to identify for each function in an IRModule: what sub-Target shall it run under, and what other Targets may it invoke? These questions have bearing on the codegen (e.g. when considering how to implement tir.call_packed_lowered) and on the Executor (when considering the state that may need to be passed down into a function and any additional execution engines which may need to be configured in order to run a function).
  • Meanwhile we still have yet to see a clear motivating example as to why we need a recursive definition of Target. @junrushao and @tqchen could provide some follow-up to this point.
  • There has been some suggestion that autotuning log keys could be defined at a high-level as “C1-type config.” I disagree with this suggestion, as I think it’s likely that both the autotuning method (e.g. AutoTVM, MetaScheduler, AutoTIR) plus the specific runtime/executor config play into this. I think each tuning method is going to need to define a way to cull the Target or CompilationConfig in order to define what goes into a tuning log. If there is agreement on this point, I would like us to focus discussion on this RFC thread around ensuring that whatever data structure we choose here makes it easy to accomplish this culling process independently of considerations of where to place configuration.
  • Finally, this RFC started by proposing an improvement in the user-facing configuration; however, it seems that the part of it which is causing most controversy is that it affects the compiler’s internal configuration state. It may help to have a more focused RFC to collect community feedback around how we should configure the compiler at the IRModule level. Meanwhile, to address the concern of duplicating state above, it would help to see a sketch of how this proposal might suggest we replace Targets at the IRModule and function level. Originally this was left open, but I think it would help to clarify a bit further to understand the impact of this RFC.

You are right that C1-style config can inform the pipeline choices of C0-type config, but not necessarily the other way around (as covered in the discussion). This is indeed not a clear-cut separation, but it is useful enough to think about.

Just to clarify, one of the main motivations for this is the tvmc argument --config, which can be directly translated into the CompilationConfig; however, the structural improvements made using the configuration illustrate how this provides improvements throughout the TVM stack. I didn’t mean to encourage the notion that only the tvmc flow was considered when presenting this RFC.

In TVM today, Target annotations are attached to BaseFunc as part of the main compilation flow; as this RFC does not aim to replace this mechanism, it would result in a flow like the one diagrammed in the original post.

Bear in mind, the only change that has been made is that the source of truth for the information is now gathered into the CompilationConfig and provided as part of the IRModule; everything else exists in the TVM compiler today.

I would challenge the conclusion that the distinction is important; to a user, the differentiation and different placement of information generally leads to confusion. I’m also unsure where we’re removing composition by allowing users to take an IRModule, complete with its configuration, and transfer that? This seems like an overall improvement in composability to me, rather than the current side-loaded configuration in PassContext which has no real structure today. What this leads me to think is that we should introduce CompilationConfig and use it as a mechanism to force better design choices in the way we handle options that can be transferred as part of an IRModule, and better aid composability in TVM.

Compositionality is a fundamental philosophy so please allow me to elaborate a bit more here. One great example to show its importance is the design of deep learning frameworks.

In a deep learning framework, the concept of layers is composable. I can take a residual layer and compose it with a softmax loss function and an optimizer. These layers are abstracted under a common interface, nn.Module. Each Module transforms an object of interest – the Tensor.

Tensor itself can be viewed as containing certain C1-type constraint information, such as the shape, the data content, and the device it resides on. Some layers (e.g. CuDNNLayer) may only work under the constraint of a GPU device.

Importantly, there are also C0-type configurations, for example the number of hidden neurons or the number of stages of residual connections. This information is not part of Tensor, because once those transformations are applied, the Tensor itself contains the minimum but sufficient information for follow-up layers to apply further transformations. Attaching more information to the Tensor could create more constraints, and possibly confusion about how to handle those attributes.

Deep learning frameworks are maximally composable; we can compose a residual block with a classification loss (softmax) or a detection loss to form different models. These layers can be developed by different developers.

In summary, composability is obtained by decoupling information and clarifying the minimum but necessary information in the key data structure of interest.

Coming back to the case of TVM: IRModule is a bit like Tensor, and we are talking about putting all the configurations of the layers, as well as the device information, into a centralized place. If we are only looking at one official model (say resnet-256), this of course can help clarify all the options available, but it would restrict the evolution to a single pipeline (and force all the options into that one place). A deep learning framework approach would be to keep only the minimum information (C1-type) in the key data structure, allowing the C0-type to live separately. For specific applications, there might be a centralized configuration (e.g. argparse) which informs the C0-type config, but that centralized config is not part of the Tensor.

In summary, putting all the configurations (C0 and C1 kinds) into a single place will certainly improve clarity if we only have a single pipeline in mind. But the C0-type configuration brings unnecessary and sometimes confusing information. Remember that pass writers generally need to take the constraints in the IRModule seriously; having C0-type information in the IRModule would make developers wonder whether or not it should be considered (which is an open set as we grow the set of passes).

In the spirit of the minimum-but-sufficient principle, we want to limit the information attached to the IRModule to C1-type information. Note that this does not preclude a high-level single pipeline from building a centralized config which then propagates to the lower-level mechanisms. I believe that was the original intention, and the main reason I brought up compositionality is that at the level of IRModule and passes we will need to consider such a separation carefully.

Since the topic of compositionality is quite important, let us also study a few more examples:

Example 0: Imagine that we want to try the following. stage0: sweep different unroll factors or vectorization factors in a loop, benchmark each, then compare the results; stage1: send the result to the follow-up lowering optimizations with another set of configs. In this case the C0-type config (e.g. unrolling factor) in stage0 is not relevant to stage1. Given the collection of different choices in stage0’s loop, it is also not entirely desirable or possible to centralize the configs into a single data structure for this case.
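A hedged sketch of Example 0 (tir.disable_vectorize is an existing PassContext option; the benchmarking itself is left abstract):

import tvm

def stage0_candidates(mod, target):
    # stage0: sweep a C0-type knob in a loop; benchmark and compare the results outside
    for disable_vec in (False, True):
        with tvm.transform.PassContext(opt_level=3,
                                       config={"tir.disable_vectorize": disable_vec}):
            yield tvm.build(mod, target=target)

# stage1 would then continue lowering the chosen candidate with its own configs;
# stage0's vectorization choice is irrelevant to it.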

Example 1: Imagine that people want to build alternative compilation pipelines with different sets of configs (e.g. running through quantization and then building). In this case it may not be desirable to couple the two types of config together, since each pipeline may only care about one set.

We can find that most of the examples come from alternative optimization pipelines and choices that may differ from the current build pipeline. These are, however, important cases to support, so that they can either be incorporated into future pipelines, or simply enable more preprocessing choices that compose with the build pipeline.

Compositionality does not imply that we need to ask everyone to use the same config for all possible pipelines. Instead, the main implication is to clarify what is minimum but necessary (and needs to be considered by all passes/pipelines), while leaving out other parts, so that it leaves space and flexibility for others.

Coming back to the particular topic of this RFC: I think we acknowledge that it could be useful to have a centralized config for a single tvmc pipeline, which can help bring clarity. We also agree that the discussion is not about changing the set of information, but mainly about how we organize the information. The main point of compositionality is to carefully dissect the two kinds of configurations when it comes to putting information in the IRModule, and to consider how the two kinds of configurations interact with passes.

Let me try to summarize the conversation as I understand it – please feel free to correct me if it’s wrong.

It mainly boils down to the following point:

What should be attached to an IRModule and what shouldn’t? According to @tqchen’s description above, it should be C1-style “constraints” and not C0-style “how”. The argument is that C0-style information is configuration for passes and is not broadly applicable to all transform passes, and thus confuses the pass implementation with the choice of what to do with it.

According to the definitions of C0 and C1, the above information should be classified as C1. Therefore, are we all agreeing that it is reasonable for it to be attached to the IRModule?

As a first step, if we all agree, it would be great to unblock ourselves to use a C1-styled CompilationConfig attached to the IRModule, so we can proceed in the short/medium term. @tqchen @Mousius @areusch – an explicit response to this question is highly appreciated :slight_smile:

Now coming back, in today’s state of TVM, C0-style broadly refers to PassContext – I’m not aware of anything else. Therefore, the current argument is against putting the C0-styled PassContext either a) as an IRModule attribute or b) as part of the C1-styled CompilationConfig that is already agreed to be attached to the IRModule.


Then, for future work, we should think about the “necessity” of keeping the C0-styled PassContext as a side channel (or in fact a singleton). IMHO, this contradicts slightly with @jroesch’s proposal of integrating the whole compilation pipeline as IRModule → IRModule transformations, by committing ourselves to maintain a side channel driven by the need to separate C0-styled and C1-styled information.

Therefore, it would be great to explore options for how to attach/package all possible information that “might” be required by all passes (C0+C1) – not just the “minimum” (C1). We thought this could be done by attaching it to the IRModule – so that we could export it without requiring any other side channels. However, we are open to hearing alternatives.