[pre-RFC] Compilation Configuration Representation

tqchen · April 26, 2022, 1:51pm

Thanks @kparzysz, what you said makes sense.

Effectively, one point of view is that a unified (base) structure is needed to carry configuration through the divide-and-conquer transition across the V0 => V2 => V1 phases of function optimization. In your terminology, that structure is the “Architecture”. I agree with that point.

The V2 view mainly calls for an “Architecture”, which contains the components and connectivity that can represent:

V0: the global set of configurations
V2: some configurations that contain a target together with its host
V1: the final leaf, where there is really only a single “target” in the traditional compiler sense.

Given that “Architecture” describes how things group with each other hierarchically, one possible option would be to adopt the current Target data structure (perhaps under a different name to distinguish it from the leaf component), since the relation groupings are usually sub-trees.
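To make the “reuse the current Target structure” option concrete, here is a minimal C++ sketch of a recursive configuration node in which a leaf and a composite share one type. Any subtree (the V0 whole, a V2 host+device group, or a V1 leaf) is then the same kind of value and can be handed to the same code. `ConfigNode`, `Leaf`, `Group` and `CountLeaves` are hypothetical names for illustration, not TVM APIs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical recursive node: every node has a kind and attributes, and
// composite nodes group children. A leaf is simply a node with no children.
struct ConfigNode {
  std::string kind;                          // e.g. "packaged", "cuda", "llvm"
  std::map<std::string, std::string> attrs;  // free-form key/value options
  std::vector<std::shared_ptr<ConfigNode>> children;

  bool IsLeaf() const { return children.empty(); }
};

using ConfigRef = std::shared_ptr<ConfigNode>;

// Create a leaf node (a single target in the traditional compiler sense).
ConfigRef Leaf(const std::string& kind) {
  auto node = std::make_shared<ConfigNode>();
  node->kind = kind;
  return node;
}

// Group existing subtrees under a new composite node.
ConfigRef Group(const std::string& kind, std::vector<ConfigRef> children) {
  auto node = std::make_shared<ConfigNode>();
  node->kind = kind;
  node->children = std::move(children);
  return node;
}

// Works identically on the whole tree, an intermediate group, or a leaf.
int CountLeaves(const ConfigRef& node) {
  if (node->IsLeaf()) return 1;
  int total = 0;
  for (const auto& child : node->children) total += CountLeaves(child);
  return total;
}
```

Because the grouping is just another node, narrowing the constraint during compilation amounts to handing a pass a smaller subtree.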

Note that the naming itself is a separate issue that can be addressed independently. (Personally, I think “architecture” should be avoided, mainly because it is already used for the Arch field of LLVM’s target triple, which makes it a subcomponent of the target triple.) But that is a minor issue.

kparzysz · April 26, 2022, 1:57pm

Yes, definitely. I was trying to present an independent point of view, so I avoided terminology that was already in use in this thread.

areusch · April 27, 2022, 12:13am

Thanks all for these discussions. I agree with @kparzysz’s point that the architecture should be separated from the concept of a “component.” I had a similar thought in discussion with @Mousius last week: perhaps we should formally name and define these concepts, because they are complex and easy to confuse. We’ve had quite a few problems communicating about the overall desired outcome here because it’s difficult to know whether someone means “the conceptual idea of Target”, “the current realization of LeafTarget in the codebase”, or “some partially-abstract base class for both architecture and component views.”

I think one thing that’s confusing about the current Target data structure is that the name of the structure is both:

a base class which provides schema and serialization
an abstract concept that vaguely describes the deployment environment

It might be useful to depart from the name Target here, since it seems overloaded and vague at this point. I did have this thought:

LeafTarget → VirtualDevice::codegen (that is, actually require a VirtualDevice in place of LeafTarget, and include a field Codegen codegen which could describe a constraint on how the compiler may generate code for this device). Codegen is really what LeafTarget::kind indicates, and we’ve sanctioned that word via the Bring Your Own Codegen name. Sure, there are other things that are implied by including a codegen into the description of the deploy environment constraints, but ultimately the main thing described within the bounds of the Codegen data structure are properties of the codegen itself. You could construct a VirtualDevice with only a Codegen specified, and then this would lend itself better to the refactor asked for by Artifact where we allow users to name VirtualDevices.
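A minimal C++ sketch of the shape this suggestion implies, under the assumption that a Codegen is little more than a kind plus options and that a VirtualDevice can be built from a Codegen alone. `Codegen`, the simplified `VirtualDevice` fields and `FromCodegen` are hypothetical; TVM’s actual VirtualDevice class differs.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical: what LeafTarget::kind encodes today, reframed as a Codegen.
struct Codegen {
  std::string kind;                            // e.g. "llvm", "cuda" or a BYOC codegen name
  std::map<std::string, std::string> options;  // codegen-specific options
};

// Hypothetical, heavily simplified VirtualDevice (not TVM's real class).
struct VirtualDevice {
  std::optional<std::string> name;  // user-assigned name, if any
  int device_type = -1;             // -1 stands for "unconstrained" in this sketch
  std::optional<Codegen> codegen;   // constraint on how code may be generated
};

// Build a VirtualDevice from only a Codegen, leaving everything else unconstrained.
VirtualDevice FromCodegen(Codegen codegen,
                          std::optional<std::string> name = std::nullopt) {
  VirtualDevice vd;
  vd.name = std::move(name);
  vd.codegen = std::move(codegen);
  return vd;
}
```

Keeping the name optional is what would let users name VirtualDevices without requiring every device to be named.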

I don’t have great thoughts on the others yet. Half-baked ideas…

PackagedTarget → ? I thought about this for a while and am still not sure. Perhaps CompositeDeployment, Deployment or DeployEnvironment.
Target/TargetBase → DeployConstraint or TargetSchema or something.

However, the general thing I’m going for here is to tighten the scopes and definitions so that we can make progress. We can always add new concepts as we build out support for them.

I agree we might be able to reuse the conceptual data structure. However, reusing the current Target data structures could introduce ambiguity into the tree:

class HeterogeneousDeployEnvironment : public TargetBase {
  // What does "target" in "target_host" mean?
  // What kind of TargetBase should be filled in here?
  TargetBase target_host;
};

Here we’ve repeated the name “target” a few times and made it unclear how to fill in the data structure. If we reuse such a structure, we should remove that ambiguity so it’s clear how we intend people to use it.
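One way to remove the ambiguity is to let the field types, rather than the field names, say what may be filled in. A minimal C++ sketch, assuming hypothetical `Codegen` and `DeviceConstraint` types in place of the generic TargetBase slot:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical leaf type: exactly one code generator, e.g. "llvm" or "cuda".
struct Codegen {
  std::string kind;
};

// Hypothetical per-device constraint: how code for this device is generated.
struct DeviceConstraint {
  Codegen codegen;
};

// The host slot can only hold a leaf Codegen, and devices are listed
// explicitly, so there is no question of what "target_host" means.
struct HeterogeneousDeployEnvironment {
  Codegen host_codegen;
  std::vector<DeviceConstraint> devices;
};

// A structural check that is easy to state because every slot is typed.
bool IsWellFormed(const HeterogeneousDeployEnvironment& env) {
  if (env.host_codegen.kind.empty()) return false;
  for (const auto& device : env.devices) {
    if (device.codegen.kind.empty()) return false;
  }
  return true;
}
```

The trade-off is losing the single recursive base class; each level gets its own type.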

tqchen · April 27, 2022, 2:35pm

Thanks @areusch, to build further on your comment:

The main property that we want to preserve (from the current target system) is a common base class for the possible configurations that represent V2. Depending on how the dashed box is drawn, it can range from a singleton (e.g. device-only CUDA), to part of a composite (the most common case being TargetWithHost), to the entirety of V1.

To build on the recommendation that leaf components be separated, here is an example using @kparzysz's terminology (Architecture being the layout and Target being the component; leaving the naming itself aside for now):

// No virtual device is needed, since compilation of a TIR function
// is generally applicable to any virtual device
class DeviceOnlyArch : public Architecture {
  public:
   Target device;
};

class DeviceWithHostArch : public Architecture {
  public:
   Target device;
   Optional<Target> host;
};

// Virtual devices are needed to validate graph-level runtime information
class PackagedArch : public Architecture {
  public:
   Array<VirtualDevice> devices;
   Target host;
   Runtime runtime;
   Executor executor;
};

Note that different architectures will certainly result in different compilation pipelines, which can be decomposed into pipelines for some of their sub-architectures. As a result, dispatching on the kind (or structured view) is helpful here.
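A minimal C++ sketch of dispatching on the structured kind of an architecture, using simplified stand-ins for the classes above (Target reduced to a kind string, `std::optional`/`std::vector` in place of TVM containers). `SelectPipeline` and the returned pipeline labels are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for the classes above.
struct Target { std::string kind; };

struct Architecture {
  virtual ~Architecture() = default;  // polymorphic base enables dynamic_cast
};
struct DeviceOnlyArch : Architecture { Target device; };
struct DeviceWithHostArch : Architecture { Target device; std::optional<Target> host; };
struct PackagedArch : Architecture { std::vector<Target> devices; Target host; };

// Hypothetical dispatcher: choose a pipeline by the concrete architecture kind.
std::string SelectPipeline(const Architecture& arch) {
  if (auto* p = dynamic_cast<const PackagedArch*>(&arch)) {
    return "packaged(" + std::to_string(p->devices.size()) + " devices)";
  }
  if (auto* d = dynamic_cast<const DeviceWithHostArch*>(&arch)) {
    return d->host ? "device+host:" + d->device.kind + "/" + d->host->kind
                   : "device+host:" + d->device.kind;
  }
  if (auto* d = dynamic_cast<const DeviceOnlyArch*>(&arch)) {
    return "device-only:" + d->device.kind;
  }
  return "unknown";
}
```

A visitor or a `kind` field would work equally well; `dynamic_cast` is used only to keep the sketch short.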

Depending on the phase of compilation and its state, a function can sit at different levels of constraint (Architectures) that specify its deployment constraints (and informational hints), ranging from PackagedArch to DeviceWithHostArch and finally DeviceOnlyArch.
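A minimal C++ sketch of that narrowing, assuming simplified aggregate stand-ins for the three classes (no common base, Target reduced to a kind string). `NarrowToDevice` and `DropHost` are hypothetical names for the two steps.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins; Target is reduced to its kind string.
struct Target { std::string kind; };
struct DeviceOnlyArch { Target device; };
struct DeviceWithHostArch { Target device; std::optional<Target> host; };
struct PackagedArch { std::vector<Target> devices; Target host; };

// Step 1 (hypothetical): once a function is assigned to devices[index], its
// constraint narrows from the whole package to that device plus the host.
DeviceWithHostArch NarrowToDevice(const PackagedArch& arch, std::size_t index) {
  assert(index < arch.devices.size());
  return DeviceWithHostArch{arch.devices[index], arch.host};
}

// Step 2 (hypothetical): after host-side code is split off, only the device
// kernel remains, constrained by the device alone.
DeviceOnlyArch DropHost(const DeviceWithHostArch& arch) {
  return DeviceOnlyArch{arch.device};
}
```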

In the original view, an Architecture can be any meaningfully grouped subtree of the global settings; as a result, a leaf can also be viewed as a subtree. That was the original rationale of the Target system, and personally I do not see a strong difference between the two. But I also acknowledge the advantage of separating out leaves as special. The main thing to preserve is the ability to specify the architecture (of a subtree) throughout our divide-and-conquer compilation process.

areusch · April 27, 2022, 2:56pm

Just to be clear about each of these cases: could we explicitly state their uses in the thread so everyone is on the same page? I think there might be questions about why you’d ever pass DeviceOnlyArch to tvm.relay.build().

tqchen · April 27, 2022, 4:10pm

Just to build on the current use case in the UX.

The most common setting we pass to build is a DeviceWithHostArch (right now, tvm.target.Target(“cuda”, host=“llvm”)), which hopefully gets canonicalized internally to a PackagedArch with good defaults.
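A minimal C++ sketch of such a canonicalization step. The defaults here (an "llvm" host when none is given, a "cpp" runtime and a "graph" executor) are placeholder assumptions for illustration, not a statement of what TVM actually picks; `Canonicalize` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins; runtime and executor are reduced to names.
struct Target { std::string kind; };
struct DeviceWithHostArch { Target device; std::optional<Target> host; };
struct PackagedArch {
  std::vector<Target> devices;
  Target host;
  std::string runtime;
  std::string executor;
};

// Hypothetical canonicalization: fill every field the user did not specify.
// All default values below are assumptions chosen for illustration.
PackagedArch Canonicalize(const DeviceWithHostArch& arch) {
  PackagedArch out;
  out.devices = {arch.device};
  out.host = arch.host.value_or(Target{"llvm"});
  out.runtime = "cpp";
  out.executor = "graph";
  return out;
}
```

Downstream passes then only ever see the fully specified form.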

In a world where build is modularized and can take any IRModule at an intermediate stage of compilation, we could expect an IRModule that comes with a collection of functions already constrained in some way (due to previous passes), each function carrying an arch constraint attribute (a DeviceWithHostArch, a DeviceOnlyArch, or some other variant).

A build function takes this information into account to build the final module. Such an IRModule could still carry a PackagedArch attr at the IRModule level, provided that the module-level constraint is consistent with the specific choices derived at the function level.
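A minimal C++ sketch of the consistency requirement between a module-level PackagedArch and per-function constraints. `ModuleSketch`, `FunctionConstraint` and `IsConsistent` are hypothetical; the rule checked here (each function’s device appears among the package’s devices, and any function-level host matches the package host) is one plausible reading of “consistent”, not a settled definition.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Target { std::string kind; };
struct PackagedArch { std::vector<Target> devices; Target host; };

// Hypothetical per-function constraint (covers DeviceWithHostArch and
// DeviceOnlyArch by making the host optional).
struct FunctionConstraint { Target device; std::optional<Target> host; };

// Hypothetical stand-in for an IRModule carrying only the relevant attrs.
struct ModuleSketch {
  std::optional<PackagedArch> module_arch;
  std::map<std::string, FunctionConstraint> function_archs;
};

bool IsConsistent(const ModuleSketch& mod) {
  if (!mod.module_arch) return true;  // nothing at module level to check against
  const PackagedArch& arch = *mod.module_arch;
  for (const auto& entry : mod.function_archs) {
    const FunctionConstraint& fc = entry.second;
    bool device_known = std::any_of(
        arch.devices.begin(), arch.devices.end(),
        [&](const Target& t) { return t.kind == fc.device.kind; });
    if (!device_known) return false;
    if (fc.host && fc.host->kind != arch.host.kind) return false;
  }
  return true;
}
```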

Again, the need for V2 comes from the need to specify such constraints through the divide-and-conquer phases and to represent that intermediate state and its constraints for future passes.

areusch · April 27, 2022, 6:08pm

I think there is a difference between what is being proposed here (the argument to tvm.relay.build) and what is annotated onto an IRModule function. This proposal discusses adding an attr to the top-level IRModule with what’s called Architecture here. I do not believe we have tackled the question of what should get annotated onto a particular Function.

Your example is of someone providing an IRModule with such annotations; in the context of this proposal, we’re just talking about the top-level annotation. Given we are also discussing canonicalization, I think there was an expectation on my side that anything less than a PackagedArch passed to tvm.relay.build would be canonicalized before being attached to the IRModule, and therefore consumers of the IRModule should expect only a PackagedArch on it.

Does that agree with your understanding? Is there any use case you know of that could not annotate a PackagedArch? The one I am thinking about is tvm::build, which is used in automation. I think we do need to accommodate that use case here, but it’s not very interesting as a counterpoint or example at this level of detail, since it’s filled in by the automation infrastructure and we could simply adapt that to whatever makes sense given more pressing design requirements.

tqchen · April 27, 2022, 7:11pm

On the specific topic of canonicalization, I think we agree on canonicalization itself. Narrowed to the relay.build convention, it is helpful: relay.build simply canonicalizes and attaches a PackagedArch to the IRModule. I believe that is also what we previously agreed to in the PackagedTarget proposal. Note that in a broader build context (e.g. an IRModule that contains only TIR functions), PackagedArch may or may not make the most sense; however, that can be left aside for now, since requiring a PackagedArch attr in the context of relay.build is quite reasonable.

Now on the broader discussion, it might be good to come back to the goals:

G0: Having a struct attached to the IRModule
G1: Having a struct attached to a Function, specifying the build constraints of that function
G2: The ability to refer to such a struct through a simple tag (e.g. "aws/c4.xlarge") and to record it
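For G2, a minimal C++ sketch of a tag registry: a tag expands to a full config, and the config remembers which tag it came from so it can be recorded (for example in tuning logs). `TagRegistry` and `Config` are hypothetical, and the attributes registered in the usage are illustrative placeholders, not the real description of any AWS instance.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical config value; keeps its originating tag so it can be recorded.
struct Config {
  std::string tag;
  std::map<std::string, std::string> attrs;
};

// Hypothetical registry mapping short tags to full configs.
class TagRegistry {
 public:
  void Register(const std::string& tag, std::map<std::string, std::string> attrs) {
    table_[tag] = Config{tag, std::move(attrs)};
  }

  // Returns the config for a tag, or std::nullopt if the tag is unknown.
  std::optional<Config> Lookup(const std::string& tag) const {
    auto it = table_.find(tag);
    if (it == table_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, Config> table_;
};
```

If the same base struct serves G0 and G1, one registry like this covers module-level and function-level configs alike.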

One of the key things we would like to preserve is a struct that covers our V2 needs throughout the divide-and-conquer phases, such that this base struct can directly serve G0, G1 and G2.

Of course, it is tempting to focus only on G0, which I believe leads to some of the reasoning that what gets annotated onto functions is less relevant. However, from the overall architecture point of view, it is relevant in terms of design redundancy and simplicity. This also takes into account that we already have a design in use (Target in a recursive form, although not favored by all) that covers all three goals (G0, G1, G2) and V2 overall. Introducing two structures effectively means increased complexity, and would likely require a separate mechanism to handle G2.

There are some disagreements on the particular choice of data structure (decoupling components), which are being addressed in the latest discussions. The latest discussions come back to the need to use Architecture to represent a spectrum of sub-trees per V2 (instead of only V0). That aligns with G0, G1 and G2, and is a positive direction.

kparzysz · April 28, 2022, 2:52pm

My idea for this seems a bit different, but maybe the difference is only superficial. Let me present what you stated here, but in the form I imagined, so we can see if our views match.

First, we have some set of components. These represent hardware blocks, and we can think of this set as a database of known processors, accelerators, etc. Let’s say we have

  // I don't know specific names, but "NVIDIA_GPU_type1" could stand for "RTX3080"
  // or something like that.
  Component NVIDIA_GPU_type1;
  Component NVIDIA_GPU_type2;
  Component AMDGPU_type1;
  Component X86_64_type1;

Then, for describing a specific system, we’d create an “architecture”:

  Architecture = {
    Components = [X86_64_type1, NVIDIA_GPU_type1, NVIDIA_GPU_type1];
    // Abbreviate C[x] = Components[x]
    Connections = [(C[0], C[1], "uni-directional"),
                   (C[0], C[2], "uni-directional")];
  }

This would represent an X86 with two GPUs, where the X86 can actively communicate with each GPU, but GPUs cannot actively communicate with anything.
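A minimal C++ sketch of this component-plus-connections view, assuming connections are stored as directed edges and a "bi-directional" edge works both ways. `Component`, `Connection`, `Direction` and `CanInitiate` are hypothetical names mirroring the notation above.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Component { std::string name; };

enum class Direction { kUni, kBi };

// from/to are indices into Architecture::components.
struct Connection { int from; int to; Direction dir; };

struct Architecture {
  std::vector<Component> components;
  std::vector<Connection> connections;

  // True if components[a] can actively initiate communication with components[b].
  bool CanInitiate(int a, int b) const {
    for (const auto& c : connections) {
      if (c.from == a && c.to == b) return true;
      if (c.dir == Direction::kBi && c.from == b && c.to == a) return true;
    }
    return false;
  }
};
```

With the X86 plus two GPUs example, the host reaches each GPU, while neither GPU can initiate anything.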

We could then say

class DeviceWithHostArch : public Architecture {
 public:
  int host;  // Components[host] is the host.
};

Making architecture a member of DeviceWithHostArch would probably be better, but the idea is the same.
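The member-based variant might look like the following minimal C++ sketch, in which DeviceWithHostArch holds an Architecture and a host index instead of inheriting from it. The field names and the `Host`/`Devices` helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Component { std::string name; };
struct Architecture { std::vector<Component> components; };

// Composition instead of inheritance: the architecture is a member, and
// `host` indexes into its component list.
struct DeviceWithHostArch {
  Architecture arch;
  std::size_t host = 0;  // arch.components[host] is the host

  const Component& Host() const {
    assert(host < arch.components.size());
    return arch.components[host];
  }

  // Indices of every component other than the host.
  std::vector<std::size_t> Devices() const {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < arch.components.size(); ++i) {
      if (i != host) out.push_back(i);
    }
    return out;
  }
};
```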

It seems the main difference is that you put Target in the derived classes, whereas in my idea, targets (components) would be listed in Architecture. The components list in the architecture could have additional properties, like OS:

Components = [
  (X86_CPU_type1, Linux),
  (NVIDIA_GPU_type1, baremetal),
  (NVIDIA_GPU_type1, baremetal),
]

The idea is to have a set of building blocks, and a way to represent structures that we can build from them in a way that we can add more blocks without having to modify anything else (to enable their use).
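A minimal C++ sketch of that extensibility goal, assuming components live in an open database keyed by name and an architecture refers to them only by name plus per-instance properties such as OS. Adding a new block is then a single registration with no other code to edit. `ComponentDatabase`, `Instance` and `AllKnown` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Component {
  std::string name;
  std::map<std::string, std::string> properties;  // e.g. ISA, vendor
};

// Hypothetical open registry of known hardware blocks.
class ComponentDatabase {
 public:
  void Add(Component c) {
    std::string key = c.name;
    db_[key] = std::move(c);
  }
  bool Contains(const std::string& name) const { return db_.count(name) > 0; }

 private:
  std::map<std::string, Component> db_;
};

// One entry of an architecture's component list: a database name plus OS.
struct Instance {
  std::string component;
  std::string os;  // e.g. "Linux", "baremetal"
};

// Valid as soon as every referenced component has been registered.
bool AllKnown(const ComponentDatabase& db, const std::vector<Instance>& parts) {
  return std::all_of(parts.begin(), parts.end(), [&](const Instance& inst) {
    return db.Contains(inst.component);
  });
}
```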