[RFC] TVM Target Specification

haichen · June 3, 2020, 9:41pm

I stand with Tianqi on the target_host attribute, as it encapsulates the information required to compile for a device and can simplify the transformation passes in the TVM stack. I have a few questions about the new target specification.

How will generic functions and dispatching work with the new target? Will they be based on the target id or on the keys? What is needed to specify the keys in the targets?
Composite targets look very flexible, but how passes and generic functions treat a composite target needs more discussion.
Previously we could create a context from a target. What will this become if the target includes the target_host?

tqchen · June 3, 2020, 10:14pm

some thoughts:

I think they should be based on keys. Ideally, we should not think about generic dispatching but about a collection of strategies that can be applied. For example, if the keys include [gpu, cuda, tensorcore], then we can apply all the strategies registered for these three categories.
I don’t know yet how we would deal with composite targets. One way is to decompose them into per-target functions and then find the strategy for each.
We should still create the gpu context in this case (as the target host is an attribute for the device).
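To make the key-based view concrete, here is a minimal Python sketch of collecting strategies by key. The registry, the `register_strategy` and `collect_strategies` helpers, and the strategy names are all hypothetical and are not TVM's actual API.

```python
from collections import defaultdict

# Hypothetical registry mapping a key to the strategies registered under it.
STRATEGIES = defaultdict(list)


def register_strategy(key, name):
    """Register a strategy name under a single target key."""
    STRATEGIES[key].append(name)


def collect_strategies(target_keys):
    """Return every strategy registered under any of the target's keys.

    Strategies are returned in key order, without duplicates.
    """
    found = []
    for key in target_keys:
        for name in STRATEGIES.get(key, []):
            if name not in found:
                found.append(name)
    return found


# Illustrative registrations only.
register_strategy("gpu", "conv2d_generic_gpu")
register_strategy("cuda", "conv2d_cuda_direct")
register_strategy("tensorcore", "conv2d_tensorcore_wmma")
register_strategy("cpu", "conv2d_x86")

candidates = collect_strategies(["gpu", "cuda", "tensorcore"])
print(candidates)
```

In this sketch the target contributes nothing but its list of keys, so adding a key such as tensorcore only widens the set of candidate strategies.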

kparzysz · June 4, 2020, 4:39pm

Another thought is that we should remove “llvm” as a target. Right now target = “llvm” means “cpu”, but it also means “jit”. We should replace it with something that has a clear meaning, and should be independent of whether the LLVM framework is used to generate code for it or not.

haichen · June 4, 2020, 4:47pm

Keys are an important field in the target to make other modules work. Since the target can be created from JSON, I’m worried that if people forget to add certain keys to the target, it might cause undesired behavior.

tqchen · June 4, 2020, 4:48pm

Right now jit and cpu do not necessarily conflict with each other. If the target is local, it can still be exported normally as a library; if it is a cross-compilation target, we cannot execute it directly, but we can still export it to a library.

So llvm right now means cpu, and jit if it is local. It would be good to hear about other alternatives.

tqchen · June 4, 2020, 4:49pm

I agree with your concern. One thing we could do is add a default set of keys for a target id when keys are not explicitly present. For example, cuda will always have cuda and gpu attached to its keys at creation time.

We cannot automatically add uncommon keys like tensorcore though. But we could create tags like nvidia/gtx2080 that refer to these keys. We could also create some common tags, like nvidia/gpu-with-tensorcore.
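As a rough illustration of default keys and tags, here is a small Python sketch. The `DEFAULT_KEYS` and `TAGS` tables and the `resolve_target` helper are hypothetical. The cuda defaults and the tag name follow the text above, and the llvm entry is an assumption.

```python
# Hypothetical default keys per target id (the llvm entry is an assumption).
DEFAULT_KEYS = {
    "cuda": ["cuda", "gpu"],
    "llvm": ["cpu"],
}

# Hypothetical tag table: a tag expands to a full target description.
TAGS = {
    "nvidia/gpu-with-tensorcore": {
        "id": "cuda",
        "keys": ["cuda", "gpu", "tensorcore"],
    },
}


def resolve_target(spec):
    """Expand a tag name or fill in default keys for a target dict."""
    if isinstance(spec, str):
        spec = TAGS[spec]
    resolved = dict(spec)
    keys = resolved.get("keys")
    if keys:
        # Explicit keys always win over the defaults.
        resolved["keys"] = list(keys)
    else:
        resolved["keys"] = list(DEFAULT_KEYS.get(resolved["id"], []))
    return resolved


print(resolve_target({"id": "cuda"}))
print(resolve_target("nvidia/gpu-with-tensorcore"))
```

With this shape, a target created from JSON without keys still gets a usable default, while uncommon keys such as tensorcore only appear through an explicit list or a tag.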

kparzysz · June 4, 2020, 6:20pm

The question is “what do we want the target to guarantee?”. If we want “llvm” to include both CPU and JIT, then it should always mean that both features are present. Whether the target is local or not is a feature of the runtime environment and not the compiler. On that note, I think we should just get rid of the JIT, since we can already load/run DSOs.

Another thing is that there are targets that use LLVM (like AMD GPU), but contrary to what intuition may suggest, they are not “llvm”. We should at least rename it to “cpu” or something like that.

tqchen · June 4, 2020, 7:21pm

I think there is still value in keeping the JIT, as a lot of our current examples depend on it. Another way to think about it is that llvm itself is a target, and we happen to have a JIT engine locally for that target.

We can discuss the alternatives, for example, introduce an llvmjit target that directly corresponds to the jit for local environment runtime.

You are indeed right that the llvm target is cpu specific; we could consider renaming it to llvmcpu. Would love to hear others’ thoughts about the best naming scheme.

kparzysz · June 4, 2020, 10:26pm

This is precisely the point of view that I strongly disagree with. The code that runs is not LLVM IR, it must be compiled to whatever the target triple happens to be. Any LLVM IR that we can generate at some point is not target-agnostic, on the contrary, it’s very much target-specific. That target should be what we consider the “target” here. Calling it “llvm” only serves to obfuscate this.

There is more to this, but let me try to be succinct.

target='llvm' means LLVM with -target= whatever the default triple is for the LLVM libraries linked into TVM. It will happen to be the same as the host most of the time, but that is more of a coincidence than a conscious choice. If this target triple doesn’t agree with the host platform, and we attempt to JIT something, TVM will crash.
If we want to build for Android, we still have to use the llvm target, even though there is no JIT supported. LLVM (as in “compiler framework”) does support JITing on ARM/AArch64, but TVM doesn’t use it (due to Android limitations). So here we have an llvm target that doesn’t come with JIT.

If we propose a target for a specific GPU device, we should also have a specific target for Android CPU instead of the awkward llvm -target=aarch64.... Similarly, for “the CPU of the system we’re running on”, we should use something like host.

kparzysz · June 4, 2020, 10:42pm

Going back to the target_host question. Another argument against it is that a specific device can be present in different systems with different host processors. This would necessitate having different targets for the same device if target_host is a part of the target description.

I don’t think we need to get rid of the target_host right now, but it does create an unnecessary asymmetry in the design.

tqchen · June 4, 2020, 11:29pm

Fair point. How about the llvmjit and llvmcpu proposal?

tqchen · June 4, 2020, 11:31pm

In most cases we do need to generate the host code together with the device code before we run it. One way to resolve this problem for re-targetable builds is to not specify target_host in the program (it can remain optional before split-host-device), and then manually re-specify the host part.

kparzysz · June 5, 2020, 12:28am

I guess that’s ok. Let’s see how it works and we can refine it later if needed.

junrushao · June 17, 2020, 9:01pm

@tqchen Just a minor naming issue:

Which one do you prefer? .add_attr_option or .add_config_option

tqchen · June 17, 2020, 9:02pm

@junrushao How about we list the proposed options and see what everyone thinks? We can do it in this thread or in a separate thread.

junrushao · June 17, 2020, 9:10pm

Let’s do it in this thread.

I am working on the target id registry, but was curious about people’s opinion on one naming choice: “add_attr_option” vs “add_config_option”.

In the RFC, to configure the schema of a target id, we allow using the syntax below:

TVM_REGISTER_TARGET_ID("llvm")
    .add_attr_option<Bool>("system_lib")
    .add_attr_option<String>("mtriple")
    .add_attr_option<String>("mattr");

This allows users to set 3 attributes of llvm: system_lib, mtriple and mattr.

I was wondering if it is slightly better to use “config” instead of “attr”, i.e. use “add_config_option” instead. The primary reason is that we have been using “attr” too much in the codebase, which makes its meaning vague; but config seems to be more informative in this case.

Would love to hear what you guys think

ANSHUMAN.TRIPATHY · June 24, 2020, 3:24pm

@junrushao: I totally agree with you! The “attr” usage in this case is pretty confusing. I think “config” is better than “attr”.

Since these are user options for target generation, how about “add_user_option” or “add_target_option”?

comaniac · July 7, 2020, 12:57am

I just noticed two issues in the current target specification:

P1. Whether to sort the attribute values by default or not

We currently use an array to store a target attribute that has multiple values, and we preserve its order when serializing the target to a string. However, preserving the order seems unnecessary for most attributes. For example, the following two targets would have different serialized strings:

t1 = tvm.target.create('cuda -libs=cublas,cudnn')
t2 = tvm.target.create('cuda -libs=cudnn,cublas')

To me, these two targets are exactly the same, and we should have a unified string if two targets are functionally equivalent.

Another example is the target of Rasp4:

tvm.target.create('llvm ... -mattr=+neon,fp-armv8,thumb-mode')

IIUC, the order of mattr does not need to be preserved.

Of course, we do have exceptions like keys, whose order has to be preserved, but keys is already a standalone property of the target, so we don’t have to worry about it.

IMHO, it would make more sense to make all attribute values unordered sets by default, and to sort them during serialization to guarantee that the target string is deterministic. For attributes whose values must stay ordered, we may allow developers to mark them as ordered.
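A minimal Python sketch of this proposal, assuming a hypothetical `ORDERED_ATTRS` set and `canonicalize_attrs` helper (neither exists in TVM): values are treated as sets and sorted unless the attribute is marked as ordered.

```python
# Assumption: attributes listed here keep their given order.
# Every other multi-valued attribute is treated as an unordered set.
ORDERED_ATTRS = {"keys"}


def canonicalize_attrs(attrs):
    """Return a copy of attrs whose unordered values are de-duplicated and sorted."""
    canonical = {}
    for name, values in attrs.items():
        if name in ORDERED_ATTRS:
            canonical[name] = list(values)
        else:
            canonical[name] = sorted(set(values))
    return canonical


t1 = canonicalize_attrs({"libs": ["cublas", "cudnn"], "keys": ["cuda", "gpu"]})
t2 = canonicalize_attrs({"libs": ["cudnn", "cublas"], "keys": ["cuda", "gpu"]})
print(t1 == t2)
```

Under these assumptions the two cuda targets from the example compare equal after canonicalization, while the order of keys is left untouched.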

P2. Semantic of matching two targets

When applying the history best log or the TopHub fallback log from AutoTVM/Ansor, it first matches the target model (e.g., unknown, 1080ti) and then keys (e.g., cpu, arm_cpu, gpu, cuda). It means we may get a record with target llvm -mcpu=skylake-avx512 when querying records with target llvm -mcpu=core-avx2.

Another follow-up issue arises if we use the target cuda -libs=cudnn and TopHub has a record of conv2d_nchw.cuda for the target workload. In this case, we will match the TopHub record and use it to build the model instead of using cuDNN, which is not the expected behavior. One solution is to also put a record of conv2d_cudnn in TopHub so that the op strategy will compare their latencies and select the better one, although this may not be the user’s intention either.
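The first issue can be reproduced with a simplified Python model of model-then-keys matching. This is only a sketch and not AutoTVM's actual lookup code: `loose_match` is hypothetical, and falling back to "any shared key" is an assumption about how the key comparison works.

```python
def loose_match(query, record):
    """Simplified model-then-keys matching.

    Attributes such as mcpu or libs are never compared.
    """
    if query["model"] != "unknown" and query["model"] == record["model"]:
        return True
    # Assumption: fall back to matching on any shared key.
    return bool(set(query["keys"]) & set(record["keys"]))


query = {"id": "llvm", "mcpu": "core-avx2", "model": "unknown", "keys": ["cpu"]}
record = {"id": "llvm", "mcpu": "skylake-avx512", "model": "unknown", "keys": ["cpu"]}

matched = loose_match(query, record)
print(matched)
```

Because mcpu never takes part in the comparison, the skylake-avx512 record is accepted for a core-avx2 query in this model.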

cc @tqchen @junrushao @haichen

junrushao · July 8, 2020, 8:48pm

Hi Cody, thank you for bringing this up! It is an interesting and extremely meaningful discussion!

I found that P1 and P2 are both about structural matching of specific targets and their specific attributes: P1 focuses on a deterministic representation of a certain attribute, and P2 focuses on correct ways to find matched targets.

JSON representation of a target. As brought up in this RFC, our ultimate goal is to save targets in a JSON-like format. The problem with JSON is that it does not offer a “Set” data structure natively. Therefore, although doable, it is somewhat questionable to me whether we really want to make -mattr an unordered set (or sorted array):

If there is a canonicalization step somewhere, when should it happen: during serialization or during deserialization?
Which attributes should be sorted (e.g. -libs) and which attributes shouldn’t (e.g. -keys)?

Raw string representation of a target. Right now we are not using JSON yet. Targets are still serialized as raw strings. The format used by the last PR is as follows:

we put -keys first, and other attributes are sorted alphabetically (e.g. llvm -keys=... -a=... -b=... -c=...);
the inner order of each attribute, as you already mentioned, is untouched (e.g. the two libs in -libs=cudnn,cublas are not sorted).
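The two formatting rules above can be sketched in Python as follows. `serialize_target` is a hypothetical helper written for illustration, not the function used in the PR.

```python
def serialize_target(target_id, attrs):
    """Serialize a target as a raw string.

    -keys comes first, the remaining attributes are sorted by name,
    and the inner order of each attribute's values is left untouched.
    """
    parts = [target_id]
    if "keys" in attrs:
        parts.append("-keys=" + ",".join(attrs["keys"]))
    for name in sorted(k for k in attrs if k != "keys"):
        value = attrs[name]
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        parts.append(f"-{name}={value}")
    return " ".join(parts)


s = serialize_target(
    "cuda",
    {"model": "v100", "libs": ["cudnn", "cublas"], "keys": ["cuda", "gpu"]},
)
print(s)
```

Note that swapping the two libs in the input still changes the output of this sketch, which is exactly the gap P1 points out.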

P1. Sort or not sort in raw string representation. Our current formatting rule sorts attribute keys, which is slightly better than the previous one, in which nothing was sorted. As P1 proposed, it might be favorable to sort the inner order of each attribute too, because

in many cases (e.g. -libs, -mattr) the order doesn’t matter at all.
sorting helps with (but does not completely address) the problem of structural equality of targets.

However, a sort-them-all policy applied to every attribute would force an incorrect assumption (i.e. that order never matters). As proposed in P1, we should:

sort them by default;
make exceptions for those that cannot be sorted.

Syntax for sortable/unsortable attributes. I very much agree with all the points, so we should think about a syntax to express this explicitly.

S0. Integrate into types

.add_attr_option<Set<String>>("mattr")   // sortable
.add_attr_option<Array<String>>("keys"); // unsortable

S1. Integrate into names

.add_attr_option<Array<String>>("mattr")            // sortable
.add_attr_option<Array<String>>("keys:unsortable"); // unsortable

S2. Add new API

.add_attr_option<Array<String>>("mattr")            // sortable
.add_unsortable_attr_option<Array<String>>("keys"); // unsortable

P2. Target matching. P2 presents several aspects of the matching, which we may summarize as follows:

Order of matching: first match -model, then match -keys
Forbidden keywords: we may get a record with an incorrect -mcpu=skylake-avx512 when our real target is -mcpu=core-avx2, which can cause our program to crash because of an illegal instruction
Conditionally useful attributes: for example, when we have -libs=cudnn but TopHub doesn’t, we fail to dispatch to cuDNN.

The complexity comes from

the target to be matched
previously stored AutoTVM logs, which have incomplete knowledge

Given those complexities, I think it would be great if we could

allow user-defined matchers, instead of seeking an all-in-one string-based solution
store logs in a way that contains the full information

comaniac · July 8, 2020, 9:21pm

Thanks for the summary and proposals. They look good to me.

While I personally like both S0 and S2 for specifying whether an attribute can be sorted, I would prefer to use the word “ordered”, which aligns with C++ naming conventions. For example, in S2:

.add_attr_option<Array<String>>("mattr")         // sortable
.add_ordered_attr_option<Array<String>>("keys"); // unsortable

For P2, I also agree with your opinion that the semantics of matching two targets vary by use case. Providing a default strict matcher that matches everything while allowing users to define their own matcher sounds like a good idea to me.
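One possible shape for this, sketched in Python: a strict default matcher plus a pluggable user-defined one. All names here (`strict_matcher`, `ignore_libs_matcher`, `find_records`) and the record layout are hypothetical, not an existing AutoTVM/Ansor interface.

```python
def strict_matcher(query, record_target):
    """Default matcher: every field of the two targets must be equal."""
    return query == record_target


def ignore_libs_matcher(query, record_target):
    """Hypothetical user-defined matcher: compare everything except libs."""

    def strip(target):
        return {k: v for k, v in target.items() if k != "libs"}

    return strip(query) == strip(record_target)


def find_records(query, records, matcher=strict_matcher):
    """Return the records whose target is accepted by the given matcher."""
    return [rec for rec in records if matcher(query, rec["target"])]


records = [
    {
        "target": {"id": "cuda", "model": "v100", "keys": ["cuda", "gpu"]},
        "task": "conv2d_nchw.cuda",
    },
]
query = {"id": "cuda", "model": "v100", "keys": ["cuda", "gpu"], "libs": ["cudnn"]}

strict_hits = find_records(query, records)
relaxed_hits = find_records(query, records, matcher=ignore_libs_matcher)
print(len(strict_hits), len(relaxed_hits))
```

The strict default never returns a record tuned for a different target, and a user who knows that -libs is irrelevant for a given lookup can opt into a looser rule.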

In addition, I am not sure that storing logs with full target information would solve the issue. The root cause is that even if we have stored cuda -libs=cudnn -keys=gpu -model=v100 (in case we tuned conv2d_nchw.cuda with this target), -libs=cudnn is unnecessary for this record. However, I think this is more of a topic about how to improve the AutoTVM/Ansor log format.