[RFC] TVM Target Specification

I just noticed two issues in the current target specification:

P1. Whether to sort the attribute values by default or not

We currently use an array to store target attributes (if it has multiple values) and preserve its order when serializing a target to a string. However, it seems unnecessary for most attributes. For example, the following two targets would have different serialized strings:

t1 = tvm.target.create('cuda -libs=cublas,cudnn')
t2 = tvm.target.create('cuda -libs=cudnn,cublas')

To me, these two targets are exactly the same, and we should produce a unified string when two targets are functionally equivalent.

Another example is the target of Rasp4:

tvm.target.create('llvm ... -mattr=+neon,fp-armv8,thumb-mode')

IIUC, the order of mattr does not need to be preserved.

Of course, we do have exceptions like keys, whose order has to be preserved, but keys is already a standalone property of a target, so we don’t have to worry about it.

IMHO, it would make more sense to treat all attribute values as unordered sets by default and sort them during serialization to guarantee that the target string is deterministic. For attributes whose values are ordered, we may allow developers to mark them as such.
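As a minimal sketch of the sort-on-serialization idea (this is not TVM’s actual implementation; the function and the set of order-insensitive attributes are assumptions for illustration):

```python
# Hypothetical sketch: canonicalize a target string by sorting the
# comma-separated values of attributes assumed to be order-insensitive.
UNORDERED_ATTRS = {"libs", "mattr"}  # assumption, not TVM's real list

def canonicalize(target_str):
    parts = target_str.split()
    out = [parts[0]]  # the target kind, e.g. "cuda" or "llvm"
    for part in parts[1:]:
        if "=" not in part:
            out.append(part)  # bare flags are kept as-is
            continue
        key, _, value = part.lstrip("-").partition("=")
        if key in UNORDERED_ATTRS:
            value = ",".join(sorted(value.split(",")))
        out.append("-%s=%s" % (key, value))
    return " ".join(out)

# Both orderings now serialize to the same canonical string:
print(canonicalize("cuda -libs=cublas,cudnn"))  # cuda -libs=cublas,cudnn
print(canonicalize("cuda -libs=cudnn,cublas"))  # cuda -libs=cublas,cudnn
```

With this policy, the two t1/t2 examples above would compare equal by string.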

P2. Semantic of matching two targets

When applying the history-best log or the TopHub fallback log from AutoTVM/Ansor, we first match the target model (e.g., unknown, 1080ti) and then the keys (e.g., cpu, arm_cpu, gpu, cuda). This means we may get a record with target llvm -mcpu=skylake-avx512 when querying records with target llvm -mcpu=core-avx2.

Another follow-up issue arises if we use the target cuda -libs=cudnn and TopHub has a record of conv2d_nchw.cuda for the same workload. In this case, we will match the TopHub record and use it to build the model instead of using cuDNN, which is not the expected behavior. One solution is to also put a record of conv2d_cudnn on TopHub so that the op strategy will compare their latencies and select the better one, although this may not be the user’s intention either.

cc @tqchen @junrushao1994 @haichen


Hi Cody, thank you for bringing this up! This is an interesting and extremely meaningful discussion!

P1 and P2 are both about structural matching of specific targets and their attributes: P1 focuses on a deterministic representation of a certain attribute, while P2 focuses on correct ways to find matching targets.

JSON representation of a target. As brought up in this RFC, our ultimate goal is to save targets in a JSON-like format. The problem with JSON is that it does not natively offer a “Set” data structure. Therefore, although doable, it is somewhat questionable to me whether we really want to make -mattr an unordered set (or sorted array):

  1. Imagine there is a canonicalization step somewhere: when should it happen, during serialization or during deserialization?
  2. Which attributes should be sorted (e.g. -libs) and which shouldn’t (e.g. -keys)?
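One possible answer to the first question, sketched below under the assumption that set-valued attributes are stored as sorted JSON arrays at serialization time (the attribute names and function are hypothetical, not the real schema):

```python
import json

# Hypothetical sketch: JSON has no native set type, so one option is to
# canonicalize set-valued attributes into sorted arrays when serializing,
# making the on-disk representation deterministic.
SET_VALUED_ATTRS = ("libs", "mattr")  # assumption for illustration

def serialize_target(target):
    doc = dict(target)
    for attr in SET_VALUED_ATTRS:
        if attr in doc:
            doc[attr] = sorted(doc[attr])
    return json.dumps(doc, sort_keys=True)

t1 = {"kind": "cuda", "libs": ["cublas", "cudnn"]}
t2 = {"kind": "cuda", "libs": ["cudnn", "cublas"]}
assert serialize_target(t1) == serialize_target(t2)
```

Canonicalizing on serialization (rather than deserialization) means a round-trip through JSON is stable and two equivalent targets always produce identical strings.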

Raw string representation of a target. Right now we are not using JSON yet; targets are still serialized as raw strings. The format the last PR used is:

  1. we put -keys first, and other attributes are sorted alphabetically (e.g. llvm -keys=... -a=... -b=... -c=...);
  2. the inner order of each attribute, as you already mentioned, is untouched (e.g. the two libs in -libs=cudnn,cublas are not sorted).

P1. Sort or not sort in raw string representation. Our current formatting rule sorts attribute keys, which is slightly better than the previous one, in which nothing was sorted. As P1 proposed, it might be favorable to sort the inner order of each attribute too, because

  1. in many cases (e.g. -libs, -mattr) the order doesn’t matter at all;
  2. sorting helps with (but does not completely address) the problem of structural equality of targets.

However, if we apply a sort-them-all policy to every attribute, it forces an incorrect assumption (i.e. that order never matters). As proposed in P1, we should:

  1. by default sort them;
  2. make exceptions for those that cannot be sorted.

Syntax for sortable/unsortable attributes. I very much agree with all the points; we should then think about the syntax to express it extensibly.

  • S0. Integrate into types
.add_attr_option<Set<String>>("mattr")   // sortable
.add_attr_option<Array<String>>("keys"); // unsortable
  • S1. Integrate into names
.add_attr_option<Array<String>>("mattr")            // sortable
.add_attr_option<Array<String>>("keys:unsortable"); // unsortable
  • S2. Add new API
.add_attr_option<Array<String>>("mattr")            // sortable
.add_unsortable_attr_option<Array<String>>("keys"); // unsortable
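To make the trade-off concrete, here is a hypothetical Python sketch of the S0 idea, where the declared container type decides whether an attribute is canonicalized (the class and method names mimic the C++ snippets above but are illustrative only):

```python
# Hypothetical sketch of S0: the container type declared for an attribute
# decides whether its values are sorted (set) or kept in order (list).
class TargetKind:
    def __init__(self, name):
        self.name = name
        self.attrs = {}  # attribute name -> declared container type

    def add_attr_option(self, name, container):
        self.attrs[name] = container
        return self  # allow chaining, like the C++ builder API

    def canonicalize(self, name, values):
        if self.attrs[name] is set:
            return sorted(values)   # sortable attribute
        return list(values)         # order-preserving attribute

llvm = TargetKind("llvm")
llvm.add_attr_option("mattr", set).add_attr_option("keys", list)
print(llvm.canonicalize("mattr", ["thumb-mode", "+neon"]))  # ['+neon', 'thumb-mode']
print(llvm.canonicalize("keys", ["arm_cpu", "cpu"]))        # ['arm_cpu', 'cpu']
```

The appeal of S0 is that no new registration API is needed; the type carries the semantics.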

P2. Target matching. P2 presents several aspects of matching, which we may summarize as follows:

  1. Order of matching: first match -model, then match -keys.
  2. Forbidden keywords: we may get a record with an incorrect -mcpu=skylake-avx512 when our real target is -mcpu=core-avx2, which can cause our program to crash with an illegal instruction.
  3. Conditionally useful attributes: for example, when we have -libs=cudnn but TopHub doesn’t, we fail to dispatch to cuDNN.

The source of complexity comes from

  1. the target to be matched;
  2. previously stored AutoTVM logs having incomplete knowledge.

Given those complexities, I think it would be great if we can

  1. allow user-defined matcher, instead of seeking an all-in-one string-based solution
  2. store logs in a way that full information is contained
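A user-defined matcher could look roughly like the following sketch (all names and policies here are hypothetical, not TVM’s actual dispatch logic):

```python
# Hypothetical sketch: a pluggable matcher instead of one fixed
# string-based rule. The default is strict equality on all fields;
# users can supply a looser policy per use case.
def strict_match(query, record):
    return query == record

def model_then_keys_match(query, record):
    # A looser policy resembling the model-then-keys behavior described
    # above: match on model first, then fall back to shared keys.
    if query.get("model") and query.get("model") == record.get("model"):
        return True
    return bool(set(query.get("keys", [])) & set(record.get("keys", [])))

def lookup(records, query, matcher=strict_match):
    return [r for r in records if matcher(query, r)]

records = [{"kind": "llvm", "mcpu": "skylake-avx512", "keys": ["cpu"]}]
query = {"kind": "llvm", "mcpu": "core-avx2", "keys": ["cpu"]}
print(lookup(records, query))                         # [] (strict: no match)
print(lookup(records, query, model_then_keys_match))  # matches via shared "cpu" key
```

Note how the loose matcher reproduces the P2 hazard: the skylake-avx512 record is returned for a core-avx2 query, which is exactly why the policy should be user-selectable.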

Thanks for the summary and proposals. They look good to me.

While I personally like both S0 and S2 for specifying if an attribute can be sorted, I would prefer to use the word “ordered” which aligns to C++ convention. For example in S2:

.add_attr_option<Array<String>>("mattr")         // sortable
.add_ordered_attr_option<Array<String>>("keys"); // unsortable

For P2, I also agree with your opinion that the semantics of matching two targets vary by use case. Providing a default strict matcher that matches everything, while allowing users to define their own matcher, sounds like a good idea to me.

In addition, I am not sure whether storing logs with full target information could solve the issue. The root cause is that even if we have stored cuda -libs=cudnn -keys=gpu -model=v100 (in case we tuned conv2d_nchw.cuda with this target), -libs=cudnn is unnecessary for this record. However, I think this is more a topic about how to improve the AutoTVM/Ansor log format.

Actually, there is a reason not to use different function names. At the moment, add_attr_option can take a single element (string, integer, etc.) or a collection of strings. If we invent add_ordered_attr_option, we restrict its use to collections of elements, since it wouldn’t make sense for individual values.

A more unified approach is one where the type of collection dictates whether it’s ordered or unordered, that is, S0 in @junrushao1994’s post. This way we would use add_attr_option in every case and only vary the value types.


@kparzysz @comaniac Thank you for the reply!

I agree that S0 might be a good choice, but it is good to discuss S1 too. S1 proposes that we have a “meta attribute” of an attribute, describing whether it should be sorted or not. The major advantage is extensibility: if we need n post-processing passes on a specific attribute, we just write them down like attr_name:sortable,A,B,C, and don’t have to worry about extending types.

Would love to hear your thoughts 🙂

If we have the sortable flag, it needs to be stored somewhere, so that when the target is converted to a string, this flag will be emitted. This way the target can be fully serialized into a string and then regenerated from that string.

I think it’s a valid approach. Maybe I’d suggest names like ordered or unordered instead of sortable, but that’s it.


Some recent thoughts on the target spec. The choice of id does seem confusing, as it suggests a unique value, while the target itself cannot be identified by the id alone. Summarizing the choices suggested in this space, I propose that we change the term id to kind. Would love to hear everyone’s thoughts.


During development, I also found that id is a really vague term that doesn’t convey much information about what it refers to. I agree with @tqchen that we should use kind instead.

I agree that id is a bit confusing, but kind sounds weird to me as well. My understanding is that id indicates the backend to be used (e.g., llvm, cuda), while keys indicates the codegen (e.g., arm_cpu, cpu, gpu). Based on this mechanism, we need a combination of id and keys to identify a target. IMHO, if we are going to change the name of id, it would be better to also reconsider keys so that their combination is more reasonable.

Another thought, from a user’s perspective, is to have a literal id replace the current id and keys. For example, id=x86 implies llvm -keys=cpu; id=arm_cpu implies llvm -keys=arm_cpu,cpu. This also seems fine because we only have one case (please correct me if I missed others) that uses keys to dispatch codegen (i.e., x86 and ARM).

Please note that the above proposal is different from the concept of “alias”. An “alias” could be at a higher level to locate a more specific target. For example, rasp4b could be an alias of arm_cpu -mtriple=armv8a-linux-gnueabihf -mattr=+neon,fp-armv8,thumb-mode -mfloat-abi=hard.

Note that id + keys does not necessarily uniquely identify the target, as we can have other values in the target fields.

The tag uniquely identifies the target and serves the purpose you talk about, e.g. tag=x86-cpu can imply the additional options.

Thanks for the clarification, but the tag in kind=llvm, keys=cpu, tag=x86-cpu looks redundant to me. On the other hand, the keys in kind=cuda, keys=gpu, tag=v100 also seems redundant.

Here are some examples I’m referring to:

  • alias=aws/c5: id=x86-cpu -mcpu=skylake-avx512.
  • alias=arm/rasp4b: id=arm-cpu -mtriple=armv8a-linux-gnueabihf -mattr=+neon,fp-armv8,thumb-mode -mfloat-abi=hard.
  • alias=aws/p3: id=nvidia-gpu -tag=v100.

As can be seen, although id cannot identify a specific target, as you pointed out, it can locate a kind of target. To me, the semantics should be: “all targets with the same id share the same build pipeline.” It also means that, for example, a model built with target arm-cpu (without other attributes) should be capable of running on all ARM CPUs.

Not sure if the above expectation makes sense to you. It would be great if you could elaborate more with the 3 examples otherwise.

In all of these cases, tag refers to the alias.

In the case of the kind semantics, the same logic applies: “all targets with the same kind share the same build pipeline logic (which can be parameterized by the target fields).”

So you are saying they should be just:

  • Example 1
    • kind=llvm
    • keys=cpu
    • mcpu=skylake-avx512
    • tag=aws/c5
  • Example 2
    • kind=llvm
    • keys=arm_cpu,cpu
    • mtriple=armv8a-linux-gnueabihf -mattr=+neon,fp-armv8,thumb-mode -mfloat-abi=hard
    • tag=arm/rasp4b
  • Example 3
    • kind=cuda
    • keys=gpu
    • tag=aws/p3,nvidia/v100 (I’m assuming you can have multiple tags?)
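As a hypothetical sketch of how Example 3 might serialize to JSON (the field names follow the discussion above but are assumptions, not a final schema):

```python
import json

# Hypothetical JSON form of Example 3: kind/keys/tag as discussed above.
# "keys" stays an array because its order is meaningful.
example3 = {
    "kind": "cuda",
    "keys": ["gpu"],
    "tag": "aws/p3",
}
print(json.dumps(example3, indent=2))
```

Note that JSON arrays preserve order, which is what an order-sensitive field like keys needs.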

This makes more sense to me in terms of the tag and alias. Per your statement that kind also implies the same build pipeline, I think this holds by treating x86 and ARM as the same pipeline, which I personally am not fully convinced by. However, I’m fine with this statement if that’s the majority opinion.

arm and x86 share the same pipeline. Note that the pipelines themselves can still be parameterized by the attributes of the target (e.g. whether it is x86 or ARM), but it is harder to make a major structural change.

I’m interested to understand how the JSON would look for case 3 (above). Also, could you expand a little on how you see this working, specifically who would oversee the graph partitioning process (and call the expected passes in the expected order)?

For completeness, @comaniac posted a follow-up RFC to cover Composite Targets -> [RFC] Composite Target


I would love to follow up and add a preprocessing stage to TVM target creation. It is useful for users/developers to clearly know the complete target info at the earliest stage.

For example, the LLVM NVPTX backend adds -mtriple=nvptx64-nvidia-cuda, and -mcpu=sm_xx if missing. We want to move that information to the beginning of target creation to make sure it is clearly presented to developers.
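A preprocessing stage of this sort could be sketched as follows (dict-based and hypothetical; the default table and function names are assumptions, not TVM’s API):

```python
# Hypothetical sketch: a per-kind preprocessing hook run at target
# creation time that fills in defaults early, so the complete target
# info is visible from the start.
KIND_DEFAULTS = {
    # assumption: defaults the NVPTX backend would otherwise add late
    "nvptx": {"mtriple": "nvptx64-nvidia-cuda"},
}

def preprocess(target):
    defaults = KIND_DEFAULTS.get(target["kind"], {})
    for key, value in defaults.items():
        target.setdefault(key, value)  # only fill in what is missing
    return target

t = preprocess({"kind": "nvptx"})
print(t["mtriple"])  # nvptx64-nvidia-cuda
```

User-provided values are never overwritten; the hook only completes missing fields.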


For the sake of record keeping, @zxybazh recently submitted this PR [Target] Add target host field for target specification by zxybazh · Pull Request #7462 · apache/tvm · GitHub that implements parts of what is agreed in this thread. The PR is not merged as I write this, but I think it will be very soon.

During this discussion, @areusch touched on a very interesting topic regarding the API design for Target: [Target] Add target host field for target specification by zxybazh · Pull Request #7462 · apache/tvm · GitHub

In TVMC, which is user facing, we recently updated the syntax to allow multiple target specification using something like --target="ethos-n77, llvm mattr=+neon" or --target="acl, llvm mattr=+neon" or even --target="ethos-n77, acl, llvm mattr=+neon", which represents the sorted order of targets in which we want to offload the network.

There is some more discussion here, in case you are interested: BYOC Partitioning Support on TVMC

I think this is a very relevant discussion, considering we can use all the discussion in this thread to define the specification for the target, and then evolve it into a more natural API for the user, which we’ll then convert to the internal API specification that TVM cares about.

I would be very interested in hearing what others think about where the line is between the internal target representation and potential improvements to the user-facing API.


A bit more follow-up thought on backward compatibility regarding the PR https://github.com/apache/tvm/pull/7462, which supports target declaration with a target host, like target("cuda", "llvm") or target('{"kind":"cuda", "host":"llvm"}').

The PR is backward compatible, as demonstrated in @junrushao1994’s comment https://github.com/apache/tvm/pull/7462#issuecomment-781584977. It does bring some changes, e.g., the name of the field in the config dict changed from target_host to host. Also, for any function that has a target_host argument, while the change is still backward compatible, new support for the host field in the target object would be required, as @leandron pointed out in https://github.com/apache/tvm/pull/7462#discussion_r578322231. That means we would actually have two ways to declare the target host, from which we may choose one as preferred and deprecate the other.
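The two declaration styles could be reconciled by a normalization step like this sketch (dict-based and hypothetical; this is not the PR’s actual code):

```python
# Hypothetical sketch: fold the legacy target_host argument into the
# target's "host" field, so both declaration styles yield one canonical
# form and conflicts are caught early.
def normalize(target, target_host=None):
    target = dict(target)  # avoid mutating the caller's dict
    if target_host is not None:
        if "host" in target and target["host"] != target_host:
            raise ValueError("conflicting host declarations")
        target["host"] = target_host
    return target

# Both call styles yield the same canonical target:
a = normalize({"kind": "cuda"}, target_host={"kind": "llvm"})
b = normalize({"kind": "cuda", "host": {"kind": "llvm"}})
assert a == b
```

With a step like this at API boundaries, the rest of the codebase only ever sees the host field, which makes a later deprecation of target_host mechanical.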

Would love to see more thoughts on how we should proceed with the preferred solution for the backward-compatibility issue. Further updates to the documents and tutorials will be addressed after we reach consensus on the design.


Following up on my previous PR supporting the new target system with target_host folded into the target object: I submitted a new PR for review this week, [Target] Add support for target object with host field compatible with previous api by zxybazh · Pull Request #7534 · apache/tvm · GitHub, to fully support, with backward compatibility, all functions that previously took target and target_host as arguments. This PR’s scope is to cover everything before we finally deprecate target_host. New tutorial code and test code are also updated in the PR to show the recommended practice. Please review the code and let me know if you have any advice on the PR, thanks!

Currently, I did not remove all usages of target_host as a variable, mainly for two reasons:

  1. For heterogeneous targets, there could be multiple targets maintained in a dict, which makes a single target_host more convenient in the related context. If we decide to move on with the deprecation, we might need to consider refactoring the code for heterogeneous targets.
  2. It takes more time to replace all the occurrences, and part of the code would be refactored, e.g., the structure of SearchTaskNode and the signatures of target_host-related functions, once we decide to deprecate target_host. I want to leave that work for later, if we decide on deprecation, so the design can be more consistent at that time.

In the meantime, as a next step, as @comaniac pointed out in his comment on [Target] Add support for target object with host field compatible with previous api by zxybazh · Pull Request #7534 · apache/tvm · GitHub, we have different options for target_host deprecation (also see @junrushao1994’s comment on [Target] Add target host field for target specification by zxybazh · Pull Request #7462 · apache/tvm · GitHub).

  • D0. Simply remove the target_host argument if that API is seldom used.
  • D1. Keep the target_host argument, but log a warning saying it is no longer effective and will be deprecated. We can then complete the deprecation in the next release cycle.

I think both ways could work. I would like to hear what everyone thinks and which way would be better.