[RFC] TVM Target Specification

If we have the sortable flag, it needs to be stored somewhere so that it is emitted when the target is converted to a string. That way the target can be fully serialized into a string and then regenerated from that string.
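
A minimal sketch of that round-trip property, assuming the existing tvm.target.Target Python API (any new flag such as ordered/unordered would simply become one more serialized field):

  import tvm

  tgt = tvm.target.Target("llvm -mcpu=skylake-avx512")
  s = str(tgt)                   # serialize the target, including all of its fields
  tgt2 = tvm.target.Target(s)    # regenerate an equivalent target from that string
  assert str(tgt2) == s          # nothing is lost in the round-trip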

I think it’s a valid approach. Maybe I’d suggest names like ordered or unordered instead of sortable, but that’s it.


Some recent thoughts on the target spec. The choice of id does seem confusing, as it suggests a unique value while the target itself cannot be identified by the id alone. Summarizing the choices suggested in this space, I suggest that we change the term id to kind. Would love to hear everyone’s thoughts.


During development, I also found id to be a really vague term that doesn’t say much about what it refers to. I agree with @tqchen that we should use kind instead.

I agree that id is a bit confusing, but kind sounds weird to me as well. My understanding is that id indicates the backend to be used (e.g., llvm, cuda); while keys indicates the codegen (e.g., arm_cpu, cpu, gpu). Based on this mechanism, we need a combination of id and keys to identify a target. IMHO, if we are going to change the name of id, it would be better to also consider keys so that their combination could be more reasonable.

Another thought, from a user’s perspective, is to have a literal id that replaces the current id and keys. For example, id=x86 implies llvm keys=cpu; id=arm_cpu implies llvm -keys=arm_cpu,cpu. This also seems fine because we only have one case (please correct me if I missed others) that uses keys to dispatch codegen (i.e., x86 vs. ARM).

Please note that the above proposal is different from the concept of “alias”. An “alias” could be at a higher level to locate a more specific target. For example, rasp4b could be an alias of arm_cpu -mtriple=armv8a-linux-gnueabihf -mattr=+neon,fp-armv8,thumb-mode -mfloat-abi=hard.

Note that id + keys does not necessarily uniquely identify the target, as we can have other values in the target fields.

The tag uniquely identifies the target and serves the purpose you talk about; e.g., tag=x86-cpu can imply the additional options.

Thanks for the clarification, but the tag in kind=llvm, keys=cpu, tag=x86-cpu looks redundant to me. On the other hand, the keys in kind=cuda, keys=gpu, tag=v100 also seems redundant.

Here are some examples I’m referring to:

  • alias=aws/c5: id=x86-cpu -mcpu=skylake-avx512.
  • alias=arm/rasp4b: id=arm-cpu -mtriple=armv8a-linux-gnueabihf -mattr=+neon,fp-armv8,thumb-mode -mfloat-abi=hard.
  • alias=aws/p3: id=nvidia-gpu -tag=v100.

As can be seen, although id cannot identify a specific target as you pointed out, it can locate a kind of target. To me, the semantics should be: “all targets with the same id share the same build pipeline”. It also means that, for example, a model built with target arm-cpu (without other attributes) should be capable of being executed by all ARM CPUs.

Not sure if the above expectation makes sense to you. It would be great if you could elaborate more with the 3 examples otherwise.

In all of these cases, tag refers to the alias.

In the case of the kind semantics, the same logic applies: “all targets with the same kind share the same build pipeline logic (which can be parameterized by the target fields)”.

So you are saying they should be just:

  • Example 1
    • kind=llvm
    • keys=cpu
    • mcpu=skylake-avx512
    • tag=aws/c5
  • Example 2
    • kind=llvm
    • keys=arm_cpu,cpu
    • mtriple=armv8a-linux-gnueabihf -mattr=+neon,fp-armv8,thumb-mode -mfloat-abi=hard
    • tag=arm/rasp4b
  • Example 3
    • kind=cuda
    • keys=gpu
    • tag=aws/p3,nvidia/v100 (I’m assuming you can have multiple tags?)
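
For concreteness, Example 2 could serialize to something like the following JSON form (a sketch only; I’m assuming the dict/JSON representation accepts these fields directly, and the exact field names may differ):

  {
    "kind": "llvm",
    "keys": ["arm_cpu", "cpu"],
    "mtriple": "armv8a-linux-gnueabihf",
    "mattr": ["+neon", "fp-armv8", "thumb-mode"],
    "mfloat-abi": "hard",
    "tag": "arm/rasp4b"
  }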

This makes more sense to me in terms of the tag and alias. Regarding your statement that kind also implies the same build pipeline, I think it holds if we treat x86 and ARM as the same pipeline, which I personally am not fully committed to. However, I’m fine with this statement if that’s the majority opinion.

arm and x86 share the same pipeline. Note that the pipelines themselves can still be parameterized by the attributes of the target (e.g., whether it is x86 or ARM), but it is harder to make a major structural change.

I’m interested to understand what the JSON would look like for case 3 (above). Also, could you expand a little on how you see this working, specifically who would oversee the graph partitioning process (and call the expected passes in the expected order)?

For completeness, @comaniac posted a follow-up RFC to cover Composite Targets -> [RFC] Composite Target


I would love to follow up and add a preprocessing stage to TVM target creation. It is useful for users/developers to clearly know the complete target info at the earliest stage.

For example, the LLVM NVPTX backend adds -mtriple=nvptx64-nvidia-cuda and -mcpu=sm_xx if they are missing. We want to move that information to the beginning of target creation to make sure it is clearly presented to developers.
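
A minimal sketch of the idea (illustrative Python only, not the actual TVM hook; the sm_75 default is a placeholder):

  # Fill in kind-specific defaults before the Target object is constructed,
  # so the complete configuration is visible to developers from the start.
  def preprocess_nvptx(config):
      config.setdefault("mtriple", "nvptx64-nvidia-cuda")
      config.setdefault("mcpu", "sm_75")  # placeholder; the real default depends on the device
      return config

  print(preprocess_nvptx({"kind": "nvptx"}))
  # {'kind': 'nvptx', 'mtriple': 'nvptx64-nvidia-cuda', 'mcpu': 'sm_75'}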


For the sake of record keeping, @zxybazh recently submitted this PR [Target] Add target host field for target specification by zxybazh · Pull Request #7462 · apache/tvm · GitHub that implements part of what was agreed in this thread. The PR is not merged as I write this, but I think it will be very soon.

During this discussion, @areusch touched on a very interesting topic regarding the API design for Target: [Target] Add target host field for target specification by zxybazh · Pull Request #7462 · apache/tvm · GitHub

In TVMC, which is user facing, we recently updated the syntax to allow specifying multiple targets, using something like --target="ethos-n77, llvm mattr=+neon" or --target="acl, llvm mattr=+neon" or even --target="ethos-n77, acl, llvm mattr=+neon", which represents the order in which we want to offload the network to the listed targets.

There is some more discussion here, in case you are interested: BYOC Partitioning Support on TVMC

I think this is a very relevant discussion: we can use all the discussion in this thread to define the specification for the target, and then evolve a more natural API for the user, which we would then convert into the internal specification that TVM cares about.

I would be very interested in hearing what others think about where the line is between the internal target representation and potential improvements to the user-facing API.


A few more follow-up thoughts on backward compatibility regarding the PR https://github.com/apache/tvm/pull/7462, which supports target declaration with a target host, like target("cuda", "llvm") or target('{"kind":"cuda", "host":"llvm"}').

The PR is backward compatible, as demonstrated in @junrushao’s comment https://github.com/apache/tvm/pull/7462#issuecomment-781584977. It does bring some changes, e.g., the name of the field in the config dict changed from target_host to host. Also, for any function that has a target_host argument, while the change is still backward compatible, new support for the host field in the target object would be required, as @leandron pointed out in https://github.com/apache/tvm/pull/7462#discussion_r578322231. That means we would actually have two ways to declare the target host, from which we may choose one as preferred and deprecate the other.
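
To make the two paths concrete (a sketch based on the forms mentioned in the PR; keyword and field names may differ slightly between TVM versions):

  import tvm

  # New style from #7462: the host is folded into the target object itself.
  tgt_a = tvm.target.Target("cuda", host="llvm")
  tgt_b = tvm.target.Target('{"kind": "cuda", "host": "llvm"}')

  # Legacy style: pass a separate target_host argument to the build APIs,
  # e.g. relay.build(mod, target="cuda", target_host="llvm"); this is the
  # path we may eventually deprecate.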

Would love to see more thoughts on how we should proceed with the preferred solution for the backward compatibility issue. Further updates to the documents and tutorials will also be addressed after we reach consensus on the design.


Following up on my previous PR supporting the new target system with target_host folded into the target object: I have submitted a new PR for review this week, [Target] Add support for target object with host field compatible with previous api by zxybazh · Pull Request #7534 · apache/tvm · GitHub, to fully support all functions that previously took target and target_host as arguments, with backward compatibility. This PR’s scope is to cover everything before we finally deprecate target_host. New tutorial code and test code are also updated in the PR to show the recommended practice. Please review the code and let me know if you have any advice on the PR, thanks!

Currently, I did not remove all usages of target_host as a variable, mainly for two reasons:

  1. For heterogeneous targets, there could be multiple targets maintained in a dict, which makes a single target_host more convenient in the related context. If we decide to move forward with the deprecation, we might need to consider refactoring the code for heterogeneous targets.
  2. It takes more time to replace all the occurrences, and part of the code would need to be refactored (e.g., the structure of SearchTaskNode and the signatures of target_host-related functions) once we decide to deprecate target_host. I want to leave that work for later, if we decide on deprecation, so the design can be more consistent at that time.

In the meantime, as a next step, as @comaniac pointed out in his comment [Target] Add support for target object with host field compatible with previous api by zxybazh · Pull Request #7534 · apache/tvm · GitHub, we have different options for target_host deprecation (also see @junrushao’s comment on [Target] Add target host field for target specification by zxybazh · Pull Request #7462 · apache/tvm · GitHub).

  • D0. Simply remove the target_host argument if that API is seldom used.
  • D1. Keep the target_host argument, but log a warning saying it is no longer effective and will be deprecated. We can then complete the deprecation in the next release cycle.

I think both ways could work. I would like to hear what everyone thinks and which way would be better.

Coming late to the party here. As it happens, I’m working on separating device planning from memory planning as part of the ‘unified lowering’ effort. I’ve noticed that ‘device’ in this setting means ‘DLDeviceType’ or the default/invalid ‘0’ type. A few parts of the code use DLDevice, thus distinguishing e.g. (kDLCPU, 0) from (kDLCPU, 1), but other parts forget the id. I want to check my assumption that what we really want out of ‘device planning’ is ‘target planning’, using the actual Targets (and not their labels). I also assume that we intend Target to be able to distinguish distinct execution units of the same microarchitecture. In other words: Target subsumes both DLDeviceType and device ids. Thx.

Thanks @mbs-octoml!

Target is a compile-time concept, while device is a runtime concept. The separation design is to make sure that a target can describe a device that doesn’t exist on the host, mainly for cross compilation.

And yes, in principle, every TargetKind corresponds to a DLDeviceType, for example

  • llvm (compile time) => kDLCPU (runtime)
  • cuda (compile time) => kDLCUDA (runtime)
  • nvptx (compile time) => kDLCUDA (runtime)

Note that device_id is not present anywhere in the Target system, because we don’t require the device to be present on the host at compile time.

In other words, at compile time, IIUC we should replace DLDeviceType with TargetKind.
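
A small sketch of the separation (assuming current Python APIs; tvm.cuda(0) was spelled tvm.gpu(0) in older releases):

  import tvm

  tgt = tvm.target.Target("cuda")  # compile-time description: kind = cuda, no device_id
  dev = tvm.cuda(0)                # runtime device: kDLCUDA with device_id = 0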

Can I try an example? Again, this is all motivated by working in the device planning code so my pov may be skewed.

Let’s say we’ve implemented the above and we have three target labels and their associated definitions:

  cpu: llvm ...
  gpu1: ...   device_id=0
  gpu2: ...   device_id=1

I’m assuming a two-GPU system without memory sharing. Perhaps they even have exactly the same microarchitecture and compiler options, hence I’ve included an explicit device_id in the label definitions. (I’d allow labels to ‘inherit’ from other labels so users could build up more specialized targets from prototypes, ultimately ending in a hard-coded target kind; there’s a lot of text above, so forgive me if that’s the plan anyway.)

I’m assuming these target labels can be used directly in ‘on_device’ annotations.

  on_device(..., attrs.device="gpu2")

I assume the notion of ‘host’ is the same as the notion of ‘default device’, i.e., it is as if the entire user model were wrapped in:

  on_device(<model>, attrs.device="cpu")

I assume the user, or an intermediate transformation, may include additional annotations:

  x = on_device(add(a, b), attrs.device="gpu1");
  y = on_device(add(c, d), attrs.device="gpu2");
  add(x, y)

Device (or target) planning is then just flowing target constraints through the code. Two targets are compatible if they have exactly the same fields. A device_copy is needed at any transition between incompatible targets.
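
For instance, in the pseudo-syntax above, the planner would end up rewriting the earlier example into something like the following (illustrative only; the exact device_copy form is hypothetical):

  x = on_device(add(a, b), attrs.device="gpu1");
  y = on_device(add(c, d), attrs.device="gpu2");
  add(device_copy(x, src="gpu1", dst="gpu2"), y)

Here x has to cross from gpu1 to gpu2 (incompatible targets) before the final add.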

Is that what we want?

On my platform I have two AMD GPUs, and I want to use one of them via OpenCL. How can I set the target? Maybe "opencl 1"?