[RFC] Naming scheme for TVM versions and packages

Motivation

One barrier that prevents users from quickly experimenting with TVM is the lack of readily available pip packages. To generate standard Python packages, it is vital to have a consistent version naming pattern (https://semver.org/) that follows PEP-440 (the version scheme for Python packages/dependencies), and to be able to track both releases and development packages with local modifications when creating packages to distribute.

This RFC proposes the use of setuptools_scm as a way to track versions and generate consistent names for our Python packages and builds.

Background

In the current implementation, a static version string is committed to the source files. At the moment, it is “0.7.dev1”. Without source changes, every package generated from the tree will carry the “0.7.dev1” version string.

The version number is manually controlled by a script called version.py, which rewrites a set of files with a new version string. To make that official, those files must be committed to the repository.

As a developer, if you want to track your own changes when generating packages, you need to modify all the impacted files (usually by running version.py) in order to have a different version string. From a technical point of view, this creates unnecessary local file changes that can be avoided.

Proposal

We propose the use of pypa’s official setuptools_scm (https://github.com/pypa/setuptools_scm/), which can help us track local versions better and name packages accordingly. This is part of a broader intention of having packages readily available for users, which still has some pending actions before it becomes a reality.

Building from a git tree is by far the most used and recommended (https://docs.tvm.ai/install/from_source.html) way to build TVM and, hence, to generate TVM packages.

In this case, setuptools_scm would help by finding the latest known version, counting how many upstream commits have been made since, and taking into account any local changes the source tree may have. It generates PEP-440 compliant package names, so that we are sure about what we are distributing in any package we generate.

To cover the case of “no git available” (in this case, there is no scm tool to query for tags), there is a fallback option that we can use to supply the version, so there is no loss of functionality compared to the current setup.
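As a rough sketch of what this could look like, the snippet below builds the keyword dictionary that setuptools_scm reads from the use_scm_version argument of setup(). The keys root, relative_to and fallback_version are setuptools_scm options; the helper name scm_version_config, the "python/setup.py" path and the fallback value “0.7.0.dev0” are assumptions made only for illustration.

```python
def scm_version_config(setup_py: str, fallback: str) -> dict:
    """Build the dict passed to setup(use_scm_version=...) (hypothetical helper)."""
    return {
        # The git root is assumed to be one directory above the one
        # holding setup.py (i.e. tvm/python/setup.py -> tvm/).
        "root": "..",
        # "root" is resolved relative to the directory of this file.
        "relative_to": setup_py,
        # Used only when no SCM metadata is found, e.g. when building
        # from a source snapshot that has no .git directory.
        "fallback_version": fallback,
    }


config = scm_version_config("python/setup.py", fallback="0.7.0.dev0")
print(config["fallback_version"])
```

In a real setup.py this dictionary would be passed as setup(use_scm_version=scm_version_config(__file__, "0.7.0.dev0"), setup_requires=["setuptools_scm"], ...), so the single hard-coded string left in the tree is the fallback.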

What needs to change?

  1. A new dependency, setuptools_scm, would be added to the TVM and TOPI setup.py files (it is officially provided in the pypa repository, so it looks like a reliable dependency to have).
  2. The current version string “0.7.dev1” doesn’t say much today, as it potentially points to any commit hash on master since 22nd Nov 2019 (the date when the tag origin/v0.6.0 was created), so it would be removed. In its place, a canonical version naming would be used.
     • Example: in the current state, a better version name would be “0.7.0.dev912”, which means: “v0.7.0” development, with “912” commits on top of the previous version.
     • All that is needed to generate it, after the proposed change, is "python setup.py --version" in the “tvm/python” directory.
  3. The current tags on the repository (e.g. v0.5, v0.6) will need a bit of rework to comply with v{MAJOR}.{MINOR}.{PATCH}. We can decide to just add new tags on top of the existing ones, in case we need to keep the existing tags.
  4. To deal with the existing files that rely on having a version string, we think the Makefile that currently exists in the root of the TVM repository can provide a way to query the current version (using something like “(cd tvm/python && python setup.py --version)”), so that it can be injected dynamically into the non-Python files.
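To make the “0.7.0.dev912” example concrete, here is a minimal sketch of how such a string could be derived from a tag and a commit count. It assumes input in the form printed by git describe --tags --long (for example “v0.6.0-912-g1a2b3c4”), and it assumes a “bump the minor component” rule chosen only to reproduce the example above; setuptools_scm’s default scheme bumps the last component instead, so the real output may differ. The function name dev_version is hypothetical.

```python
import re

# <tag>-<commits since tag>-g<abbreviated hash>, with an optional patch part
# so that legacy tags such as "v0.6" are accepted too.
_DESCRIBE = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?-(\d+)-g([0-9a-f]+)$")


def dev_version(describe: str) -> str:
    """Turn `git describe --tags --long` output into a PEP 440 version string."""
    match = _DESCRIBE.match(describe.strip())
    if match is None:
        raise ValueError(f"unrecognised describe output: {describe!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    distance = int(match.group(4))
    if distance == 0:
        # Exactly on a tag: this is the release itself.
        return f"{major}.{minor}.{patch}"
    # Assumed rule: commits after vX.Y.Z count as development of X.(Y+1).0.
    return f"{major}.{minor + 1}.0.dev{distance}"


print(dev_version("v0.6.0-912-g1a2b3c4"))
```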

To cover all the languages being used, we suggest:

  • Python: it will all be built in, using the setuptools_scm mechanisms to query the current package version (a few suggested here);
  • C-headers: we can make it include a dynamically created header file (i.e. generated by the build system and not checked into the git tree) that contains the correct version string, and of course document it;
  • YAML (used for Conda): we can plug it in with an environment variable and use Jinja templating, as described here (https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#id15);
  • Javascript: I couldn’t find the specific file (tvm_runtime.js), but I think it would be a matter of generating a JavaScript file, included by the requesting source, that exposes the version variable with the right value.
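The generated-file idea for C headers and JavaScript could be sketched as two small render functions that the build system calls with the queried version. The include-guard name, the TVM_VERSION macro as used here, and the export const version form are assumptions for illustration, not a description of the existing files.

```python
import json


def render_c_header(version: str) -> str:
    """Return the text of a generated header that defines the version macro."""
    return (
        "/* Generated by the build system; do not edit or commit. */\n"
        "#ifndef TVM_GENERATED_VERSION_H_\n"
        "#define TVM_GENERATED_VERSION_H_\n"
        # json.dumps gives a double-quoted, escaped string literal.
        f"#define TVM_VERSION {json.dumps(version)}\n"
        "#endif  /* TVM_GENERATED_VERSION_H_ */\n"
    )


def render_js_module(version: str) -> str:
    """Return the text of a generated JavaScript file exposing the version."""
    return f"export const version = {json.dumps(version)};\n"


print(render_c_header("0.7.0.dev912"))
print(render_js_module("0.7.0.dev912"))
```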

I agree that adopting the version convention 0.7.0.dev912 makes sense.

It would be useful to see if we can simply update the version.py script to do so. My take is that it is not too hard, as we can invoke a few git commands to do that, and still use the same script to update versions when necessary.

My understanding is that the role of version.py is, for a given version, to generate local changes to physical files, to be merged into the source tree.

With the proposed approach, we would be relying on setuptools_scm to calculate the proper PEP-440 compliant version at any given time, reducing as much as possible (ideally to no more than one, in my opinion) the need for hard-coded version strings.

The one here refers only to the “fallback version”, which covers the case of users who download a snapshot of the sources from GitHub, for example, and have no source control at all.

I’m curious about what the purpose of version.py would be in a situation in which we could just query the current version with python setup.py --version, but I’m happy to discuss, in case I’m missing some use-case.

I am thinking along the lines of python version.py --scm-version, which performs the calculation of the version from the tag (as in the logic of setuptools_scm) and updates the relevant files.

This can work, but there is a problem if version.py continues to modify source-controlled files. Changing any version-controlled file, committed or not, in effect changes the version of the code being built. scm-version handles exactly this situation: it constructs unique versions for git-tagged versions, commits beyond tagged versions, private commits and local changes to source files. As a result, re-executing the version/build process will generate successively different versions.

Placing the version into a build artefact outside of source version control avoids this instability in the version/build process.
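A small sketch of that approach: the version is written to a file under a build directory, and the file is only rewritten when its content actually changes, so repeated builds leave both the source tree and the artifact untouched. It assumes the build directory is ignored by git; the helper name write_if_changed and the generated/version.txt path are hypothetical.

```python
import tempfile
from pathlib import Path


def write_if_changed(path: Path, content: str) -> bool:
    """Write content to path; return False if the file was already up to date."""
    if path.exists() and path.read_text() == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


# A temporary directory stands in for an untracked build directory.
with tempfile.TemporaryDirectory() as build_dir:
    target = Path(build_dir) / "generated" / "version.txt"
    first = write_if_changed(target, "0.7.0.dev912\n")
    second = write_if_changed(target, "0.7.0.dev912\n")

print(first, second)  # → True False
```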

I see, I can see us doing that as well. We would need a clear fallback mechanism for the generated files though, for cases where we want to build a runtime-only module and may not have a working setuptools_scm dependency.

Officially, we only release source code packages on stable releases, while providing convenient binary release tools so that we can generate these releases. I can see us still using version.py to update the stable versions. Perhaps we could add another option, version.py --scm-version, that directly queries the version but does not modify the files in place. In that case it can be used to optionally generate artifacts.
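A minimal sketch of how such a query-only option might sit next to the existing file-updating behaviour is shown below. Only the --scm-version flag name comes from this thread; the run function, its scm_query and update_files callbacks, and the “0.7.0” stable version are assumptions for illustration and do not reflect the actual version.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="version.py")
    parser.add_argument(
        "--scm-version",
        action="store_true",
        help="print the version derived from source control; modify no files",
    )
    return parser


def run(argv, scm_query, update_files, stable_version="0.7.0"):
    """Return the version that was printed or written."""
    args = build_parser().parse_args(argv)
    if args.scm_version:
        # Query-only mode: nothing in the source tree is touched.
        version = scm_query()
        print(version)
        return version
    # Default mode: rewrite the tracked files for a stable release.
    update_files(stable_version)
    return stable_version


run(["--scm-version"], scm_query=lambda: "0.7.0.dev912", update_files=print)
```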

https://github.com/apache/incubator-tvm/pull/6757 implements a variant of the proposal
