[RFC] Python Dependencies in TVM CI Containers


This RFC is a follow-on to [RFC] Consolidating Python Dependencies, implemented in PR 7289. It will be followed by a third RFC describing how we might leverage the changes introduced in this RFC to produce version-constrained tlcpack wheels and a CI-tested requirements.txt for development.

After PR 7289 lands, TVM will contain a centralized list of Python dependencies: a list of packages with constraints where they are required for functional reasons (e.g. tests don’t pass without these constraints). However, this list has no bearing on the CI, so it has no teeth.

This RFC proposes changes to the way we install Python packages into TVM CI containers to address the following shortcomings:

S1. Scattered CI Python Dependencies

At the time of writing, there aren’t really any rules per se about how Python packages are installed into CI containers. Here are all the ways:

  • Most dependencies are installed with pip install commands placed in scripts that live in docker/install
  • The set of scripts run for a particular image are listed in docker/Dockerfile.ci_<image>
  • Some late-breaking dependencies are installed in tests/scripts/task_ci_python_setup.sh

Constraints for these packages are just listed in the pip install command-line and don’t necessarily match the functional requirements listed in gen_requirements.py. For the most part, these are unconstrained, so the actual installed version of a package changes as the containers are rebuilt.

This is considered a good thing (it is part of our strategy for ensuring TVM testing remains up-to-date with respect to its dependencies), but a drawback is that TVM developers have to manually query the installed version using pip freeze. Since there are many such containers updated independently, assembling a known-good list of TVM’s Python dependencies (as certified by CI) is not straightforward.
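As a hedged illustration of that burden: the drift between two containers could be surfaced by diffing their pip freeze output. The helper names (parse_freeze, freeze_diff) and the version numbers below are invented for this sketch, not part of any existing TVM tooling:

```python
def parse_freeze(text):
    """Parse `pip freeze` output ("name==version" lines) into a dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            versions[name.lower()] = version
    return versions

def freeze_diff(freeze_a, freeze_b):
    """Return {package: (version_in_a, version_in_b)} for every mismatch."""
    a, b = parse_freeze(freeze_a), parse_freeze(freeze_b)
    return {
        pkg: (a[pkg], b[pkg])
        for pkg in sorted(a.keys() & b.keys())
        if a[pkg] != b[pkg]
    }

# Made-up freeze captures from two hypothetical containers:
ci_cpu = "numpy==1.19.5\nattrs==20.3.0\n"
ci_gpu = "numpy==1.18.4\nattrs==20.3.0\n"
print(freeze_diff(ci_cpu, ci_gpu))  # {'numpy': ('1.19.5', '1.18.4')}
```

Doing this by hand across every independently-rebuilt container is exactly the chore described above.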

S2. Dependency Conflicts and pip 20.3

Not every requirements.txt describes a set of packages that can be installed together. For example, if we have:

    foo
    bar>=3

but foo depends on bar<3, the packages in this file cannot be installed together.

However, until recently (Oct 2020), pip (the default Python package manager) considered this file line-by-line, not holistically. pip would, in this case:

  1. Read the dependencies for latest foo
  2. Realize bar<3 was required, and install the latest bar under 3
  3. Read the next line, uninstall the bar<3 version, and install the latest bar at version 3 or higher

pip only checked whether dependencies agreed when foo itself depended on several packages, each of which declared dependencies that might conflict.
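The line-by-line behavior above can be sketched with a toy resolver. This is not pip’s real code: the two-package index, the version numbers, and the crude spec parsing are all invented for illustration:

```python
import re

INDEX = {                       # hypothetical package index
    "foo": {"1.0": ["bar<3"]},  # foo 1.0 depends on bar<3
    "bar": {"2.5": [], "3.1": []},
}

def parse(line):
    """Split 'bar>=3' into ('bar', '>=3'); bare names get an empty spec."""
    m = re.match(r"([a-z]+)(.*)", line)
    return m.group(1), m.group(2)

def satisfies(version, spec):
    major = int(version.split(".")[0])
    if spec.startswith(">="):
        return major >= int(spec[2:])
    if spec.startswith("<"):
        return major < int(spec[1:])
    return True  # no spec: anything goes

def pick(name, spec=""):
    """Pick the newest version of `name` matching `spec`."""
    ok = [v for v in INDEX[name] if satisfies(v, spec)]
    return max(ok, key=lambda v: tuple(map(int, v.split("."))))

def legacy_install(lines):
    """Process requirements line by line, like pip before 20.3."""
    installed = {}
    for line in lines:
        name, spec = parse(line)
        installed[name] = pick(name, spec)   # may clobber an earlier choice
        for dep in INDEX[name][installed[name]]:
            dep_name, dep_spec = parse(dep)
            installed[dep_name] = pick(dep_name, dep_spec)
    return installed

def conflicts(installed):
    """Declared dependencies violated by the final install set."""
    bad = []
    for name, version in installed.items():
        for dep in INDEX[name][version]:
            dep_name, dep_spec = parse(dep)
            if not satisfies(installed[dep_name], dep_spec):
                bad.append((name, dep))
    return bad

result = legacy_install(["foo", "bar>=3"])
print(result)             # {'foo': '1.0', 'bar': '3.1'}
print(conflicts(result))  # [('foo', 'bar<3')] -- what the new resolver rejects up front
```

The new resolver in pip 20.3 refuses to produce the conflicting set in the first place, which is what makes the rest of this RFC necessary.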

Changes to pip install for the CI

Given these changes, we and all of our users are going to have to begin caring that we test against a compatible set of dependencies, so that it will be possible in the future to pip install TVM. In particular, this means that the current approach to the CI (installing packages one-by-one in a set of decentralized scripts) is going to get more and more expensive, as each pip install encounters progressively trickier dependency groups for the resolver to untangle.

The first step was centralizing the Python dependencies, accomplished in PR 7289. Next, we have a choice:

I1. Consolidate all pip install commands in docker/install scripts into a single pip install script which consumes only python/requirements/*.txt.

I2. Feed the python/requirements/*.txt into a tool such as poetry; use poetry lock to produce a poetry.lock file containing compatible python dependencies. Change all pip install commands to poetry install commands, which essentially force installation of exactly the package versions pinned in the poetry.lock file.

I3. Like I2 but translate poetry.lock into constraints.txt and continue to use pip in install scripts.
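To make I3 concrete, here is a minimal sketch of the poetry.lock → constraints.txt translation (lock_to_constraints is a hypothetical name; a real implementation would use a proper TOML parser rather than regexes):

```python
import re

def lock_to_constraints(lock_text):
    """Scrape name/version pins from poetry.lock-style TOML text and
    emit a pip constraints file body."""
    pins = []
    for block in lock_text.split("[[package]]")[1:]:
        name = re.search(r'name = "([^"]+)"', block).group(1)
        version = re.search(r'version = "([^"]+)"', block).group(1)
        pins.append(f"{name}=={version}")
    return "\n".join(sorted(pins)) + "\n"

sample_lock = '''
[[package]]
name = "numpy"
version = "1.19.5"

[[package]]
name = "attrs"
version = "20.3.0"
'''
print(lock_to_constraints(sample_lock))
# attrs==20.3.0
# numpy==1.19.5
```

The resulting file could then be consumed in the install scripts via pip install -r requirements.txt -c constraints.txt, keeping pip as the only installer inside the containers.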

Independently or Jointly Building CI containers

At the time of writing, CI containers are updated one-by-one and by hand (a process which typically requires some manual bugfixing). This also means that, in implementing any of I{1,2,3} above, the dependency resolver will find a compatible solution only for the packages relevant to a particular CI container. That solution may differ considerably from one computed over all TVM dependencies at once.

For instance, if ci-arm does not depend on onnx, but ci-gpu does, the selected version of torch may differ according to constraints torch places on the onnx package. We may get to a point where some TVM extras_install conflict with one another, but until we reach that point, it seems like we should strive to ensure all parts of TVM can be used at once without package version conflicts.

Therefore, we have a choice in how we implement I{1,2,3} above:

B1. Continue building all ci- containers independently. This is implied by I1, unless we install all Python dependencies on one particular container and use pip freeze from that container as the constraints file for the others.

B2. When ci- containers are rebuilt, either periodically or because changes are needed, recompute dependencies assuming all TVM dependencies are installed at once (i.e. with I2 or I3). Use a single constraints file for all ci- containers (but see Held-Back Dependencies below).

The author’s opinion is that to truly address the problem imposed by the new pip dependency resolver, approach B2 is needed. However, it does reduce the number of different versions of each of TVM’s dependencies exercised in the CI, and in turn that sacrifices some unquantifiable test coverage.

Recording CI-tested Package Versions

At the end of the CI container build process, each ci- container should have a list of installed package versions (either the constraints.txt or the output of pip freeze). This file should be placed either in the TVM tree (e.g. docker/constraints/ci-cpu.txt) or in another repository accessible to the public. If in the TVM tree, it should be updated when the Jenkinsfile container version is updated.

These files are useful when users need to look up the packages that make up a known-good TVM configuration. Without these files, the user must download each container and run pip freeze themselves, which is incredibly burdensome considering the size of the containers and the fact that ci-gpu (used to build docs and run many of the tests) requires nvidia-docker to launch.

Note that in the event we choose approach B2 above, far fewer constraints.txt files will need to be recorded. Only variants of the virtualenv introduced by Held-Back Dependencies below would necessitate more than one constraints list.

Held-Back Dependencies

There are a couple of cases where we either explicitly pin or implicitly hold back dependencies in the CI:

  1. Development tools such as pylint may often be challenging to update as the reported errors can vary dramatically by pylint version. For these tools, motivation to update is also lower—having the best linting is just a nice-to-have relative to landing a TVM bugfix or feature request. Therefore, we may frequently hold back dev tools to make it easier to change or add other dev dependencies.

  2. Increasing Test Diversity. This one is trickier because we “implicitly” hold back dependencies by building each container independently and often a few months apart from the others. Since we just pick whichever package version is available at the time we build a container, this winds up running e.g. our unit tests against a wider range of dependencies.

    This RFC proposes we accomplish this more purposefully and on a more limited basis, by creating a file e.g. ci-i386-constraints.txt in the TVM repo which describes holdbacks to large dependencies such as PyTorch or TensorFlow for e.g. ci-i386.
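As a sketch only (the package names and version ranges below are invented for illustration, not proposed pins), such a holdback file might look like:

```
# ci-i386-constraints.txt (hypothetical)
# Case 1: dev tool pinned so the reported lint errors stay stable.
pylint==2.4.4
# Case 2: large dependency held back one minor version for test diversity.
tensorflow>=2.1,<2.2
```

A file like this would be layered on top of the shared constraints when building only the container it names.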

Topics for Discussion

T1. Do we agree it’s worth it to centralize Python dependencies from the CI? What concerns do we have about doing this?

T2. Which change (I1, I2, I3, or other) do you support to pip install for building ci-* containers?

T3. Do you support building all ci- containers at once (i.e. do you support B1 or B2)?


One thing that occurred to me after writing this is that packaging flows such as conda may not be able to satisfy the dependencies we use in the CI. We don’t currently execute Python tests on Windows, but we would like to. I think we could work around this like so:

  1. concurrent with building containers, create a conda environment on a Windows machine and produce environment.yml.

  2. either

    a) build a Windows VM at this time that contains the installed conda deps from environment.yml
    b) check in the environment.yml file and rebuild the environment in the CI

  3. run pip freeze from within the created Python venv to capture the revisions used in the Windows CI and record them in the constraints list (e.g. docker/constraints/ci-windows.txt)

T1: I think we should centralize all dependencies for CI. With a centralized dependency set, users of TVM can have confidence that if they use the dependency set, they should not expect to hit any issues around dependencies.

T2: I think I2 is the best option here. It guarantees we have a consistent set of dependencies across all containers. I don’t think we can rely on pip to generate a reasonable set of dependencies given constraints.

T3: I think neither of these is a good option. We should decouple updating dependencies from the updating of docker images. Forcing developers who want to modify the docker images to also be responsible for making sure new dependencies work is an unnecessary burden. It also makes it hard to diagnose issues that arise from updating dependencies, because they are mixed in with changes to the docker images.


T1: I think centralizing dependencies is pretty important

T2: Using poetry (option 2) seems like the way to go, especially since we can separate development dependencies from production dependencies. We could put “held back” dependencies like pylint into the development dependencies.

T3: I think that using a single constraints file for all the docker images makes sense, but like @tkonolige says, it does seem like a large burden to place on people updating docker images. And then if you want to change a dependency for a normal PR, you need to rebuild the CI-docker images, which again places a large burden on the people updating the docker images.


I don’t have anything new to add, totally agree with analysis:

T1: Deps must be centralized to maintain sanity long term.

T2: I think using lock files makes the most sense, but if there is some reason to do a more flexible thing I am okay giving some ground here.

T3: I think the images must be rebuilt when dependencies change; I don’t see any way around that. If we don’t have unified dependencies across images our lives will be a nightmare.

Thanks for the discussions so far. Everyone seems to agree on T1: it is good to have a single place that users and developers can look into. T2 and T3 are the ones worth discussing further.

Trying to summarize a few rationales of the current CI pipeline.

R0: Use Binary Tags for Reproducibility and Stabilization

During testing, the dependencies being used go beyond python dependencies (what a poetry lock file can capture) and include things like the blas version, nnpack code and the cuda runtime.

The docker binary image is still the only reliable source of truth if we want to exactly reproduce the test runs. This is why we opted for the use of binary tags and staged dependency images during test dependency updates, so developers can pick up these tags.

R1: Manual Fixes are Unavoidable During Dependency Update

Changes to APIs like tensorflow and pytorch will likely require upgrades of the test cases, which means manual fixes are unavoidable. This factor also creates a slight preference for building docker images separately, so incremental changes/fixes can be made if needed.

R2: Simple Flow to Build the Docker Image

Different downstream users might prefer a different CI flow that directly builds the docker image from source. Having a simple flow (docker/build.sh) to do so is super useful for these cases.


The above factors might lead to the conclusion that I1 is likely the simplest approach to take. It is also the plan that is immediately actionable.

On one hand, while I2/I3 indeed capture more python dependencies via a lock, they do not necessarily capture all the dependencies needed (see R0). Additionally, I2/I3 might involve checking in a lock file (which cannot be reviewed, as we do not immediately know the impact of a change) that was produced from a file already in the repo, creating a cyclic-like dependency in the upgrade process (txt file to lock file).

The original constraint file (which can be approved by a human) and binary tag (which can be verified by CI) generally satisfy the two needs (readability and final pinning), and make R1 generally easier. Additionally, the set of python dependencies can be generated through pip freeze in the ci-gpu image (or the ci-cpu image for cpu-only packages), which means we still meet our original requirement of tracking ci deps.

@tqchen Thanks for the reply! A couple of points:

  • You’re right that I2/I3 don’t capture all dependencies. There is a notable gap between a set of Python dependencies and the docker container: the remaining libraries which may be present in the container.
  • The method by which those libraries are installed is platform-dependent, but so far our approach for the CI has been mostly to depend on a package manager to satisfy them.
  • I think that even if a lockfile can’t be used primarily to reproduce the test environment, there are plenty of other uses–here are a couple:
    • Determining which version of a C library is suitable when trying to install TVM on a system other than the docker containers. You might prefer to at least know the version of the Python package which uses it. I had trouble finding a non-dev dependency like this, but I think this pattern is very common, and Pillow is an example of such a package.
    • Oftentimes a user isn’t trying to exactly reproduce the regression–they just want to know which minor version of a package to install, e.g. tensorflow 2.1 vs 2.2. This can make a big difference, especially when the main problem is API incompatibility. Even if a user might not be able to reproduce the regression 100%, it’s often the case that if they’re using TVM in an application, they won’t be able to use the TVM CI containers as a starting point.

Updating Docker Containers

I think it’s worth thinking through the case of updating the docker containers a little more because I’m not sure I follow your reasoning here.

The current process is as follows, per-container:

  1. update docker/install and Dockerfile as needed to modify the container
  2. run docker/build.sh to build the new container image
  3. Fix problems and rerun steps 1 & 2 as necessary.
  4. test the containers – discussed later

The current build process is quite straightforward–it’s just building a docker container. I2 and I3 propose to add these changes:

  1. prior to building the containers, create a lockfile
  2. rebuild all containers when the lockfile changes, rather than just updating one

Change 2 shouldn’t significantly burden developers unless the build process is error-prone. We should fix that if it is, and perhaps a nightly container build isn’t a bad Jenkins job to configure so we can address these failures as they happen.

Change 1 is the one that needs to be thought through. Ordinarily, building a lockfile means running a single command on the target platform, and often this process takes about a minute. This would not be a significant burden. The tricky part is: we have multiple platforms in the CI (three, in fact: manylinux1_i686, manylinux1_x86_64, and manylinux_arm) and it’s not possible to build the lockfile for a platform other than the one you’re running on.

In thinking this through again, I realized that actually this is the same problem as the one we have with Windows: it’s not possible to use a single lockfile for all platforms. If you are switching platforms, you are going to have to accept you may need to switch dependency versions.

So let’s return to the ultimate goals of I2/I3 approach:

  1. Ensure all features of TVM can actually be used simultaneously on a given platform.
  2. Standardize dependencies so that the unittests don’t fail for different reasons on different containers.
  3. provide enough input to allow the tlcpack build process to place narrower constraints on install_requires included there, based on the versions used in CI

In light of the above, all of these goals can be achieved only within a single pip platform. So I’d propose a new way to install dependencies while addressing this, I4:

I4. A multi-part approach:

  1. For each pip platform that has a ci- container built around it: produce a lockfile either by:

    1. pip install -r all TVM requirements and pip freeze
    2. poetry lock

    The lockfile becomes part of the artifacts created when the containers are built. When multiple containers are built for the same pip platform (e.g. ci-lint, ci-cpu, ci-gpu, ci-qemu, ci-wasm), pick one container which will build the lockfile. The rest will merely consume this lockfile when installing packages either as pip constraints file or poetry lockfile.

  2. I would argue that between platforms, the version standardization we care about is effectively TVM direct Python dependencies to the minor version level. For instance, if we add Windows tests, we prefer to test against e.g. tensorflow 1.6 on both linux and Windows. To address this, when building containers, we would first build a pilot platform e.g. manylinux1_x86_64, scan the resulting lockfile for direct dependencies of TVM, and produce a loose constraint file for the remaining platforms. For example, if the lockfile contained an entry tensorflow==1.6.8, we may add tensorflow>=1.6,<1.7 to the cross-platform lockfile.

    → This may not always be possible (conda may not update fast enough for us, etc), so we will need to have an override file per-platform. This is complex, but it’s likely that this is happening anyway in the release process now, and the reasons why different packages are being installed aren’t documented anywhere.
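The loosening rule in step 2 is mechanical enough to sketch (loosen is a hypothetical helper name; it assumes pins follow the name==major.minor.patch shape):

```python
def loosen(pin):
    """Relax an exact pin from the pilot platform's lockfile into a
    minor-version range for the cross-platform constraint file,
    e.g. 'tensorflow==1.6.8' -> 'tensorflow>=1.6,<1.7'."""
    name, _, version = pin.partition("==")
    major, minor = version.split(".")[:2]
    return f"{name}>={major}.{minor},<{major}.{int(minor) + 1}"

print(loosen("tensorflow==1.6.8"))  # tensorflow>=1.6,<1.7
print(loosen("numpy==1.19.5"))      # numpy>=1.19,<1.20
```

The per-platform override file mentioned above would take precedence over anything this rule generates.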

Checking in new containers

To check in the built containers, these things need to happen:

  1. upload candidate containers to docker hub under your username
  2. prepare a TVM PR which updates the Jenkinsfile to reference these test containers
  3. have a TVM committer push the PR to ci-docker-staging.
  4. await test results and iterate as needed.
  5. push candidate containers to tlcpack docker hub account
  6. amend the TVM PR to point at tlcpack images and submit it.

Here we’re proposing that in step 2, we also include the per-platform lockfiles produced from container building.

tlcpack Dependencies

Finally: I4 proposes we produce a cross-platform constraints file. This isn’t discussed as part of this RFC (it’s the next one), but it would be nice to produce a tlcpack wheel which includes some loose version constraints (e.g. so that by depending on tensorflow, a user running pip install tlcpack would install tensorflow 1.x and not 2.x). The version constraints in the cross-platform constraints file are exactly those we should use in such a tlcpack wheel.

Previously, the follow-on RFC had proposed to analyze the CI lockfile and produce the tlcpack dependencies. I4 just proposes to move that analysis earlier, which effectively means the CI would test against the same constraints present in the tlcpack wheel. That seems like an improvement to me.


I’m open to changing build.sh to reference the previously-used lockfile, which should make it possible to update containers independent of the Python deps used.

tagging a few others who replied to the previous RFC

@mjs @leandron @comaniac

One potential idea is we can first start with I1, which is simpler, then see if there is a need to move to I4. Personally I don’t think we really need stronger consistency of packages across the CI spectrum; sometimes we might want to diversify it a bit (e.g. use different compilers such as clang/gcc). We are also producing different tlcpack binary packages for different systems, so we could derive the cpu package from ci-cpu and the gpu package from ci-gpu.

Additionally, most of the dependencies that are relevant should have already been captured by the constraints.txt, e.g. tensorflow==1.6, so there is not a lot of room to pin down further.

here’s what I don’t like about I1:

  • it doesn’t ensure dependency versions are compatible across all the various extras in each container architecture
  • it still means that ci-lint or ci-gpu could wind up with different package versions from ci-cpu

I do agree that most of the large direct dependencies are quite pinned, and we could probably accomplish a lot from a ci-constraints.txt which just explicitly pins those direct dependencies.

Yes, I agree with you that different containers could have different packages for things that are not pinned (thus my comment about the need for strong consistency among ci variants).

My last post is mainly about how big an issue that can cause, as we can pin down most direct deps, and also build binary packages for different platforms independently (from the corresponding ci container).

one thing that’s particularly annoying is when an error occurs in ci-gpu. I can’t run ci-gpu locally because I don’t have nvidia-docker or a compatible GPU, and people on e.g. Windows will be in exactly the same boat. it’s sometimes possible to quickly run the failing code locally in e.g. ci-cpu and reasonably expect it to fail in the same way. but you may or may not reproduce the error in doing that. in this case, you may wonder whether there is a discrepancy between Python deps that’s responsible, but there’s no good way to eliminate that variable.

I agree with your sentiment that strong consistency might not contribute to the overall health of the CI, but lack of consistency can make problems harder to debug when you don’t have access to the full spectrum of CI variants. I think we should be judicious when we allow the CI to vary, and I’m not sure that e.g. allowing attrs to vary between containers buys us anything other than uncertainty.

there are a couple of other problems that arise from a lack of standardized packages:

  • if you want to replicate the CI on some other linux system: which container’s pip freeze should you use? right now, you have to be a CI expert to know the answer to that. it would be hard, as a newcomer to the project, to decipher which container tests which functionality, so you probably just guess and hope you picked right.
  • if we ever start relying on dev packages from ci-gpu which are also present in ci-lint (i.e. rerunning pylint for some reason), then discrepancies in those precise versions could cause late-stage CI failures
  • it makes it difficult to rebuild the CI containers if you want to change something other than the Python packages, since you can’t avoid signing up to update them.

Okay, it seems like so far as lazy consensus goes, we have support for:

  1. Installing Python dependencies based off of the generated requirements from python/gen_requirements.py.
  2. Creating a ci-constraints.txt file based off of e.g. semver or other restrictions of those requirements to selectively pin TVM’s immediate dependencies
  3. Checking in the output of pip freeze from each container.
  4. No change to require all ci- containers to be built at once (though they will all be built one time after this change is implemented, to check in the pip freeze output).

From this, we can use the pip freeze output and ci-constraints.txt output to drive tlcpack constraints and a developer-facing requirements.txt, which was my ultimate goal with these RFCs. Further RFCs will be posted to propose those processes.

In the future, we can consider further consolidating container dependencies into a lockfile. For now, the ci-gpu container is going to install everything, so there should be some path to achieving a configuration of our dependencies that satisfies pip install.

As I’m working on this now finally, we discussed the implementation so far in the TVM Community Meeting this morning. Here are notes:

  • @kparzysz: Should we consider conda for this? It locks down not only Python dependencies but also C++ dependencies.

    • @tqchen originally preferred to have requirements.txt so users weren’t forced to use conda. This is the reason why gen_requirements.py creates requirements.txt.
    • @mousius mentioned that he’s had teams move off conda because it’s too heavy-handed and just wraps the Python dependency problem without adding too much value
    • @hogepodge notes that conda dependency solvers can take hours to update, and that’s contributed to an overall negative experience with conda for him
    • @areusch mentioned that adding another packaging system in the loop could make it hard for dependencies to stay current
    • Ioannis says that adding conda would add yet another dependency to TVM, and that poetry is built on pip.
  • @kparzysz asks: if we’re going to consider relaxing the ci-constraints.txt in the released package, shouldn’t we consider testing against those relaxed constraints? won’t there be a combinatorial explosion there?

    • @areusch notes: we could consider using tox to test different combinations, but another question here is: we should probably unit-test the apache-tvm wheel before we release it, but right now we run the tests in-tree so we might not have any expectation they’d pass against the wheel. the Python ecosystem does kind of have an expectation that things float around, and we should be judicious about what we place in install_requires, since that could constrain downstream users.
    • @driazati notes: we aren’t going to necessarily be able to test all combinations, but we can look at reported bugs and constrain as necessary. one concern about departing from a normal way to specify dependencies in setup.py is that if someone does want to fix a weird versioning problem by constraining it in the release, they’ll have to learn our bespoke process and that’s extra work on this proposal.
  • @driazati notes: we could also produce purpose-built CI images–i.e. one for GPU unit tests, one for GPU integration tests, etc. this would help shrink them from e.g. the current 20GB size, which is huge by docker image standards.

    • @areusch notes: would be good to see how many images we might want and see how much disk savings there are, and whether we could build e.g. integration tests on top of unit tests.

@kparzysz: re: the incompatibilities between tensorflow and paddlepaddle, could we avoid installing the two at once to avoid over-burdening users?

  • @areusch: we do declare extras right now, but you can’t install two incompatible extras at once. This does mean that if we got to a scenario where we couldn’t work around this with PEP 508 environment markers, we’d need to move to an architecture where e.g. an import was done by invoking a subprocess, so the incompatible packages could live in separate virtualenvs.

It seems like poetry lock is actually platform-agnostic; I see no differences between the 3 poetry.lock files generated by running poetry on aarch64, i386, and x86_64. That’s great, because it means we can simplify the approach here and just add a single image without needing to make every ci_ image depend on a common base. I’ll update the PR to take that approach.