Background
This RFC is a follow-on to [RFC] Consolidating Python Dependencies, implemented in PR 7289. It will be followed by a third RFC describing how we might leverage the changes introduced in this RFC to produce version-constrained tlcpack wheels and a CI-tested `requirements.txt` for development.
After PR 7289 lands, TVM will contain a centralized list of Python dependencies: a list of packages, with version constraints applied only where required for functional reasons (e.g. tests don't pass without these constraints). However, this list has no bearing on the CI, so it has no teeth.
This RFC proposes changes to the way we install Python packages into TVM CI containers to address the following shortcomings:
S1. Scattered CI Python Dependencies
At the time of writing, there are no firm rules about how Python packages are installed into CI containers. Packages are currently installed in all of the following ways:
- Most dependencies are installed with `pip install` commands placed in scripts that live in `docker/install`.
- The set of scripts run for a particular image is listed in `docker/Dockerfile.ci_<image>`.
- Some late-breaking dependencies are installed in `tests/scripts/task_ci_python_setup.sh`.

Constraints for these packages are listed only on the `pip install` command line and don't necessarily match the functional requirements listed in `gen_requirements.py`. For the most part, these packages are unconstrained, so the actual installed version of a package changes as the containers are rebuilt. This is considered a good thing (it is part of our strategy for ensuring TVM testing remains up-to-date with respect to its dependencies), but a drawback is that TVM developers have to manually query the installed version using `pip freeze`. Since there are many such containers updated independently, assembling a known-good list of TVM's Python dependencies (as certified by CI) is not straightforward.
S2. Dependency Conflicts and pip 20.3
Not all `requirements.txt` files describe a set of packages that can be installed together. For example, if we have:

```
foo
bar>3
```

but `foo` depends on `bar<3`, the packages in this file should not be installable together.
However, until recently (pip 20.3, released in late 2020), pip (the default Python package manager) considered this file line-by-line rather than holistically. In this case, pip would:
- Read the dependencies of the latest `foo`.
- Realize `bar<3` was required, and install the latest `bar` below version 3.
- Read the next line, uninstall the `bar<3`, and install the latest `bar` at version 3 or above.
pip only checked whether dependencies agreed when a single package such as `foo` depended on several packages whose declared dependencies could conflict.
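The line-by-line behavior described above can be sketched as a toy resolver. This is purely an illustration of the failure mode, not pip's actual implementation; `foo`, `bar`, and their metadata are the hypothetical packages from the example.

```python
# Toy model of pip's pre-20.3 line-by-line installation. Package
# metadata is hypothetical: foo 1.0 declares a dependency on bar<3.
AVAILABLE = {"foo": [1.0], "bar": [1.0, 2.9, 3.5, 4.0]}
DEPENDS = {("foo", 1.0): [("bar", lambda v: v < 3)]}

def legacy_install(requirements):
    installed = {}
    for name, satisfies in requirements:
        # Honor only THIS line's constraint, clobbering anything
        # already installed for the same package.
        version = max(v for v in AVAILABLE[name] if satisfies(v))
        installed[name] = version
        for dep, dep_ok in DEPENDS.get((name, version), []):
            installed[dep] = max(v for v in AVAILABLE[dep] if dep_ok(v))
    return installed

# requirements.txt from above: "foo", then "bar>3"
reqs = [("foo", lambda v: True), ("bar", lambda v: v > 3)]
print(legacy_install(reqs))  # {'foo': 1.0, 'bar': 4.0}
```

The final environment contains `bar` 4.0 even though `foo` requires `bar<3`; the new resolver refuses to produce such an environment and instead reports a resolution error.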
Changes to `pip install` for the CI
Given these changes, we and all of our users will need to care that we test against a compatible set of dependencies, so that it remains possible in the future to `pip install` TVM. In particular, the current CI approach (installing packages one-by-one in a set of decentralized scripts) will become more and more expensive as each `pip install` presents the dependency resolver with progressively trickier dependency groups.
The first step was centralizing the Python dependencies, accomplished in PR 7289. Next, we have a choice:
I1. Consolidate all `pip install` commands in `docker/install` scripts into a single pip-install script which consumes only `python/requirements/*.txt`.
I2. Feed `python/requirements/*.txt` into a tool such as poetry; use `poetry lock` to produce a `poetry.lock` file containing a compatible set of Python dependencies. Change all `pip install` commands to `poetry install` commands, which essentially forces installation of only those package versions recorded in the `poetry.lock` file.
I3. Like I2, but translate `poetry.lock` into a `constraints.txt` file and continue to use `pip` in install scripts.
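Option I3 amounts to a small translation step. The sketch below shows the idea; the `[[package]]` name/version layout follows poetry's lock-file format, but the regex-based parsing and the `lock_to_constraints` helper are illustrative, not a proposed implementation.

```python
# Minimal sketch of option I3: translate a poetry.lock (TOML) into a
# pip constraints file of exact pins. The embedded lock content is a
# hypothetical two-package example.
import re

LOCK = """\
[[package]]
name = "numpy"
version = "1.19.5"

[[package]]
name = "onnx"
version = "1.8.1"
"""

def lock_to_constraints(lock_text: str) -> str:
    # Pull (name, version) pairs out of each [[package]] table.
    pairs = re.findall(r'name = "([^"]+)"\nversion = "([^"]+)"', lock_text)
    return "".join(f"{name}=={version}\n" for name, version in sorted(pairs))

print(lock_to_constraints(LOCK), end="")
# numpy==1.19.5
# onnx==1.8.1
```

The resulting file can then be passed to the existing install scripts via `pip install -c constraints.txt ...`, leaving the scripts themselves unchanged.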
Independently or Jointly Building CI containers
At the time of writing, CI containers are updated one-by-one and by hand (a process which typically requires some manual bugfixing). This also means that, in implementing any of I{1,2,3} above, the dependency resolver will find a compatible solution only for the packages relevant to a particular CI container. That solution may differ considerably from one computed over all TVM dependencies at once.
For instance, if `ci-arm` does not depend on `onnx` but `ci-gpu` does, the selected version of `torch` may differ according to constraints `torch` places on the `onnx` package. We may eventually reach a point where some TVM `extras_install` groups conflict with one another, but until then, it seems we should strive to ensure all parts of TVM can be used at once without package version conflicts.
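The divergence between per-container and joint resolution can be illustrated with a toy model. All version numbers and the conflict below are hypothetical, not real `torch`/`onnx` metadata:

```python
# Toy illustration: resolving per-container vs jointly can pick
# different versions. Hypothetical metadata: torch 1.8 conflicts with
# onnx, so any environment containing onnx must hold torch at 1.7.
AVAILABLE = {"torch": [1.7, 1.8], "onnx": [1.8]}
CONFLICTS = {("torch", 1.8): {"onnx"}}

def resolve(packages):
    picked = {}
    for name in packages:
        # Take the newest version that does not conflict with any
        # other package in this environment.
        for version in sorted(AVAILABLE[name], reverse=True):
            if not (CONFLICTS.get((name, version), set()) & set(packages)):
                picked[name] = version
                break
    return picked

print(resolve(["torch"]))          # ci-arm alone: torch 1.8
print(resolve(["torch", "onnx"]))  # joint solution: torch held at 1.7
```

Under B1 each container gets its own answer (here, `ci-arm` would test `torch` 1.8); under B2 the single joint answer (`torch` 1.7) would be used everywhere.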
Therefore, we have a choice in how we implement I{1,2,3} above:
B1. Continue building all `ci-*` containers independently. Implied by I1, unless we install all Python dependencies on one particular container and use `pip freeze` from that container as the constraints file for the others.
B2. When `ci-*` containers are rebuilt (either periodically or because changes are needed), recompute dependencies assuming all TVM dependencies are installed at once (i.e. I2 or I3). Use a single constraints file for all `ci-*` containers (but see Held-Back Dependencies below).
The author's opinion is that to truly address the problem posed by the new pip dependency resolver, approach B2 is needed. However, it does reduce the number of different versions of each TVM dependency exercised in the CI, and that in turn sacrifices some unquantifiable test coverage.
Recording CI-tested Package Versions
At the end of the CI container build process, each `ci-*` container should have a list of installed package versions—either the `constraints.txt` or the output of `pip freeze`. This file should be placed either in the TVM tree (e.g. `docker/constraints/ci-cpu.txt`) or in another repository accessible to the public. If in the TVM tree, it should be updated whenever the `Jenkinsfile` container version is updated.
These files are useful when users need to look up the packages that make up a known-good TVM configuration. Without them, the user must download each container and run `pip freeze` themselves—incredibly burdensome considering the size of the containers and that `ci-gpu` (used to build docs and run many of the tests) requires `nvidia-docker` to launch.
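Capturing this list at the end of a container build is straightforward. The sketch below uses `importlib.metadata` rather than shelling out to `pip freeze`; the output path follows the `docker/constraints/ci-cpu.txt` example from the text, and the exact location is one of the open questions in this RFC.

```python
# Sketch: record the installed package set of the current Python
# environment as a pip-style constraints file at the end of a
# container build.
import importlib.metadata
import pathlib

def record_constraints(out_path: str) -> None:
    pins = sorted(
        {f'{d.metadata["Name"]}=={d.version}'
         for d in importlib.metadata.distributions()
         if d.metadata["Name"]}
    )
    path = pathlib.Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(pins) + "\n")

record_constraints("docker/constraints/ci-cpu.txt")
```

The resulting file is directly usable as a `pip install -c` constraints file, so users can reproduce a CI-tested environment without downloading the container.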
Note that in the event we choose approach B2 above, far fewer `constraints.txt` files will need to be recorded: only the virtualenv variants introduced by Held-Back Dependencies below would necessitate more than one constraints list.
Held-Back Dependencies
There are a couple of cases where we intentionally pin, or implicitly hold back, dependencies in the CI:
- Development tools such as `pylint` may often be challenging to update, as the reported errors can vary dramatically by pylint version. For these tools, the motivation to update is also lower: having the best linting is just a nice-to-have relative to landing a TVM bugfix or feature request. Therefore, we may frequently hold back dev tools to make it easier to change or add other dev dependencies.
- Increasing test diversity. This one is trickier because we "implicitly" hold back dependencies by building each container independently, often a few months apart from the others. Since we just pick whichever package version is available at the time we build a container, this winds up running e.g. our unittests against a wider range of dependencies.
This RFC proposes we accomplish this more purposefully and on a more limited basis, by creating a file in the TVM repo (e.g. `ci-i386-constraints.txt`) which describes holdbacks of large dependencies such as PyTorch or TensorFlow for a given container (e.g. `ci-i386`).
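Such a per-container holdback file might look like the following (the package choices and pins are purely illustrative, not a proposed policy):

```
# ci-i386-constraints.txt (hypothetical contents)
# Hold back large frameworks on this container while other containers
# track newer releases.
tensorflow==2.1.*
torch==1.7.*
```

The joint dependency resolution from B2 would then be run once per holdback variant, with these pins layered on top of the shared constraints.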
Topics for Discussion
T1. Do we agree it’s worth it to centralize Python dependencies from the CI? What concerns do we have about doing this?
T2. Which change (I1, I2, I3, or other) do you support to `pip install` for building `ci-*` containers?
T3. Do you support building all `ci-*` containers at once (i.e. do you support B1 or B2)?