[RFC] Consolidating TVM Python Dependencies

Hi. The semver proposal makes sense to me. There is arguably a case that maximum major version constraints should be provided, since the semver definition of a major version change specifically implies a breaking change of some form.

w.r.t. the discussion around requirements.txt, common convention and best practice suggest that:

  • install_requires: provides abstract, minimal requirements for a specific package
  • requirements.txt: provides exhaustive pinned versions for a complete environment

There is a general overview here: https://packaging.python.org/discussions/install-requires-vs-requirements/

The end user experience is defined by how we manage install_requires, rather than by what we provide in a requirements.txt. While requirements.txt is a viable way to share fixed environments within the tvm project’s own developer community, we should focus on the install_requires mechanism for tvm end users, because that is the one they will use by default.

@tqchen @mjs thanks for your replies

there is also a new pip dependency resolver that is about to become standard. this should alleviate a number of the reproducibility problems, but i’m not sure if there is a replacement flow for generating e.g. a ci lockfile. assuming everyone upgrades to pip 20.3, the main concern we need to address in the long run is how we manage the list of dependencies in the repo. in practice, I don’t think this upgrade will be standard for linux users until the major distributions pick it up.

wrt install_requires vs requirements.txt: this makes sense to me. I think that means that we could use the following rules to decide when to promote a version-constrained requirement from requirements.txt to install_requires (NOTE: all packages listed in requirements.txt should be included in install_requires, just maybe without a version):

  1. semver dependencies should be placed into install_requires as foo >= X.Y, foo < (X+1). we should only specify semver dependencies to the minor version level (and even then, only if absolutely necessary)
  2. “at least” dependencies (of the form foo >= 2.1) should be placed into install_requires. we should scrutinize these to see if they are really semver; we should also consider adding a foo < (X+1) constraint. “at least” dependencies may not explicitly state that they follow semver rules, so it wouldn’t be obvious that rule #1 applies, but given a clear version pattern, capping below the next major version may still be prudent.
  3. precise version pins in requirements.txt should be placed into install_requires, but this should be reserved for extreme cases.
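
to make the three rules above concrete, here’s a small illustration with hypothetical packages (not a real TVM dependency list):

install_requires = [
    "foo>=1.4,<2",   # rule 1: semver dependency at 1.4, capped below the next major version
    "bar>=2.1",      # rule 2: "at least" dependency; consider also appending ",<3"
    "baz==0.9.2",    # rule 3: exact pin, reserved for extreme cases
]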

now wrt the file layout:

  • I like @tqchen’s proposal for requirements.txt and requirements-extra.txt. these can live in the python/ subdirectory of the tvm repo. potentially, setup.py could just apply some set of rules (i.e. the ones above) to generate install_requires from these files without being overly specific (see the sketch after this list).
  • for the CI: it’s not clear to me we need to pin beyond what’s done in requirements.txt. a constraints file makes sense if we do need that. the main case I can think of is when a package introduces an accidentally-breaking revision. if we are cutting a release, and the CI needs constraints above requirements.txt, perhaps we should consider promoting that constraint to install_requires. finally, we do need a way for developers to get a list of which versions ended up being used in the CI (because right now, if you don’t have a GPU, you can’t produce this list). we don’t need to discuss that here, though.
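
to sketch what “setup.py could just apply some set of rules” might look like (this is not TVM’s actual setup.py; the helper and file layout are assumptions for illustration):

import os
from setuptools import setup

def read_requirements(path):
    # keep only non-empty, non-comment lines of a requirements file
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.strip().startswith("#")]

here = os.path.dirname(os.path.abspath(__file__))
setup(
    name="tvm",
    version="0.0.0",  # placeholder; the real version would come from elsewhere
    install_requires=read_requirements(os.path.join(here, "requirements.txt")),
    extras_require={
        "extra": read_requirements(os.path.join(here, "requirements-extra.txt")),
    },
)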

finally, i’d like to think through some common dependency management cases:

C1. someone adds a new core dependency

  1. edit requirements.txt and insert the new dependency in alphabetical order.
  2. ensure no other requirements-extra.txt specifies this dependency
  3. run a tool to validate the requirements.txt (setup.py? a possible validation sketch follows this list)
  4. update the CI containers and submit a Jenkinsfile change
  5. submit requirements.txt PR along with new Python code that uses the new dependencies
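
for step 3, the validation could be something as small as the following (a sketch only; the exact checks and file layout are assumptions based on the steps above):

import glob
import re
import sys

def package_names(path):
    # strip version specifiers and extras markers, keeping just the package names
    with open(path) as f:
        return [re.split(r"[<>=!~\[ ]", line.strip(), maxsplit=1)[0].lower()
                for line in f if line.strip() and not line.strip().startswith("#")]

core = package_names("python/requirements.txt")
ok = core == sorted(core)  # step 1: core requirements stay in alphabetical order
if not ok:
    print("python/requirements.txt is not sorted alphabetically")
for extra_file in glob.glob("python/requirements-*.txt"):
    overlap = set(core) & set(package_names(extra_file))
    if overlap:  # step 2: an extras file must not repeat a core dependency
        print("duplicated in {}: {}".format(extra_file, sorted(overlap)))
        ok = False
sys.exit(0 if ok else 1)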

C2. someone adds a new extras dependency

  • same as core, but swap requirements.txt and requirements-extra.txt
  • test path isn’t as clear in the CI, but this is a separate problem

C3. a pinned or semver package’s version needs to get updated.

  1. edit requirements.txt to update the version pin
  2. test locally
  3. rebuild CI containers with new version.
  4. test with new CI containers (how: TBD)
  5. update the CI containers and submit a Jenkinsfile change
  6. submit the requirements.txt PR along with any Python code changes needed for the new version

I think these all make sense to me, though we should consider how to improve the “update the CI containers” step in a separate RFC; specifically:

  • ensuring all containers use the same version dependencies
  • documenting the actual versions used

Thanks @areusch. I still think having a manually specified ci_constraint.txt is easier than having a resolver pin the versions. This is the way we pin dependencies right now; developers have clear expectations about what will happen (regardless of the dep resolver’s behavior), and we do not need to involve a lock file that might complicate the docker build.

@tqchen I’m more concerned that there are at least 118 dependencies to specify:

$ docker/bash.sh -i tlcpack/ci-gpu:v0.70 pip freeze | wc -l
118

probably more when you include the lint deps.

The lock file will contain all of the dependencies that are resolved, e.g. numpy and dependencies of dependencies, which results in the 118 deps you are talking about.

If we are going to specify the ci_constraint.txt, I would imagine we only specify the necessary ones, e.g. tensorflow==2.3.1 keras==2.4.3, without specifying the numpy version. While this is indeed less precise than a lock file, so far the pinning approach seems to have worked well for our case, and the ci_constraint.txt will provide more useful information to the users who need it.

I think we need to export the output of pip freeze from the CI containers to help those who are trying to exactly reproduce the CI. I agree with you that we should try to constrain as few packages as possible, so ci_constraint.txt could be small. but for the purposes of exactly reproducing the CI, we need to provide an exact snapshot of the venv. a challenging part of doing this is that unless every constraint is pre-computed by e.g. running pip install with all possible packages, ci-lint and ci-gpu install different subsets of packages with ordering that is not easy to determine (you have to read 10 different shell scripts and determine which packages were affected in each), and thus may choose different indirect dependencies. this means it may not actually be possible to export such a pip freeze from the CI, because some indirect dependency may be at version A for lint and version B for gpu.

The main question is whether or not exact pinpointing is critical to reproducing the CI. Our previous experience seems to suggest that pinning the few constraints is sufficient; if not, we should add things to the constraint.txt.

Notably, there are also additional environments that are not covered by the pip lock, e.g. LLVM and CUDA dependencies. When we get to the level of exact reproduction, perhaps the easiest way is still to fall back to using the exact docker binary tag itself.

it’s not really a question that has a general answer. sometimes pinpointing doesn’t matter and other times it does. as an extreme case, pylint output cannot be reproduced without precise pinning of both the python packages and the interpreter itself. that’s why it can be hard to lint locally now (docker/lint.sh doesn’t work with git-subtree). most times, the interpreter version doesn’t matter so much.

we should also consider the case of using TVM in conjunction with a project repo (i.e. not just using TVM in its source tree). in this case, suppose additional python packages need to be installed, some of which are large. you may prefer not to pay the cost of starting with ci-gpu, itself a 17.4GB image. instead, you may want to pick and choose the dependencies, and point pip at a constraints file that lists the exact versions of packages that were installed in the CI.

anyway, I think this discussion is more about the CI flow, which should go in another RFC. I think we both agree that ci_constraint.txt is a reasonable way to add constraints, used when building CI docker images, beyond those specified in requirements.txt.

I agree; at a high level, having a constraint.txt would be sufficient in most cases. In the case of pylint, while it is true there might be some python-version-dependent errors, in most cases they are not inconsistent with each other (with the correct pylint version), and the error messages produced by pylint are still helpful for a quick lint fix.

thanks for all the discussions! here are the next steps I can see:

  1. create a python/requirements.txt for the core direct dependencies, specifying version constraints as appropriate for use in the CI. also create e.g. python/requirements-tflite.txt files (one per extra feature of TVM) in the same spirit.
  2. modify python/setup.py to use those requirements.txt to derive install_requires.
  3. create python/requirements-dev.txt, which will not be read by setup.py but which should be used to install dev dependencies (i.e. lint, pytest, sphinx).
  4. modify the CI docker build scripts as follows:
    1. update to pip 20.3 to use the new dependency resolver
    2. delete all pip install foo commands and replace them with a single pip install command (see next bullet point)
    3. create python/ci-constraints.txt, which captures even more restrictive constraints for use in the CI; ordinarily we would not populate it.
    4. modify the CI install scripts in docker/install/*.sh, deleting all pip install commands and replacing them with pip install -e ./python -c python/ci-constraints-concatenated.txt (and potentially ./python[tflite] when extras are needed). ci-constraints-concatenated.txt is synthetically built by a shell script function that concatenates these files (we may need to de-dupe python packages, so this may become a python script; see the sketch after this list):
      • python/requirements.txt
      • python/requirements-$extra.txt
      • python/ci-constraints.txt
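
as a rough sketch of that concatenation step (the merge policy here, where the last file listed wins when two files name the same package, is an assumption rather than something we’ve settled on):

import re

def parse_requirements(path):
    # map package name -> full requirement/constraint line
    reqs = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name = re.split(r"[<>=!~\[ ]", line, maxsplit=1)[0].lower()
            reqs[name] = line
    return reqs

def concatenate(paths, out_path):
    merged = {}
    for path in paths:
        merged.update(parse_requirements(path))  # later files win on conflicts
    with open(out_path, "w") as out:
        for name in sorted(merged):
            out.write(merged[name] + "\n")

concatenate(
    ["python/requirements.txt",
     "python/requirements-tflite.txt",  # i.e. requirements-$extra.txt
     "python/ci-constraints.txt"],
    "python/ci-constraints-concatenated.txt")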

in a follow-on RFC, we’ll consider the challenge of ensuring that this constraints file is uniform across all CI containers (i.e. also the question of whether this is necessary). we’ll also consider the challenges of keeping the requirements up-to-date, which may or may not be easier with a tool such as poetry. given the new pip dependency resolver, I think the motivation for adopting poetry or some additional tool should be evaluated after considering that.

Looks good to me. Given the collection of requirements files, does it make sense to create a requirements folder?

sure, python/requirements/requirements.txt?

how about python/requirements.txt to follow the normal convention, and then python/extra-requirements/tflite.txt?

Alternatively, we can move all the extra requirements into a single file, given that it is relatively tedious for users to install them separately anyway. They can cherry-pick if they want.

I’m starting to work on this one. One small departure we need to make from the previously-agreed-upon plan (from a few replies back): as part of this plan, we’re defining one extras_require for each TVM frontend. Some frontends depend on the same external package, but we need to ensure that where we place semver constraints on such packages, we do it uniformly across TVM.

Therefore, I propose we generate the set of requirements-*.txt files from a small Python tool python/gen_requirements.py. this tool would contain a specification for requirements like:

requirements_by_extra = {
    "tvm": ["attrs", "decorator", ...],   # Core requirements
    "importer-tensorflow": ["tensorflow", "tensorflow-estimator"],
    "importer-tflite": ["tflite", "tensorflow", "tensorflow-estimator"],
}

constraints = {
  "tensorflow": "^2.1",  # Maps package name to semver or pip-style requirements
}
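
to make that concrete, a possible emitter for such a spec could look like the following (a sketch only, not the PR’s actual code; the caret-to-pip conversion is the key idea):

def caret_to_pip(spec):
    # "^X.Y" -> ">=X.Y,<X+1" (semver caret range; 0.x caret semantics not handled here)
    version = spec[1:]
    major = int(version.split(".")[0])
    return ">={},<{}".format(version, major + 1)

def write_requirements(requirements_by_extra, constraints):
    for extra, packages in requirements_by_extra.items():
        filename = ("requirements.txt" if extra == "tvm"
                    else "requirements-{}.txt".format(extra))
        with open(filename, "w") as f:
            for package in sorted(packages):
                spec = constraints.get(package, "")
                if spec.startswith("^"):
                    spec = caret_to_pip(spec)
                f.write(package + spec + "\n")

# e.g. write_requirements(requirements_by_extra, constraints)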

we could then either:

R1. Check in the requirements-*.txt, and create a lint step to verify that the checked-in, generated requirements-*.txt matches the Python spec in gen_requirements.py

R2. .gitignore the generated requirements-*.txt, and invoke gen_requirements.py from setup.py.

I sort of lean towards R2, but open to suggestions.

Hi @areusch, I’m sorry I’m a bit late, but I was reading through the whole thread and, since we are talking about Python dependencies, got curious about what we envisage as the experience for an end-user installing a TVM package generated using the proposed mechanisms.

At the moment, if we generate a TVM package (purely based on the apache/tvm repo) and install it into a fresh virtualenv, we don’t really get something that we can use to import a model, as none of the frontend dependencies will be in place.

It is also not very usual for users to be expected to consume “requirements.txt”-style files directly. What is usually there is a clean pip install <something> that will install all required dependencies.

Q1: Is it correct to assume the proposal is that the user will be expected to pip install tvm[importer-tensorflow] (I’m using “tvm” as the package name here, just as a placeholder), which will install tvm with all required dependencies for the TensorFlow frontend, but that if the same user now needs to use ONNX or PyTorch, it will be broken until the user manually installs the dependency?

Q2: I know a lot of thinking and effort was put into the current mechanism being designed, but I’m curious why we don’t just declare all the dependent frameworks and versions and let the package install everything, rather than needing to keep an elaborate mechanism of dependency processing and file generation. It seems to me that it would make the experience much more aligned with everything else in the ecosystem.

hey @leandron, thanks for chiming in! here are some answers.

It is also not very usual for users to be expected to consume “requirements.txt”-style files directly.

definitely. requirements.txt is purely intended for use with the tvm source repo. there are a lot of different virtualenv management tools, but most (all?) of them can consume requirements.txt. theoretically, it should also be possible to eventually install tvm in “editable” mode, but currently doing so requires that you’ve built libtvm.so, so requirements.txt remains the more useful “official” dependency list for source-tree purposes.

Q1: Is it correct to assume the proposal is that the user will be expected to pip install tvm[importer-tensorflow] (I’m using “tvm” as the package name here, just as a placeholder), which will install tvm with all required dependencies for the TensorFlow frontend, but that if the same user now needs to use ONNX or PyTorch, it will be broken until the user manually installs the dependency?

technically there’s a packaging flow that could further modify setup() arguments such that the end user flow could do something different or enable certain importers by default. but absent that (and I don’t believe the flow does anything like that now), that seems correct.

Q2: I know a lot of thinking and effort was put into the current mechanism being designed, but I’m curious why we don’t just declare all the dependent frameworks and versions and let the package install everything, rather than needing to keep an elaborate mechanism of dependency processing and file generation. It seems to me that it would make the experience much more aligned with everything else in the ecosystem.

I agree pip install foo is typically expected to be a batteries-included sort of command, but I think there are more practical reasons why installing all importer dependencies by default may be hard right now. for example, PyTorch is nearly 800MB. I think @tqchen may have more examples, but at least from a source code perspective, I believe the paradigm we’re attempting to embrace is to minimize the set of required dependencies to make it possible to use TVM in a wide range of settings. it should be quite easy to fix up the TVM error message to explain to users "please run pip install tvm[importer-onnx]" if tvm finds a missing dependency.
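
a sketch of the kind of error-message fix-up I mean (the helper and the exact message are hypothetical, not existing TVM code):

import importlib

def _import_frontend(module_name, extra):
    # import a frontend dependency, or point the user at the matching extra
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "{} is required by this frontend; please run: "
            "pip install tvm[{}]".format(module_name, extra)
        ) from err

# e.g. inside the ONNX importer:
# onnx = _import_frontend("onnx", "importer-onnx")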

I think the end-user flow is definitely worth debating. the initial goal of this proposal is merely to consolidate the set of python dependencies scattered around the source tree, so that’s what i care about. happy to make changes as the community’s thinking evolves around the end-user flow.

I have published a draft PR.

in implementing the draft PR and collecting the various package constraints from the codebase, i’ve now come across several different cases where we may want to add version constraints that aren’t strictly necessary for functional reasons. It’s not clear that these constraints should live in the gen_requirements.py tool from the PR. in scattering various pip3 install commands around the codebase, we’ve implicitly added this context, but we haven’t always documented why we’re restricting the version or the context for the restriction.

here are some contexts:

  1. Functional Constraints – in some cases, e.g. synr, xgboost, we must constrain TVM to a particular min version to avoid bugs. as we check these into main, these will always be min version constraints and never max version constraints (an example of a max version constraint would be if we were making a release and discovered e.g. a just-released pytorch is incompatible and we have no time to patch; in this case, we want a release constraint, see below).

  2. Dev Constraints – some packages e.g. pylint, cpplint, black are purely used as developer tools. for best developer experience, their output should be the same across all installations. therefore, those packages and their dependencies should be pinned exactly to whatever version is used in ci-lint. however, it should be possible to easily rebuild ci-lint to use the latest version.

  3. Tested Constraints – people using TVM from the source tree should be able to install a set of dependencies that are the same versions as are tested in the CI. However, these aren’t the ci-constraints.txt mentioned in this proposal; we don’t want to constrain future CI container builds, we just want to document and use whatever was selected at the previous CI container build.

  4. Release Constraints – when building a tlcpack pip package, we may want to decide which of these constraints to impose on the generated wheels. for example, we may want to produce a pip package with no version constraints and one with CI-tested versions.

i’d propose to move forward with the current PR, adding only the Functional Constraints to gen_requirements.py. However, the current PR doesn’t accommodate the remaining use scenarios, and I don’t think we’ve fully addressed them here. Before we can address the remaining contexts, we’ll need to consider how all of this work is reflected in the CI. so, I’d like to merge this PR as an improvement to setup.py, and then work towards a follow-on RFC/PR which proposes changes to the process by which we build and update CI containers.

My two cents:

For the Dev constraints, if we don’t find any mismatched outputs, it would be more flexible to have a range of versions instead of just a single one; otherwise it might be tedious for people who work on many projects at the same time, where the projects may have no overlapping pylint version, for example.

Meanwhile, I agree with you that the Tested constraints used in the CI should be exact versions. IIUC, those constraints are currently specified in the docker files and the installation scripts, though not as strictly as you proposed, of course. Maybe we can simply make them comprehensive there?

Other than that, it makes sense to have the current PR cover only the Functional constraints first, to keep it concise and clean.

@comaniac I’d propose that we don’t include the dev constraints in setup.py (including in any future RFC/PR). This should let you pip install -r python/requirements/dev.txt without constraining any versions (note the PR does not look like this right now).

Something I have noticed with Python projects is that pylint/astroid in particular are extremely sensitive to their version and even to the cpython version being used. I agree a range of dependencies can sometimes make sense, but for exactly reproducing the CI output locally, I think exact pinning is needed.

Then again, there is another debate to be had as to “what is the process for exactly reproducing the CI?” Arguably, docker/bash.sh tlcpack/ci-lint:ver is the way to reproduce the CI, and one shouldn’t have high expectations for other procedures. However, you do currently have to download several docker containers (and you actually must have a GPU and nvidia-docker present locally) even to just try installing dev dependencies with exact versions into a Python venv, and that seems unnecessarily burdensome.

This should be discussed more in a follow-on RFC, but as a preview, my thinking is that when a Docker container version is bumped in the Jenkinsfile, a separate file, e.g. ci-lint-freeze.txt, should be updated somewhere to record the output of pip freeze from the container. Then, tools can post-process that to convert the precise pins from pip freeze into looser constraints according to a package’s versioning policy (e.g. for semver packages, convert ==x.y.z into ^x.y.z or ~x.y.z). These tools can drive Dev Constraints, CI-tested Constraints, or Release Constraints.
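
a sketch of that post-processing step (the freeze file name follows the example above; the set of semver-following packages is an assumption that would have to be maintained by hand):

SEMVER_PACKAGES = {"attrs", "decorator"}  # assumption: packages known to follow semver

def loosen(freeze_path):
    # read the exact pins recorded by `pip freeze` and loosen the semver ones
    constraints = []
    with open(freeze_path) as f:
        for line in f:
            line = line.strip()
            if not line or "==" not in line:
                continue
            name, version = line.split("==", 1)
            if name.lower() in SEMVER_PACKAGES:
                major = int(version.split(".")[0])
                constraints.append("{}>={},<{}".format(name, version, major + 1))
            else:
                constraints.append(line)  # keep the exact pin
    return constraints

# e.g. loosen("ci-lint-freeze.txt")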

Meanwhile, I agree with you that the Tested constraints used in the CI should be exact versions. IIUC, those constraints are currently specified in the docker files and the installation scripts, though not as strictly as you proposed, of course. Maybe we can simply make them comprehensive there?

Some discussion is also needed as to how CI containers actually pip install packages and whether some type of poetry-style locking should be used. And, some discussion is needed around intentionally CI-testing a range of packages to improve our test coverage. Presumably, some separate process is needed to generate a container-wide constraints.txt, but I believe there’s already some contention as to how much complexity we should incorporate into our container build process. So, let’s discuss this all in the follow-on RFC.
