@tqchen @mjs thanks for your replies
there is also a new pip dependency resolver that is about to become the default. this should alleviate a number of the reproducibility problems, but I’m not sure if there is a replacement flow for generating e.g. a CI lockfile. assuming everyone upgrades to pip 20.3, the main concern we need to address in the long run is how we manage the list of dependencies in the repo. in practice, I don’t think this upgrade will be standard for linux users until the major distributions pick it up.
wrt install_requires vs requirements.txt: this makes sense to me. I think that means that we could use the following rules to decide when to promote a version-constrained requirement from requirements.txt to install_requires (NOTE: all packages listed in requirements.txt should be included in install_requires, just maybe without a version):
- semver dependencies should be placed into `install_requires` as `foo >= X.Y, foo < (X+1)`. we should only specify semver dependencies to the minor version level (and even then, only if absolutely necessary)
- “at least” dependencies (of the form `foo >= 2.1`) should be placed into `install_requires`. we should scrutinize these to see if they are really semver; we should also consider additionally placing a `foo < (X+1)` constraint. it’s possible that “at least” dependencies may not explicitly state they follow semver rules, so it wouldn’t be obvious that rule #1 is apropos, but nevertheless given a clear version pattern, restricting beyond the next major version may be prudent.
- precise version pins in `requirements.txt` should be placed into `install_requires`, but we should essentially never do this except in extreme cases.
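to make the first rule concrete, here is a minimal sketch of how a semver-style constraint could be generated. `semver_specifier` is a hypothetical helper name (nothing like it exists in the repo today), and truncating the lower bound to `X.Y` is my reading of “only to the minor version level”:

```python
import re


def semver_specifier(name: str, min_version: str) -> str:
    """Build a `name>=X.Y,<X+1` requirement string (hypothetical helper).

    Any patch component in min_version is dropped, so the lower bound is
    only specified to the minor version level.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)*", min_version)
    if match is None:
        raise ValueError(f"expected a version like X.Y, got {min_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Upper bound excludes the next major version, where semver allows
    # breaking changes.
    return f"{name}>={major}.{minor},<{major + 1}"


print(semver_specifier("foo", "2.1"))
```

the same helper would also cover the “at least” case whenever we decide a package’s version pattern is close enough to semver to add the upper bound.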
now wrt the file layout:
- I like @tqchen’s proposal for `requirements.txt` and `requirements-extra.txt`. these can live in the `python/` subdirectory of the tvm repo. potentially, `setup.py` could just apply some set of rules (i.e. the ones above) to generate `install_requires` from these files without being overly specific.
- for the CI: it’s not clear to me we need to pin beyond what’s done in `requirements.txt`. a constraints file makes sense if we do need that. the main case I could think of is when a package introduces an accidentally-breaking revision. if we are cutting a release, and the CI needs constraints above `requirements.txt`, perhaps we should consider promoting that constraint to `install_requires`. finally, we do need a way for developers to get a list of which versions ended up being used in the CI (because right now, if you don’t have a GPU, you can’t produce this list). we don’t need to discuss that here, though.
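as a rough sketch of the `setup.py` side, something like the following could turn the contents of `requirements.txt` into an `install_requires` list. `read_requirements` is a hypothetical name, and this assumes the file only holds plain requirement lines plus comments (pip option lines such as `-r other.txt` are skipped rather than followed):

```python
import re
from typing import List


def read_requirements(text: str) -> List[str]:
    """Return the requirement lines from requirements.txt-style text."""
    requirements = []
    for raw_line in text.splitlines():
        # Drop comments: a '#' at line start or preceded by whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw_line).strip()
        # Skip blank lines and pip option lines (e.g. "-r", "-c", "--index-url").
        if not line or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


sample = "# core deps\nattrs\nnumpy>=1.17  # needed for foo\n"
print(read_requirements(sample))
```

`setup.py` could then read the file and pass the result to `setup(install_requires=...)`, optionally applying the promotion rules above to each entry first.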
finally, i’d like to think through some common dependency management cases:
C1. someone adds a new core dependency
- edit `requirements.txt` and insert the new dependency in alphabetical order.
- ensure no other `requirements-extra.txt` specifies this dependency
- run a tool to validate the `requirements.txt` (`setup.py`?)
- update the CI containers and submit a `Jenkinsfile` change
- submit `requirements.txt` PR along with new Python code that uses the new dependencies
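the validation step in C1 could look roughly like this. it is only a sketch: `requirement_name` and `validate` are hypothetical names, and the two checks (alphabetical order, no overlap between core and extras) are just the ones listed above:

```python
import re
from typing import List


def requirement_name(line: str) -> str:
    """Extract a normalized package name from a requirement line."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {line!r}")
    # Normalize so "Foo_Bar" and "foo-bar" compare equal.
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def validate(core: List[str], extra: List[str]) -> List[str]:
    """Return a list of human-readable problems (empty if all is well)."""
    errors = []
    core_names = [requirement_name(line) for line in core]
    if core_names != sorted(core_names):
        errors.append("requirements.txt is not in alphabetical order")
    extra_names = {requirement_name(line) for line in extra}
    for name in sorted(set(core_names) & extra_names):
        errors.append(
            f"{name} is listed in both requirements.txt and requirements-extra.txt"
        )
    return errors


print(validate(["attrs", "numpy>=1.17"], ["tornado"]))
```

returning a list of problems rather than raising on the first one means a lint job could report everything in one pass.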
C2. someone adds a new extras dependency
- same as core, but swap `requirements.txt` and `requirements-extra.txt`
- test path isn’t as clear in the CI, but this is a separate problem
C3. a pinned or semver package’s version needs to get updated.
- edit `requirements.txt` to update the version pin
- test locally
- rebuild CI containers with new version.
- test with new CI containers (how: TBD)
- update the CI containers and submit a `Jenkinsfile` change
- submit `requirements.txt` PR along with new Python code that uses the new dependencies
I think these all make sense to me, though we should consider how to improve the “update the CI containers” step in a separate RFC; specifically:
- ensuring all containers use the same version dependencies
- documenting the actual versions used
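for the “documenting the actual versions used” part, one possible approach (an assumption on my side, not something we have today) is to run a small script inside each CI container and publish its output. this sketch uses `importlib.metadata` from the standard library (Python 3.8+); `installed_versions` is a hypothetical name:

```python
from importlib import metadata
from typing import List


def installed_versions() -> List[str]:
    """List installed distributions as sorted `name==version` pins."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing or broken metadata.
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for pin in installed_versions():
    print(pin)
```

the output is close to what `pip freeze` prints, so it could double as a constraints file for anyone trying to reproduce a CI run without access to the containers. diffing the output across containers would also show whether they all use the same versions.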