[RFC] Consolidating TVM Python Dependencies

@tqchen @mjs thanks for your replies

There is also a new pip dependency resolver that is about to become the default. It should alleviate a number of the reproducibility problems, though I'm not sure whether there is a replacement flow for generating e.g. a CI lockfile. Assuming everyone upgrades to pip 20.3, the main concern we need to address in the long run is how we manage the list of dependencies in the repo. In practice, though, I don't think this upgrade will be standard for Linux users until the major distributions pick it up.

Regarding install_requires vs requirements.txt: this makes sense to me. I think it means we could use the following rules to decide when to promote a version-constrained requirement from requirements.txt to install_requires (NOTE: every package listed in requirements.txt should appear in install_requires, just possibly without a version constraint); a sketch of applying these rules mechanically follows the list:

  1. Semver dependencies should be placed into install_requires as foo >= X.Y, foo < (X+1). We should only constrain semver dependencies down to the minor version level (and even then, only when absolutely necessary).
  2. “At least” dependencies (of the form foo >= 2.1) should be placed into install_requires. We should scrutinize these to see whether they really follow semver, and we should also consider adding a foo < (X+1) constraint. An “at least” dependency may not explicitly state that it follows semver rules, so it won't always be obvious that rule #1 applies, but given a clear versioning pattern, capping below the next major version is probably prudent.
  3. Precise version pins in requirements.txt should be carried into install_requires, but we should reserve exact pins for extreme cases.
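
To make the promotion concrete, here is a minimal sketch (not an actual TVM tool; the `promote` name and the limited requirement formats it accepts are illustrative assumptions) of turning a requirements.txt-style line into an install_requires entry per rules #1–#3:

```python
# Sketch only: derive an install_requires entry from a requirements.txt line.
import re

def promote(requirement: str) -> str:
    """Turn e.g. "foo >= 2.1" into "foo >= 2.1, < 3" (rules #1/#2) and pass
    exact pins (rule #3) or bare package names through unchanged."""
    match = re.match(r"^\s*([A-Za-z0-9_.\-]+)\s*(>=|==)?\s*([0-9.]+)?\s*$", requirement)
    if not match:
        raise ValueError(f"unrecognized requirement: {requirement!r}")
    name, op, version = match.groups()
    if op is None:
        # No constraint in requirements.txt: list just the package name.
        return name
    if op == "==":
        # Rule #3: carry the exact pin through unchanged.
        return f"{name} == {version}"
    # Rules #1/#2: keep the lower bound, cap below the next major version.
    next_major = int(version.split(".")[0]) + 1
    return f"{name} >= {version}, < {next_major}"

assert promote("numpy >= 1.19") == "numpy >= 1.19, < 2"
assert promote("attrs") == "attrs"
```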

Now, regarding the file layout:

  • I like @tqchen's proposal for requirements.txt and requirements-extra.txt. These can live in the python/ subdirectory of the tvm repo. Potentially, setup.py could just apply some set of rules (i.e. the ones above) to generate install_requires from these files without being overly specific; a sketch of that follows this list.
  • For the CI: it's not clear to me that we need to pin beyond what's done in requirements.txt. A constraints file makes sense if we do need that; the main case I can think of is when a package introduces an accidentally-breaking revision. If we are cutting a release and the CI needs constraints beyond requirements.txt, perhaps we should consider promoting those constraints to install_requires. Finally, we do need a way for developers to get a list of which versions ended up being used in the CI (because right now, if you don't have a GPU, you can't produce this list), but we don't need to discuss that here.
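
As a rough illustration of the setup.py side, here is a sketch (file names and layout are assumptions, not the final proposal) of reading the two requirements files into install_requires and extras_require; the rule-application helper sketched above could additionally be applied to each parsed line:

```python
# Sketch only: derive install_requires/extras_require from requirements files
# living next to setup.py.  File names here are assumptions, not the final
# TVM layout.
import os

def parse_requirements(path):
    """Read a pip-style requirements file, skipping comments and blank lines."""
    entries = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if line:
                entries.append(line)
    return entries

here = os.path.dirname(os.path.abspath(__file__))
install_requires = parse_requirements(os.path.join(here, "requirements.txt"))
extras_require = {
    "extra": parse_requirements(os.path.join(here, "requirements-extra.txt")),
}
# These lists would then be passed to setuptools.setup(install_requires=...,
# extras_require=...).
```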

Finally, I'd like to think through some common dependency-management cases:

C1. Someone adds a new core dependency

  1. Edit requirements.txt and insert the new dependency in alphabetical order.
  2. Ensure no requirements-extra.txt file also specifies this dependency.
  3. Run a tool to validate requirements.txt (setup.py? a sketch of such a check follows this list).
  4. Update the CI containers and submit a Jenkinsfile change.
  5. Submit the requirements.txt change in a PR along with the new Python code that uses the dependency.
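
For step 3, the validation tool could be as simple as the following sketch (file names and the specific checks are assumptions): it verifies that requirements.txt stays alphabetized and that no package is listed in both files.

```python
# Sketch of a requirements validation step: check alphabetical order in
# requirements.txt and flag packages duplicated in requirements-extra.txt.
import re
import sys

def package_names(path):
    """Extract lowercase package names from a pip-style requirements file."""
    names = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if line:
                names.append(re.split(r"[<>=!~\s]", line, maxsplit=1)[0].lower())
    return names

def main():
    core = package_names("requirements.txt")
    extra = package_names("requirements-extra.txt")
    errors = []
    if core != sorted(core):
        errors.append("requirements.txt is not in alphabetical order")
    duplicates = set(core) & set(extra)
    if duplicates:
        errors.append("listed in both files: " + ", ".join(sorted(duplicates)))
    for error in errors:
        print(f"error: {error}", file=sys.stderr)
    sys.exit(1 if errors else 0)

if __name__ == "__main__":
    main()
```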

C2. Someone adds a new extras dependency

  • Same as C1, but with requirements.txt and requirements-extra.txt swapped.
  • The test path in CI isn't as clear, but that's a separate problem.

C3. A pinned or semver package's version needs to be updated

  1. Edit requirements.txt to update the version constraint.
  2. Test locally.
  3. Rebuild the CI containers with the new version.
  4. Test with the new CI containers (how: TBD).
  5. Update the CI containers and submit a Jenkinsfile change.
  6. Submit the requirements.txt change in a PR along with any code changes needed for the new version.

These all seem reasonable to me, though we should consider how to improve the “update the CI containers” step in a separate RFC; specifically:

  • ensuring all containers use the same dependency versions (a rough sketch of such a check follows this list)
  • documenting the actual versions used
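
For the first point, one possible direction (not existing TVM tooling; capturing `pip freeze` output from each container is assumed to happen elsewhere) is a small script that diffs the frozen versions across containers:

```python
# Rough sketch: compare `pip freeze` output captured from each CI container
# and report packages whose installed versions differ between containers.
import sys

def parse_freeze(path):
    """Parse `pip freeze` output into a {package: version} dict."""
    versions = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if "==" in line:
                name, version = line.split("==", 1)
                versions[name.lower()] = version
    return versions

def main(paths):
    freezes = {path: parse_freeze(path) for path in paths}
    all_packages = set()
    for versions in freezes.values():
        all_packages.update(versions)
    mismatched = False
    for package in sorted(all_packages):
        seen = {path: versions.get(package, "<missing>")
                for path, versions in freezes.items()}
        if len(set(seen.values())) > 1:
            mismatched = True
            print(package + ": " + ", ".join(f"{p}={v}" for p, v in seen.items()))
    sys.exit(1 if mismatched else 0)

if __name__ == "__main__":
    main(sys.argv[1:])
```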