Establishing a home for TVM dev tools

Hi all,

I’d like to start a home in the TVM repo for TVM developer tools–that is, scripts that folks working on TVM might like to run locally. At present, TVM’s developer tools are quite scattered:

  • There is a Makefile in the root directory which we don’t maintain
  • docker/bash.sh is the cornerstone for reproducing workflows from CI (we do have a wrapper script on top of this, see below, but this is the entry point). This is not an implementation detail of the CI–the script accommodates almost any calling user’s uid/gid and allows network port forwarding for interactive Jupyter. Despite this, there’s no reason anyone would think to look in docker for a script like this unless they were already accustomed to Docker.
  • The individual Jenkins tasks are defined in tests/scripts/task_*. This kind of makes sense, but this directory has become bloated with both scripts that really serve to define test suites and random other scripts that are primarily helper scripts for Jenkinsfile or GitHub Actions.
  • Linting can be done with docker/lint.sh but it is not well-tested
  • There is documentation in various README.md (e.g. docs/README.md, jenkins/README.md, jvm/README.md) describing how to do some local core development processes (e.g. building and serving docs), but each of these suffers from needing some common core infrastructure (such as docker/bash.sh) which itself isn’t documented.

Thinking more about how we might go about this: developer scripts are pretty closely tied to the scripts that launch things in CI, since they are mainly scaffolding that invokes those CI scripts in a local dev environment. I’d like to propose a fairly broad overhaul that might affect local developer workflows, and given that, I’d like to get feedback from the community.

  1. We create a dev top-level directory which contains scripts meant to be invoked by TVM developers.
  2. We create a ci top-level directory (I’ve started down this path with the Jenkinsfile generator, which mostly only @driazati and I touch right now).
  3. We move the following scripts:
    • docker/bash.sh → dev/docker-run.sh
    • docker/clear-stale-images.sh → dev/docker-rmi-unused-tlcpack-images.sh
    • docker/lint.sh → dev/lint.sh
    • docker/dev_common.sh → dev/lib/common.sh
    • tests/scripts/task_* → ci/tasks/task_*
    • tests/scripts/ci.py → dev/run_local_ci.py
    • All Jenkinsfile and GitHub Actions Python helpers move to ci/tvm_ci:
      tests/scripts/cmd_utils.py
      tests/scripts/determine_docker_images.py
      tests/scripts/github_cc_reviewers.py
      tests/scripts/github_docs_comment.py
      tests/scripts/github_tag_teams.py
      tests/scripts/github_tvmbot.py
      tests/scripts/git_skip_ci_globs.py
      tests/scripts/git_skip_ci.py
      tests/scripts/git_utils.py
      tests/scripts/http_utils.py
      tests/scripts/open_docker_update_pr.py
      tests/scripts/ping_reviewers.py
      tests/scripts/pytest_ids.py
      tests/scripts/pytest_wrapper.py
      tests/scripts/should_rebuild_docker.py
      tests/scripts/should_run_slow_tests.py
      tests/scripts/task_build.py
      tests/scripts/update_branch.py
      
  4. We remove the Makefile and instead add a mode to dev/run_local_ci.py to run outside docker.
  5. We establish the tvm_ci Python package locally (e.g. create ci/tvm_ci/__init__.py; it’s not published to PyPI) and introduce tests/tvm_ci in order to consolidate helpers and unit-test them.
  6. We rewrite the documentation tree (everything in docs/, but e.g. especially the Contributor Guide) to reference the new scripts. We add documentation describing how to use tools in dev.
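As a concrete sketch of step 5, a helper consolidated into the local (unpublished) tvm_ci package could get a unit test under tests/tvm_ci along these lines. The helper name parse_image_tag is hypothetical, not actual TVM code:

```python
# Hypothetical sketch: a helper living in ci/tvm_ci/, plus a unit test
# for it that would live in tests/tvm_ci/. Names are illustrative only.
import unittest


def parse_image_tag(image: str) -> tuple:
    """Split a Docker image reference like 'tlcpack/ci-gpu:v0.77'
    into (repository, tag), defaulting the tag to 'latest'."""
    repo, _, tag = image.partition(":")
    return repo, tag or "latest"


class TestParseImageTag(unittest.TestCase):
    def test_with_tag(self):
        self.assertEqual(parse_image_tag("tlcpack/ci-gpu:v0.77"),
                         ("tlcpack/ci-gpu", "v0.77"))

    def test_without_tag(self):
        self.assertEqual(parse_image_tag("tlcpack/ci-cpu"),
                         ("tlcpack/ci-cpu", "latest"))
```

Tests like this could then run in CI via `python -m unittest discover tests/tvm_ci`.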

Would be great to hear everyone’s thoughts!

Andrew


cc @comaniac @Hzfengsy @tqchen @junrushao @Mousius @manupa-arm @csullivan @kparzysz

I think this is a good idea. It wouldn’t affect my workflow (I typically just copy/paste script invocations from the build logs, when something fails).


The proposal looks clean and I’m supportive.

One minor point: dev seems too broad to cover just developer scripts. I can think of two alternatives:

  1. /tools. This is similar to PyTorch IIRC.
  2. /scripts/dev. At least people know they are scripts.

I would definitely appreciate having a clear place for dev tools, especially on having clear instructions for how to use them (the linting scripts constantly give me a hard time).

Looks like a clean and good idea to me. One nit would be that maybe we can put .py and .sh in different places.

We discussed this at the Community Meeting today. Here are notes:

  • @leandron: docker/build.sh → dev/docker-build.sh
    • Can consider combining with docker-run.sh (i.e. add a flag)
  • @leandron: Where do Dockerfiles live?
    • Maybe it makes sense to keep them close to bash.sh and build.sh in the same dir–that simplifies those scripts.
    • @areusch: Would rather keep those in ci rather than dev.
    • @leandron: the important thing is that the install dir needs to stay relative to the Dockerfiles.
    • You could cast these as dev tools if someone was installing deps on their local machine outside the Dockerfiles.
    • It’s challenging to take a libtvm_runtime.so built inside a Docker container and run it elsewhere.
  • @Gromero: what’s the official recommended path to start developing for microTVM?
    • @areusch: Reference VM if hardware is in the loop. Otherwise, docker images (moving towards 1-image-per-arch).
  • @areusch: does this influence folks’ workflow on the call?

That’s a great proposal–it gives users/developers a straightforward way to get hands-on with TVM. For me, it would definitely be helpful to have a tuning script for end-to-end workloads.

I agree with @comaniac that a better name, e.g. /tools, may help.

Thanks all for the replies! A couple of thoughts:

I think for this one there’s a part of the community that’s ultimately trying to make tvmc tune that entry point. I’m not sure if there are other proposals in use elsewhere, but it would be great to know if so (perhaps a different thread would help). I’m supportive of tvmc tune since it would also be available when TVM is pip-installed.

That’s a fair point, but we could create sub-directories to delineate–e.g. /dev/scripts. What other things can you foresee here? Perhaps the template config.cmake is one. I think one could make the same argument about tools, although perhaps as a top-level directory tools is in fact developer-facing, so “developer tools” is implied. Thoughts?

The proposal does discuss placing CI-related .py in the ci/tvm_ci/ sub-directory. I’m not sure what to do about developer-facing Python–right now, in order to ensure it works fairly broadly, I’ve been reviewing to make sure it generally uses only the stdlib and could run on Python 3.6. Obviously this isn’t going to scale and eventually I’ll miss something, so having tests for these scripts would be helpful.

I was actually thinking from another perspective: I avoid /dev because it’s a common keyword. For example, Linux has a /dev directory, and some GitHub repos also have a dev branch.

Possibly build_tools instead of tools–clearly these are tools for building TVM rather than actually end-user-facing?

Perhaps tvmc --tune could be extended to cover whatever is missing for end-to-end tuning?

Ramana

I’m not sure if they are tools for building. If so, build_tools looks good to me.

I agree that tuning scripts should not be included.

Glad to see this happening @areusch!

I’d be worried about dev/scripts becoming another dumping ground for random scripts rather than the organised space this proposal aims to achieve. I’ve seen that scipy has a dev.py as a single developer script, so I don’t think it’s that much of an issue to have dev for our developer tooling.

It’s worrying that we’re introducing code into the codebase which has to be checked in this way; it doesn’t seem unreasonable to specify which Python versions are supported by the developer tooling based on what we have available to test in CI?

I’m supportive of tools as that’s common across other projects (e.g. pytorch/tensorflow/numpy/scipy). apache/tvm is the source repository, so it’s expected the tools are relevant to that rather than to what ends up in the eventual user-facing package.

Ideally the developer tooling would be organised similarly to the CI tooling, in that you can pip install it–is there a reason we can’t do that? (For something similar, see https://github.com/scipy/scipy/blob/main/dev.py) Something like:

  • tools/ci - Python package containing tvm-ci CLI
  • tools/dev - Python package containing tvm-dev CLI

To summarize, most of the remaining concerns seem to be over how to name the tooling directory so as to avoid folks accidentally broadening the scope of a directory. I’d offer that most often, this happens either a) by mistake, because committers don’t understand how the tree is organized, or b) because someone is introducing a new type of tool that isn’t so different that it’s obvious it should live outside one of the currently-checked-in scopes.

The approach I’d advocate for here is to do the following:

  1. establish a top-level directory to house similar scripts (further sorted into sub-directories). document the sorting logic in <tld>/README.md so committers and contributors alike understand how to keep things tidy and when to propose new sub-dirs.
  2. keep sub-directories focused on a single type of script–this avoids the issue @Mousius raises.
  3. to start with, avoid doing anything complex–most contributors so far use a Linux-like system, and complexity needs maintenance. the main focus of this proposal is organization.

This is why I like /dev rather than something explicitly tailored for dev-facing build scripts as a top-level directory–it lets us plant that README.md. @ramana-arm, I’d be curious about your thoughts given this additional rationale.

Some other responses:

That’s true, it is overloaded. I guess the concern here would be either misunderstanding the purpose of the directory or increasing the consequences of a typo (e.g. forgetting the leading / when typing a command, or forgetting to cd /). I’m not sure either of these is that big of a deal:

  • the context for a repo root is much different than the context of a filesystem root, so they should map to two different things in folks’ heads
  • most tools and interactive shells don’t operate out of /
  • nothing’s really executable in /dev, so it’s unlikely you’d be trying to invoke a script out of that dir.

But I do acknowledge this point. Let me know if I’m missing something.

I agree with specifying the Python versioning. Thus far, for any Python scripts in tests/scripts, I’ve attempted to ensure they rely purely on Python 3.6-compatible things and avoid third-party packages where we can help it. We do need a few obviously-beneficial ones like requests. It might make sense to grow this Python package dir into at least a setup.cfg one could use to install those; but I also don’t want to make it super easy to rely on non-standard packages here, so I’m inclined instead to push on adding tests for these scripts that execute in an empty Python venv.
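The empty-venv idea can be sketched with the stdlib alone: create a fresh virtual environment (no pip, no third-party packages) and verify a script runs under its interpreter. The script here is a stand-in, not an actual TVM script:

```python
# Minimal sketch of the "run in an empty venv" test idea: create a fresh
# virtual environment and check that a (stand-in) script needs only the
# stdlib to run. venv.create installs no pip and no third-party packages
# by default.
import os
import subprocess
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "venv")
    venv.create(env_dir)  # empty environment: stdlib only

    bin_dir = "Scripts" if os.name == "nt" else "bin"
    python = os.path.join(env_dir, bin_dir, "python")

    # Stand-in for a dev script; a real test would point at e.g. dev/*.py.
    script = os.path.join(tmp, "stdlib_only.py")
    with open(script, "w") as f:
        f.write("import json, argparse\nprint('ok')\n")

    result = subprocess.run([python, script], capture_output=True, text=True)
    assert result.returncode == 0 and result.stdout.strip() == "ok"
```

A script that imports requests (or anything else non-standard) would fail this check with an ImportError, which is exactly the signal we want.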

The only downside I see to tools is it misses the opportunity to create the README.md I discussed above. My personal opinion is that I wouldn’t go to a tools directory looking for CI-related stuff (though it might be a second location I’d check if I didn’t find something more obviously CI).

Discussed this a bit above. I didn’t quite see how dev.py in scipy is different from what we have now, other than that they have a conda environment.yml. Could you elaborate?

I also looked at a few other projects and it seems like they gravitate towards setup.py. We’re sort of supposed to be frontend-independent, but in practice we are Python-dependent for builds, so perhaps we could just make that official and adopt setup.py to build the compiler while still supporting a cmake build when using only libtvm_runtime.so.

Can you expand on how using tools prevents us creating a README.md? I would assume that tools/README.md, tools/dev/README.md and tools/ci/README.md can all exist and document the individual sets of tooling.

Does this mean starting with ci and dev at root, if we need more types of tooling we then add more root folders? Or do we have tools subdivided and CI is a special case package at root level, then if we add new types of tooling we figure out whether it too is a special case or whether it should go into tooling?

What we have now, as demonstrated by your initial post, is a set of scripts:

docker/bash.sh → dev/docker-run.sh
docker/clear-stale-images.sh → dev/docker-rmi-unused-tlcpack-images.sh
docker/lint.sh → dev/lint.sh

Whilst we’re organising these things, my proposal is that we aim to wrap this in some uniform CLI akin to the dev.py, wherein you can run tvm-dev docker-run etc. This means you have a similar interface to the CI tooling now (ci.py), one uniform folder structure, and a standard pattern for all our supporting tooling. This is similar to how many of the Rust tools are Rust projects themselves (see rust/src/tools in rust-lang/rust on GitHub).
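For concreteness, a uniform tvm-dev entry point could start as a thin argparse dispatcher that shells out to the existing scripts. The subcommand names and script paths below follow the proposal but are illustrative, not settled:

```python
# Hypothetical sketch of a uniform `tvm-dev` CLI: one entry point with
# subcommands that delegate to the existing shell scripts. Subcommand
# names and script paths are illustrative only.
import argparse
import subprocess
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="tvm-dev")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("docker-run", help="run a command inside a CI image")
    run.add_argument("image")
    run.add_argument("cmd", nargs=argparse.REMAINDER)

    sub.add_parser("lint", help="run the lint suite")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "docker-run":
        # delegate to the existing script (path per the proposal above)
        return subprocess.call(["dev/docker-run.sh", args.image, *args.cmd])
    if args.command == "lint":
        return subprocess.call(["dev/lint.sh"])


if __name__ == "__main__":
    sys.exit(main())
```

Usage would then look like `tvm-dev docker-run ci_gpu ./ci/tasks/task_python_unittest.sh`, keeping one discoverable interface over however the scripts are laid out underneath.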

Ah–sorry, my previous reply was a bit abbreviated. You’re right that it doesn’t exactly prevent us from creating a README.md, but tools in my mind tends to imply UX, and I’m not proposing we make e.g. task_python_unittest.sh into a dev-facing script (we may have another tool for this, but in the naive approach it would need to wrap those). I guess we could consider even non-user-facing scripts as tools–just then everything becomes e.g. docker/bash.sh -> tools/dev/docker-run.sh, and one might be tempted to differentiate that script from tools/tvm_ci/determine_docker_images.py by placing it in tools/docker-run.sh instead. So I’m ambivalent, but I do think that could lead to a crowded tools dir.

Generally the first one here.

I think this is a good idea, but being cognizant of eng bandwidth I didn’t want to propose too much here. If folks are volunteering to build that tool, I’m all for it.