A small subset of tests takes up a large portion of the total test runtime. This RFC proposes that we skip these tests on PRs, where CI runtime is critical, and run them only on `main`.
CI runtime constantly plagues TVM developers with long iteration times, exacerbated by flakiness and difficult-to-reproduce steps. To reduce runtime, we can execute more work concurrently and increase usage of available resources, usually by parallelizing within (#9834) or between (#9733) CI jobs. Another way is to do less work, which is what this RFC proposes. By running some tests on `main` only, we still get some measure of coverage from these tests without burdening PR developers.
The caveat is that tests that run only on `main` may now fail on commits from PRs that were green when merged, so this will require some buy-in from all TVM developers. However, the runtime savings (see below) are significant enough to make this worthwhile. Developments like #9554 will also make the revert process much smoother, minimizing disruptions.
## Runtime Savings
A preliminary investigation of one PR with the `--durations=100` flag added to pytest shows the following results (code). These totals are distributed across all steps but demonstrate what we could achieve.
```
[cutoff=10.0s] Total savings (m): 419.95m by skipping 695 tests
[cutoff=20.0s] Total savings (m): 338.27m by skipping 320 tests
[cutoff=30.0s] Total savings (m): 291.5m by skipping 205 tests
[cutoff=40.0s] Total savings (m): 251.59m by skipping 135 tests
[cutoff=50.0s] Total savings (m): 222.56m by skipping 96 tests
[cutoff=60.0s] Total savings (m): 203.3m by skipping 75 tests
[cutoff=70.0s] Total savings (m): 192.68m by skipping 65 tests
[cutoff=80.0s] Total savings (m): 181.56m by skipping 56 tests
[cutoff=90.0s] Total savings (m): 171.45m by skipping 49 tests
[cutoff=100.0s] Total savings (m): 160.36m by skipping 42 tests
```
This data was gathered by downloading the text log for each test step in the above CI job and running the code in the gist. The file names roughly correspond to the matching CI stage / step.
```
[cutoff=10.0s] @slow in frontend_gpu.ai saves 113.12m by moving 116 tests to main-only
[cutoff=10.0s] @slow in 386_python.ai saves 17.24m by moving 46 tests to main-only
[cutoff=10.0s] @slow in unittest_cpu.ai saves 29.65m by moving 67 tests to main-only
[cutoff=10.0s] @slow in python_arm.ai saves 12.84m by moving 44 tests to main-only
[cutoff=10.0s] @slow in topi_gpu.ai saves 67.19m by moving 65 tests to main-only
[cutoff=10.0s] @slow in unittest_gpu.ai saves 35.64m by moving 65 tests to main-only
[cutoff=10.0s] @slow in integration_cpu.ai saves 67.0m by moving 176 tests to main-only
[cutoff=10.0s] @slow in frontend_cpu.ai saves 44.0m by moving 58 tests to main-only
[cutoff=10.0s] @slow in 386_integration.ai saves 33.27m by moving 58 tests to main-only
[cutoff=10.0s] Total savings (m): 419.95m by skipping 695 tests
[cutoff=20.0s] @slow in frontend_gpu.ai saves 102.01m by moving 69 tests to main-only
[cutoff=20.0s] @slow in 386_python.ai saves 11.65m by moving 21 tests to main-only
[cutoff=20.0s] @slow in unittest_cpu.ai saves 20.93m by moving 27 tests to main-only
[cutoff=20.0s] @slow in python_arm.ai saves 5.1m by moving 9 tests to main-only
[cutoff=20.0s] @slow in topi_gpu.ai saves 61.95m by moving 44 tests to main-only
[cutoff=20.0s] @slow in unittest_gpu.ai saves 28.35m by moving 30 tests to main-only
[cutoff=20.0s] @slow in integration_cpu.ai saves 42.62m by moving 56 tests to main-only
[cutoff=20.0s] @slow in frontend_cpu.ai saves 39.08m by moving 37 tests to main-only
[cutoff=20.0s] @slow in 386_integration.ai saves 26.57m by moving 27 tests to main-only
[cutoff=20.0s] Total savings (m): 338.27m by skipping 320 tests
[cutoff=30.0s] @slow in frontend_gpu.ai saves 94.17m by moving 49 tests to main-only
[cutoff=30.0s] @slow in 386_python.ai saves 7.23m by moving 11 tests to main-only
[cutoff=30.0s] @slow in unittest_cpu.ai saves 16.68m by moving 17 tests to main-only
[cutoff=30.0s] @slow in python_arm.ai saves 3.01m by moving 4 tests to main-only
[cutoff=30.0s] @slow in topi_gpu.ai saves 55.53m by moving 29 tests to main-only
[cutoff=30.0s] @slow in unittest_gpu.ai saves 25.59m by moving 23 tests to main-only
[cutoff=30.0s] @slow in integration_cpu.ai saves 33.5m by moving 33 tests to main-only
[cutoff=30.0s] @slow in frontend_cpu.ai saves 32.04m by moving 19 tests to main-only
[cutoff=30.0s] @slow in 386_integration.ai saves 23.76m by moving 20 tests to main-only
[cutoff=30.0s] Total savings (m): 291.5m by skipping 205 tests
[cutoff=40.0s] @slow in frontend_gpu.ai saves 84.89m by moving 32 tests to main-only
[cutoff=40.0s] @slow in 386_python.ai saves 3.2m by moving 4 tests to main-only
[cutoff=40.0s] @slow in unittest_cpu.ai saves 13.21m by moving 11 tests to main-only
[cutoff=40.0s] @slow in python_arm.ai saves 3.01m by moving 4 tests to main-only
[cutoff=40.0s] @slow in topi_gpu.ai saves 51.79m by moving 22 tests to main-only
[cutoff=40.0s] @slow in unittest_gpu.ai saves 18.98m by moving 12 tests to main-only
[cutoff=40.0s] @slow in integration_cpu.ai saves 25.32m by moving 19 tests to main-only
[cutoff=40.0s] @slow in frontend_cpu.ai saves 29.14m by moving 14 tests to main-only
[cutoff=40.0s] @slow in 386_integration.ai saves 22.05m by moving 17 tests to main-only
[cutoff=40.0s] Total savings (m): 251.59m by skipping 135 tests
[cutoff=50.0s] @slow in frontend_gpu.ai saves 80.34m by moving 26 tests to main-only
[cutoff=50.0s] @slow in 386_python.ai saves 0.87m by moving 1 tests to main-only
[cutoff=50.0s] @slow in unittest_cpu.ai saves 11.1m by moving 8 tests to main-only
[cutoff=50.0s] @slow in python_arm.ai saves 0.0m by moving 0 tests to main-only
[cutoff=50.0s] @slow in topi_gpu.ai saves 48.9m by moving 18 tests to main-only
[cutoff=50.0s] @slow in unittest_gpu.ai saves 17.49m by moving 10 tests to main-only
[cutoff=50.0s] @slow in integration_cpu.ai saves 22.32m by moving 15 tests to main-only
[cutoff=50.0s] @slow in frontend_cpu.ai saves 24.69m by moving 8 tests to main-only
[cutoff=50.0s] @slow in 386_integration.ai saves 16.86m by moving 10 tests to main-only
[cutoff=50.0s] Total savings (m): 222.56m by skipping 96 tests
[cutoff=60.0s] @slow in frontend_gpu.ai saves 74.08m by moving 19 tests to main-only
[cutoff=60.0s] @slow in 386_python.ai saves 0.0m by moving 0 tests to main-only
[cutoff=60.0s] @slow in unittest_cpu.ai saves 9.33m by moving 6 tests to main-only
[cutoff=60.0s] @slow in python_arm.ai saves 0.0m by moving 0 tests to main-only
[cutoff=60.0s] @slow in topi_gpu.ai saves 47.9m by moving 17 tests to main-only
[cutoff=60.0s] @slow in unittest_gpu.ai saves 16.52m by moving 9 tests to main-only
[cutoff=60.0s] @slow in integration_cpu.ai saves 17.71m by moving 10 tests to main-only
[cutoff=60.0s] @slow in frontend_cpu.ai saves 24.69m by moving 8 tests to main-only
[cutoff=60.0s] @slow in 386_integration.ai saves 13.09m by moving 6 tests to main-only
[cutoff=60.0s] Total savings (m): 203.3m by skipping 75 tests
[cutoff=70.0s] @slow in frontend_gpu.ai saves 69.8m by moving 15 tests to main-only
[cutoff=70.0s] @slow in 386_python.ai saves 0.0m by moving 0 tests to main-only
[cutoff=70.0s] @slow in unittest_cpu.ai saves 8.25m by moving 5 tests to main-only
[cutoff=70.0s] @slow in python_arm.ai saves 0.0m by moving 0 tests to main-only
[cutoff=70.0s] @slow in topi_gpu.ai saves 46.86m by moving 16 tests to main-only
[cutoff=70.0s] @slow in unittest_gpu.ai saves 15.45m by moving 8 tests to main-only
[cutoff=70.0s] @slow in integration_cpu.ai saves 16.69m by moving 9 tests to main-only
[cutoff=70.0s] @slow in frontend_cpu.ai saves 23.66m by moving 7 tests to main-only
[cutoff=70.0s] @slow in 386_integration.ai saves 11.97m by moving 5 tests to main-only
[cutoff=70.0s] Total savings (m): 192.68m by skipping 65 tests
[cutoff=80.0s] @slow in frontend_gpu.ai saves 68.62m by moving 14 tests to main-only
[cutoff=80.0s] @slow in 386_python.ai saves 0.0m by moving 0 tests to main-only
[cutoff=80.0s] @slow in unittest_cpu.ai saves 5.75m by moving 3 tests to main-only
[cutoff=80.0s] @slow in python_arm.ai saves 0.0m by moving 0 tests to main-only
[cutoff=80.0s] @slow in topi_gpu.ai saves 41.78m by moving 12 tests to main-only
[cutoff=80.0s] @slow in unittest_gpu.ai saves 14.27m by moving 7 tests to main-only
[cutoff=80.0s] @slow in integration_cpu.ai saves 16.69m by moving 9 tests to main-only
[cutoff=80.0s] @slow in frontend_cpu.ai saves 22.47m by moving 6 tests to main-only
[cutoff=80.0s] @slow in 386_integration.ai saves 11.97m by moving 5 tests to main-only
[cutoff=80.0s] Total savings (m): 181.56m by skipping 56 tests
[cutoff=90.0s] @slow in frontend_gpu.ai saves 67.13m by moving 13 tests to main-only
[cutoff=90.0s] @slow in 386_python.ai saves 0.0m by moving 0 tests to main-only
[cutoff=90.0s] @slow in unittest_cpu.ai saves 5.75m by moving 3 tests to main-only
[cutoff=90.0s] @slow in python_arm.ai saves 0.0m by moving 0 tests to main-only
[cutoff=90.0s] @slow in topi_gpu.ai saves 41.78m by moving 12 tests to main-only
[cutoff=90.0s] @slow in unittest_gpu.ai saves 12.78m by moving 6 tests to main-only
[cutoff=90.0s] @slow in integration_cpu.ai saves 9.57m by moving 4 tests to main-only
[cutoff=90.0s] @slow in frontend_cpu.ai saves 22.47m by moving 6 tests to main-only
[cutoff=90.0s] @slow in 386_integration.ai saves 11.97m by moving 5 tests to main-only
[cutoff=90.0s] Total savings (m): 171.45m by skipping 49 tests
[cutoff=100.0s] @slow in frontend_gpu.ai saves 65.63m by moving 12 tests to main-only
[cutoff=100.0s] @slow in 386_python.ai saves 0.0m by moving 0 tests to main-only
[cutoff=100.0s] @slow in unittest_cpu.ai saves 2.59m by moving 1 tests to main-only
[cutoff=100.0s] @slow in python_arm.ai saves 0.0m by moving 0 tests to main-only
[cutoff=100.0s] @slow in topi_gpu.ai saves 38.51m by moving 10 tests to main-only
[cutoff=100.0s] @slow in unittest_gpu.ai saves 9.62m by moving 4 tests to main-only
[cutoff=100.0s] @slow in integration_cpu.ai saves 9.57m by moving 4 tests to main-only
[cutoff=100.0s] @slow in frontend_cpu.ai saves 22.47m by moving 6 tests to main-only
[cutoff=100.0s] @slow in 386_integration.ai saves 11.97m by moving 5 tests to main-only
[cutoff=100.0s] Total savings (m): 160.36m by skipping 42 tests
```
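The gist itself is not reproduced here, but the per-cutoff analysis can be approximated by parsing pytest's `--durations` output and summing up the tests above a threshold. The script below is a hypothetical sketch (the regex, function name, and sample log lines are illustrative, not the gist's actual code):

```python
import re

# pytest --durations lines look like: "12.34s call     path/to/test.py::test_name".
# Only "call" phases are counted; setup/teardown are ignored.
DURATION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+call\s+(\S+)")

def slow_test_savings(log_lines, cutoff_s):
    """Return (minutes_saved, test_count) for tests at or above cutoff_s."""
    total_s = 0.0
    count = 0
    for line in log_lines:
        m = DURATION_RE.match(line)
        if not m:
            continue
        duration = float(m.group(1))
        if duration >= cutoff_s:
            total_s += duration
            count += 1
    return total_s / 60.0, count

# Fabricated example lines in the --durations format:
log = [
    "120.00s call     tests/python/frontend/test_a.py::test_big_model",
    "45.00s call     tests/python/topi/test_b.py::test_conv",
    "3.00s call     tests/python/unittest/test_c.py::test_small",
    "0.50s setup    tests/python/unittest/test_c.py::test_small",
]
minutes, n = slow_test_savings(log, cutoff_s=10.0)  # 2.75 minutes across 2 tests
```

Running this over each step's downloaded log and varying `cutoff_s` yields tables like the ones above.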
## Implementation
#9915 implements the preliminary machinery for this change. Once it lands, tests can be decorated with `@tvm.testing.slow`, and CI will decide whether to run slow tests based on the following checks:
- If the branch matches `main` or `ci-docker-staging`
- If text like `@ci run slow tests` appears in the last commit message or the PR body (this is necessary if a PR specifically edits or concerns a slow test). Note that changes to the PR body will not be reflected until the next time CI runs (implementing a listener / canceller would be much harder than querying at the beginning of each CI run).
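The decision logic amounts to a couple of string checks. A minimal sketch, with hypothetical function and parameter names (the actual implementation lives in the CI scripts from #9915):

```python
def should_run_slow_tests(branch, commit_message, pr_body):
    """Decide whether CI should run @tvm.testing.slow tests.

    Hypothetical sketch of the checks described above; names are illustrative.
    """
    # Check 1: protected branches always run the full suite.
    if branch in ("main", "ci-docker-staging"):
        return True
    # Check 2: an explicit opt-in in the last commit message or PR body.
    trigger = "@ci run slow tests"
    return trigger in commit_message or trigger in (pr_body or "")
```

In this sketch, the branch name and commit message would come from the CI environment, and the PR body from a one-time query at the start of the run (hence the caveat above about later edits to the body).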
Using a decorator has the advantage of being explicit compared to an automated system that detects and skips slow tests. Clicking through to the decorator's short definition makes it clear what is going on, so this shouldn't add too much of a development burden. Slow tests run by default to minimize disruption (i.e. if a developer doesn't see this RFC, their local workflow doesn't need to change), but they can be skipped by setting `SKIP_SLOW_TESTS=1`.
Finding slow tests is currently a manual process (see the gist above), but the number of tests is tractable, at least for a start. In the future, we plan to fix and monitor test reports, which should include test times and ease this process. However, that shouldn't hold up this work, since we can get concrete wins today by checking the logs.
## Open Questions / Alternatives
- Nightly-only tests are another possibility, but they make it much harder to identify and revert offending commits, since a developer would have to go over every commit from the last day and check the tests.
- The threshold cutoff for slow tests is up for debate. I prefer to start aggressive and back off if reverts are a problem, so I would say 10s to start, increasing by 10s if reverts are burdensome. Cutoffs could be specialized per test file / step, but that adds complexity and makes the system harder to reason about.
- Since PRs that are green may need to be reverted, someone will need to monitor the CI output and investigate failures. We do not have anyone committed to doing this right now, but since this change treats tests as a black box and doesn't require any domain knowledge of them, I think it's reasonable for the CI maintainers to take this on as a core responsibility (anyone else who notices failures is of course welcome to revert / fix). In general, I think this will also be a good forcing function to monitor and fix other flaky / spurious failures on `main`.