As we start to build multiple modules, it is useful to modularize the unit tests, with the goal of reducing the number of actual integration tests. Previously, quite a few tests were written in a way that directly invokes end-to-end compilation, and we also have tests that are coupled with the legacy TE pipeline. This causes several issues: some of the tests run slowly, and when a regression happens it is harder to find the cause, since the tests are not unit tests in nature.
While some integration-style tests are still necessary and we would like to keep some legacy tests for a while, it is important to move to a more unit-testing-oriented regime for new tests, and to explicitly mark (group) tests that involve end-to-end execution (and are slower). Having tests in different folders also helps us think more carefully about module boundaries. Of course, we still want to be pragmatic and not too pedantic: we still love the Python-first infra that helps us write tests productively, and some level of coupling is useful for that.
To keep things simple, I would like us to try to get things moving by starting with one module (TensorIR).
Here is how we can do this incrementally for TensorIR (and use it as an example):
- Start with a new folder tests/python/tir.
- Put new TensorIR unit tests into this folder.
- Migrate some test cases from the existing ones into this folder, with the following goals:
  - Always use TVMScript/IRBuilder before/after comparisons to unit test each pass (see the sketch right after this list).
  - Avoid calling the build pipeline.
- For tests that involve the e2e build pipeline:
  - Move them to an explicit folder tests/python/integration/tir.
  - It is OK to include some tests for target-specific code generation; in this case, start from (scheduled) TVMScript.
  - Ensure such generation is fast (<1 min).
  - Use an explicit naming pattern test_e2e_xxx.
- Move slow tests into a separate folder tests/python/slow/tir.

Everything should work as it is. Of course, we can still leave some legacy tests in the old place, and once we are done modularizing, we will also have a clear picture of things.
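As an illustration of the before/after style above, here is a minimal sketch of what such a pass unit test could look like. The concrete pass (tir.transform.Simplify), the buffer shapes, and the exact TVMScript syntax are illustrative assumptions on my side and may need adjusting to the TVM version at hand.

```python
# Minimal sketch of a before/after pass unit test; the pass and IR used here
# are illustrative, and the TVMScript syntax may differ across TVM versions.
import tvm
from tvm.script import ir_module, tir as T


@ir_module
class Before:
    @T.prim_func
    def main(A: T.Buffer((16,), "int32"), B: T.Buffer((16,), "int32")):
        for i in range(16):
            with T.block("copy"):
                vi = T.axis.spatial(16, i)
                B[vi] = A[vi] * 1


@ir_module
class Expected:
    @T.prim_func
    def main(A: T.Buffer((16,), "int32"), B: T.Buffer((16,), "int32")):
        for i in range(16):
            with T.block("copy"):
                vi = T.axis.spatial(16, i)
                B[vi] = A[vi]


def test_simplify_removes_multiply_by_one():
    # Run a single TIR pass and compare the result structurally against the
    # expected TVMScript, without ever touching the build pipeline.
    after = tvm.tir.transform.Simplify()(Before)
    tvm.ir.assert_structural_equal(after, Expected)
```

The point is that the test exercises exactly one pass and checks structural equality of IR, so it stays fast and a failure points directly at the pass under test.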
- Most of the existing tests are CPU-bound, including those that use GPUs for execution (end-to-end tests), which also rely heavily on the CPU for code generation.
- All e2e tests can be decoupled into host-side compilation on the CPU plus execution on the device (e.g. GPUs).
- Brute-force splitting between fast and slow tests is less efficient, because even slow tests can be CPU-bound and may not fully utilize the GPU resources.
Therefore, my proposal is: based on the TVM RPC infra, instead of separating fast and slow tests, we should split host-side logic from device execution. Details:
- Run all tests on CPU with a single or a limited number of threads.
- Provide an API via TVM RPC that allows execution of compiled code on an isolated GPU/Hexagon/ARM instance (see the sketch right after this list).
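To make the second point concrete, here is a minimal sketch of how the host/device split could look with the existing tvm.rpc API, assuming mod is an IRModule already scheduled for the target. The server address, the target string, and the exported function name "main" are placeholders rather than part of the proposal.

```python
# Minimal sketch of the host/device split over TVM RPC; the server address,
# target, and function name "main" are placeholders for illustration only.
import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import utils


def run_on_remote(mod: tvm.IRModule, host: str, port: int = 9090):
    # Host side (CPU-only CI node): compile for the device target and export
    # the artifact. No GPU is needed for this step.
    lib = tvm.build(mod, target="cuda")
    temp = utils.tempdir()
    path = temp.relpath("kernel.tar")
    lib.export_library(path)

    # Device side: ship the artifact to an isolated GPU worker via RPC and
    # execute only the compiled kernel there.
    remote = rpc.connect(host, port)
    remote.upload(path)
    rlib = remote.load_module("kernel.tar")
    dev = remote.cuda(0)

    a = tvm.nd.array(np.random.rand(16).astype("float32"), dev)
    b = tvm.nd.array(np.zeros(16, dtype="float32"), dev)
    rlib["main"](a, b)  # assumes the module exposes a function named "main"
    return b.numpy()
```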
The advantages of my proposal:
- Concurrency: a CPU instance could run multiple CI pipelines in parallel.
- Device utilization: the RPC infra makes sure only minimal logic is executed on the device. It routes and manages execution efficiently, which greatly improves device utilization and lowers cost.
Thank you, I think these are orthogonal approaches. The first effort is mainly about isolating real unit-test cases into TVMScript-based, before/after-focused tests; the tests that run integration can then be improved in different ways.
BTW, it might be helpful, so I just wanted to share a running log using pytest --durations on my local CPU+GPU workstation: pytest_running_log.txt · GitHub
Some takeaways:
- The top 162 test cases each take more than 1 sec, while the rest of the ~4.5k tests are rather fast.
- Most of the slow tests are either from legacy modules (e.g. autotvm, auto_scheduler, TE schedule) or from end-to-end tests (e.g. running a runtime.Module).
- There are 60 failed test cases that are not included in our CI; some fail because the CI instances are not equipped with adequate hardware (e.g. tensor-core GPUs >= SM_80), some can pass if tested alone, and I'm not sure about the rest.
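To connect this with the marking/grouping idea above, a minimal sketch of an explicit marker setup in a shared conftest.py could look like the following; the marker names e2e and slow are hypothetical, not an existing TVM convention.

```python
# conftest.py -- minimal sketch of explicit test grouping; the marker names
# "e2e" and "slow" are hypothetical, not an existing TVM convention.
import pytest


def pytest_configure(config):
    # Register the markers so pytest does not warn about unknown marks and
    # so CI can deselect them, e.g. `pytest -m "not e2e and not slow"`.
    config.addinivalue_line("markers", "e2e: end-to-end tests that run the build pipeline")
    config.addinivalue_line("markers", "slow: tests that take noticeably more than ~1 second")
```

Individual tests would then opt in with @pytest.mark.e2e or @pytest.mark.slow, and the --durations report above could guide which tests to mark first.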
One thing that is not completely relevant to this discussion, but that I wanted to mention, is that we could perhaps disable CI on draft PRs, since by definition they are not complete and there is a high probability that more commits will be pushed, triggering CI again.
This could also free up some resources for the other PRs (probably just a very small impact, but it might still make a difference occasionally).