It is worth pointing out that:
- Most of the existing tests are CPU-bound, including those that use GPUs for execution (end-to-end tests), since they also rely heavily on the CPU for code generation
- All e2e tests can be decoupled into host-side compilation on the CPU plus execution on a device (e.g. a GPU)
- Brute-force splitting of tests into fast and slow is less efficient, because even slow tests can be CPU-bound and leave most of the GPU resources unused
Therefore, my proposal is: instead of separating fast and slow tests, we should build on the TVM RPC infra to split host-side logic from device execution. Details:
- Run all tests on CPU with a single thread or a limited number of threads
- Provide an API via TVM RPC that allows execution of compiled code on an isolated GPU/Hexagon/ARM instance
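
To make the split concrete, here is a rough sketch of the scheduling idea using only the Python standard library. It is not TVM code: `compile_on_host`, `DeviceWorker`, `run_test`, and `run_all` are hypothetical names, and the in-process worker thread only stands in for the isolated device instance that, in the actual proposal, would be reached through a TVM RPC session.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def compile_on_host(test_name):
    # Hypothetical stand-in for host-side code generation (CPU-bound in practice).
    return {"test": test_name, "artifact": f"{test_name}.compiled"}


class DeviceWorker:
    """Hypothetical stand-in for one isolated device (GPU/Hexagon/ARM).

    It only ever executes already-compiled artifacts, one at a time.
    """

    def __init__(self):
        self._jobs = queue.Queue()
        self.executed = []
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                break
            artifact, done, result = job
            self.executed.append(artifact["artifact"])  # "run" the artifact
            result["status"] = "passed"
            done.set()

    def run(self, artifact):
        # Submit a compiled artifact and block until the device has run it.
        done = threading.Event()
        result = {}
        self._jobs.put((artifact, done, result))
        done.wait()
        return result["status"]

    def shutdown(self):
        self._jobs.put(None)
        self._thread.join()


def run_test(device, test_name):
    # Host-side logic stays on the CPU; only execution goes to the device.
    artifact = compile_on_host(test_name)
    return test_name, device.run(artifact)


def run_all(test_names, host_threads=2):
    # A limited number of CPU threads share a single device worker.
    device = DeviceWorker()
    try:
        with ThreadPoolExecutor(max_workers=host_threads) as pool:
            results = dict(pool.map(lambda name: run_test(device, name), test_names))
    finally:
        device.shutdown()
    return results, device.executed
```

Calling `run_all(["test_conv2d", "test_dense"])` returns the per-test status and the list of artifacts the device executed. The point of the sketch is that the device queue only sees compiled artifacts, while all compilation happens on the host threads.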
The advantages of my proposal:
- Concurrency: a CPU instance could run multiple CI pipelines in parallel
- Device utilization: the RPC infra makes sure only minimal logic is executed on the device. It routes and manages execution efficiently, which should greatly improve device utilization and lower the cost