Modularize and Modernize TensorIR Tests

junrushao · July 2, 2023, 8:58pm

It is worth pointing out that:

- Most existing tests are CPU-bound. This includes the end-to-end (e2e) tests that execute on GPUs, because they also rely heavily on the CPU for code generation;
- Every e2e test can be decoupled into host-side compilation on CPU plus execution on a device (e.g., a GPU);
- A brute-force split into fast and slow tests is inefficient, because even slow tests can be CPU-bound and leave most GPU resources idle.

Therefore, my proposal is: instead of separating fast and slow tests, we should use the TVM RPC infrastructure to split host-side logic from device execution. Details:

- Run all tests on CPU, each with a single thread or a limited number of threads;
- Provide an API via TVM RPC that executes the compiled code on an isolated GPU/Hexagon/ARM instance.
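To make the split concrete, here is a minimal, framework-free sketch of the scheduling idea, assuming a small fixed set of devices shared by many host-side test workers. It is not TVM code: `compile_on_host`, `run_on_device`, `DevicePool`, `run_test`, and `run_suite` are hypothetical names, and the two stand-in functions only simulate compilation and execution. The point it shows is that each test holds a device only for the short execution step, while the CPU-heavy compilation runs concurrently without occupying any device.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in: in a real setup this would invoke the TVM compiler on the host CPU.
def compile_on_host(test_name):
    """CPU-bound step: produce a compiled artifact for the test (simulated)."""
    return f"{test_name}.so"


# Hypothetical stand-in: in a real setup this would go through a TVM RPC session.
def run_on_device(artifact, device_id):
    """Short device-side step: execute the artifact (simulated with a sleep)."""
    time.sleep(0.01)
    return (artifact, device_id)


class DevicePool:
    """Leases a small, fixed set of devices to many host-side workers."""

    def __init__(self, device_ids):
        self._free = queue.Queue()
        for device_id in device_ids:
            self._free.put(device_id)
        self._lock = threading.Lock()
        self._in_use = 0
        self.max_in_use = 0  # peak number of devices leased at the same time

    def run(self, artifact):
        device_id = self._free.get()  # block until some device is free
        with self._lock:
            self._in_use += 1
            self.max_in_use = max(self.max_in_use, self._in_use)
        try:
            return run_on_device(artifact, device_id)
        finally:
            with self._lock:
                self._in_use -= 1
            self._free.put(device_id)  # hand the device back to the pool


def run_test(test_name, pool):
    artifact = compile_on_host(test_name)  # runs on CPU, no device held
    return pool.run(artifact)              # device held only while executing


def run_suite(test_names, pool, host_workers=8):
    # Many host-side workers share the few devices in the pool.
    with ThreadPoolExecutor(max_workers=host_workers) as executor:
        return list(executor.map(lambda name: run_test(name, pool), test_names))
```

For example, `run_suite([f"test_{i}" for i in range(10)], DevicePool(["gpu0", "gpu1"]))` runs ten tests with up to eight concurrent host workers, yet never leases more than two devices at once. In the proposal, the role of `DevicePool` would be played by the TVM RPC infrastructure (loosely analogous to what the RPC tracker does when it hands out registered devices to clients).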

The advantages of my proposal:

- Concurrency: a single CPU instance can run multiple CI pipelines in parallel;
- Device utilization: with the RPC infra, only the minimal logic (executing already-compiled code) runs on the device. The infra routes and manages these execution requests efficiently, which greatly improves device utilization and lowers cost.