Motivation
Our current test suite takes a while to run. One main reason is that tests that only require a cpu are also being run on testing nodes that have gpus. With multiple PRs open, tests running on gpus are often the limiting factor: because demand is high, PRs have to wait until a gpu node is freed up before testing can begin.
Proposal
I propose we explicitly mark tests that require a gpu and run only marked tests on the gpu.
Pytest provides a mechanism to do this: markers.
Markers allow tests to be decorated with @gpu (for example), and pytest can then select only tests with this marker using pytest -m gpu.
Markers can be combined with pytest.mark.skipif to make sure that tests are only run when a required gpu is present.
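As a rough illustration of the mechanism, here is a minimal sketch of a custom marker combined with pytest.mark.skipif. The gpu marker name follows the example above; the HAS_GPU check (looking for nvidia-smi on the PATH) and the test names are assumptions made for this sketch, not part of the proposal.

```python
import shutil

import pytest

# Assumption for this sketch: a gpu counts as "present" if nvidia-smi is on PATH.
# A real project would substitute its own runtime check.
HAS_GPU = shutil.which("nvidia-smi") is not None

# Reusable skip condition: the test is skipped when no gpu is detected.
requires_gpu = pytest.mark.skipif(not HAS_GPU, reason="no gpu detected")


@pytest.mark.gpu
@requires_gpu
def test_add_on_gpu():
    # Placeholder body; a real test would launch work on the device.
    assert 1 + 1 == 2


def test_add_on_cpu():
    # Unmarked: deselected by `pytest -m gpu`, kept by `pytest -m "not gpu"`.
    assert 1 + 1 == 2
```

With this in place, gpu nodes could run pytest -m gpu and cpu nodes pytest -m "not gpu". Custom markers should also be registered under the markers option in the pytest configuration so that pytest does not warn about an unknown marker.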
I propose we use the following markers:
- tvm.testing.uses_gpu for tests that use both the gpu and cpu (see below).
- tvm.testing.requires_gpu for tests that require the gpu.
- tvm.testing.requires_cuda for tests that require cuda.
- tvm.testing.requires_... for tests that require rocm, opencl, etc.
Many tests use a variety of different devices, like llvm, cuda, and rocm.
There are three main ways that tests use devices:
1. tests iterate through tvm.relay.testing.config.ctx_list,
2. tests iterate through tests/python/topi/python/common.py:get_all_backend, and
3. tests iterate through a hand-picked list of targets and check if the device is enabled with tvm.context(device).exist and tvm.runtime.enabled(device).
These methods do not allow us to separate out the gpu parts from the cpu parts.
To do this separation, I propose we merge 1. and 2. into a function called tvm.testing.enabled_devices and replace 3. with a function tvm.testing.device_enabled. These two functions would use an environment variable to determine which devices are enabled (a subset of the ones supported by the current build of TVM).
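One way the two functions might behave is sketched below. This is only an illustration under stated assumptions: the variable name EXAMPLE_TEST_DEVICES, the comma separator, and the hard-coded BUILD_DEVICES list are all hypothetical stand-ins, and a real implementation would ask TVM which devices the current build supports.

```python
import os

# Hypothetical variable name, used only for this sketch.
ENV_VAR = "EXAMPLE_TEST_DEVICES"

# Stand-in for the devices supported by the current build (assumption).
BUILD_DEVICES = ["llvm", "cuda", "rocm"]


def enabled_devices(environ=os.environ):
    """Return the devices to test against.

    If the variable is unset, every device in the build is enabled.
    Otherwise only the listed devices that the build supports are kept.
    """
    raw = environ.get(ENV_VAR)
    if raw is None:
        return list(BUILD_DEVICES)
    requested = {name.strip() for name in raw.split(",") if name.strip()}
    return [dev for dev in BUILD_DEVICES if dev in requested]


def device_enabled(device, environ=os.environ):
    """Return True if `device` is among the enabled devices."""
    return device in enabled_devices(environ)
```

Under these assumptions, a cpu-only node would set the variable to llvm, so loops over enabled_devices() skip the gpu parts, while a gpu node would list cuda (or another gpu device) instead.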
Cons
- Devices we test against are controlled by an environment variable. Environment variables can be hard to discover, so we should document this one well.
- Tests that use tvm.testing.device_enabled or tvm.testing.enabled_devices must also mark their testing function with tvm.testing.uses_gpu. If they don’t, the test will never be run with gpu devices. A fix would be a special decorator that parameterizes the test over the devices and sets markers appropriately (using pytest.mark.parametrize). Unfortunately, this would require rewriting a large number of tests.
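
The decorator mentioned in the last point could look roughly like the following sketch. The names parametrize_devices and device_params, the GPU_DEVICES set, and the fixed ENABLED_DEVICES list are hypothetical and exist only for illustration. The sketch relies on pytest.param accepting a marks argument, so each generated test case carries the gpu marker only when its device is a gpu.

```python
import pytest

# Assumptions for this sketch: which devices count as gpus, and which are enabled.
GPU_DEVICES = {"cuda", "rocm", "opencl"}
ENABLED_DEVICES = ["llvm", "cuda"]


def device_params(devices):
    """Wrap each device in pytest.param, marking gpu devices with the gpu marker."""
    return [
        pytest.param(dev, marks=pytest.mark.gpu) if dev in GPU_DEVICES else pytest.param(dev)
        for dev in devices
    ]


def parametrize_devices(func):
    """Hypothetical decorator: run `func` once per enabled device.

    The test receives the device name through its `device` argument.
    """
    return pytest.mark.parametrize("device", device_params(ENABLED_DEVICES))(func)


@parametrize_devices
def test_add(device):
    # Placeholder body; a real test would build and run on `device`.
    assert isinstance(device, str)
```

With such a decorator, pytest -m gpu would select only the cuda case of test_add and pytest -m "not gpu" only the llvm case, without the author marking the test by hand. The cost noted above still applies: each existing test would have to be rewritten to take a device argument instead of looping over devices itself.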