Currently, many of the unit tests loop over `tvm.testing.enabled_targets()` in
order to run on all targets being tested. While the
`@tvm.testing.parametrize_targets` decorator does allow test results to be
reported for each target, it silently hides disabled targets, which can hide
failing tests.
I propose parametrizing the unit tests across both target and array size
parameters, to improve the accuracy of the test reports. An implementation of
this is shown in PR#8010, along with a modified `test_topi_relu.py` that uses
the new features. The pytest output before (left) and after (right) is shown
below.
```python
import tvm.testing


# Frequently used in tests
def test_feature():
    def verify_target(target, dev):
        # do test code here
        ...

    for target, dev in tvm.testing.enabled_targets():
        verify_target(target, dev)


# Currently possible, but rarely used
@tvm.testing.parametrize_targets
def test_feature(target, dev):
    # Do test code here
    ...


# Proposed standard
def test_feature(target, dev):
    # Do test code here
    ...
```
If a `test_*` function accepts the parameters `target` or `dev`, the test would
automatically be run on all enabled targets. Any targets that cannot be tested,
either because their compilation wasn't enabled or because no such physical
device exists, would be explicitly listed as skipped. This method also splits
up the single `test_feature` test into separate tests, one per target, such as
`test_feature[cuda]`. Rather than indicating only a single success/failure,
splitting up the tests can indicate whether a bug lies within the feature or
within a specific runtime.
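
As a rough illustration of the splitting described above, the sketch below expands one test name into one entry per target and marks targets that cannot run as skipped. It is not the code from the PR: `expand_by_target`, its arguments, the target sets, and the reason strings are hypothetical names chosen for this example, and the real implementation presumably goes through pytest's parametrization rather than plain lists.

```python
def expand_by_target(test_name, targets, compiled, has_device):
    """Return (test_id, status) pairs, one per target.

    Every name here is hypothetical; this only mimics the reporting
    behavior described in the text, not the actual tvm.testing code.
    """
    report = []
    for target in targets:
        test_id = f"{test_name}[{target}]"
        if target not in compiled:
            # Support for this target was not built in.
            status = "skipped: compilation not enabled"
        elif target not in has_device:
            # Built in, but no physical device to run on.
            status = "skipped: no device present"
        else:
            status = "run"
        report.append((test_id, status))
    return report


# Made-up configuration: three requested targets, two compiled, one device.
report = expand_by_target(
    "test_feature",
    targets=["llvm", "cuda", "opencl"],
    compiled={"llvm", "cuda"},
    has_device={"llvm"},
)
for test_id, status in report:
    print(f"{test_id}: {status}")
```

The point of the sketch is that every requested target produces a visible line in the report, whether or not it could actually run.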
By making the default behavior be to test on all targets specified in
`TVM_TEST_TARGETS`, it becomes harder for failing unit tests to be committed.
For example, several of the OpenCL unit tests currently fail. The current CI
setup runs tests with `USE_OPENCL=OFF`, and nothing is printed to indicate that
these tests are being skipped.
If tests should run on all targets except some known subset (e.g. everything
except `llvm`), these exclusions can be specified with the new
`tvm_excluded_targets` variable. If this variable is specified in a test
module, either as a string or as a list of strings, then the targets listed will
be dropped entirely from those tests, and will not be displayed.
`tvm_excluded_targets` can also be specified in a `conftest.py` file to apply to
an entire directory, or applied to a single function with
`@tvm.testing.exclude_targets`. Alternatively, the current behavior of
specifying explicit targets with
`@tvm.testing.parametrize_targets('target1','target2')` can be used.
It is expected both that enabling unit tests across additional targets may
uncover several unit test failures, and that some unit tests may fail during
the early implementation of support for a new runtime or hardware. In these
cases, the `tvm_known_failing_targets` variable or
`@tvm.testing.known_failing_targets` should be used instead of excluding the
targets altogether. These failing targets show up on the pytest report as being
skipped, whereas excluded targets do not show up at all. It is intended that
these act as a to-do list, either of newly exposed bugs to resolve, or of
features that a newly-implemented runtime does not yet implement.
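
The difference between excluded and known-failing targets can be sketched as a small classification step. This is a minimal illustration under semantics assumed from the description above; `as_list` and `classify_targets` are hypothetical helpers, not part of `tvm.testing`, and the target names are placeholders.

```python
def as_list(value):
    """Accept either a single string or a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def classify_targets(targets, excluded=None, known_failing=None):
    """Map each reported target to how it would appear in the report.

    Hypothetical helper: excluded targets are dropped entirely, while
    known-failing targets stay visible as skipped entries.
    """
    excluded = set(as_list(excluded))
    known_failing = set(as_list(known_failing))
    result = {}
    for target in targets:
        if target in excluded:
            continue  # dropped: never appears in the report
        if target in known_failing:
            result[target] = "skipped (known failing)"
        else:
            result[target] = "run"
    return result


plan = classify_targets(
    ["llvm", "cuda", "vulkan", "opencl"],
    excluded="vulkan",          # a single string is accepted
    known_failing=["opencl"],   # so is a list of strings
)
print(plan)
```

Here `vulkan` vanishes from the result, while `opencl` remains as a skipped entry that can serve as the to-do list described above.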
Parametrization can be done for array size/shape parameters as well, with similar benefits to test visibility as for the target. The proposed test style below would indicate which of the three array sizes resulted in an error, rather than just a single success/failure for all three.
```python
import pytest


# Current test style
def verify_prelu(x, w, axis, weight_reshape):
    # Perform tests here
    assert ...


def test_prelu():
    verify_prelu((1, 3, 2, 2), (3,), 1, (3, 1, 1))
    verify_prelu((1, 3, 2, 2), (2,), 2, (2, 1))
    verify_prelu((1, 3), (3,), 1, (3,))


# Proposed test style
@pytest.mark.parametrize(
    "x, w, axis, weight_reshape",
    [
        ((1, 3, 2, 2), (3,), 1, (3, 1, 1)),
        ((1, 3, 2, 2), (2,), 2, (2, 1)),
        ((1, 3), (3,), 1, (3,)),
    ],
)
def test_prelu(x, w, axis, weight_reshape):
    # Perform tests here
    assert ...
```
Once both the target and the other parameters are parametrized, target-specific
tests can use `pytest.skip`. For example, the following check for float16
support can be displayed as an explicitly skipped test.
```python
import pytest
import tvm
import tvm.testing
from tvm.contrib.nvcc import have_fp16


# Current method
def verify_relu(m, n, dtype="float32"):
    def check_target(target, dev):
        if dtype == "float16" and target == "cuda" and not have_fp16(tvm.gpu(0).compute_version):
            print("Skip because %s does not have fp16 support" % target)
            return
        # Run test

    for target, dev in tvm.testing.enabled_targets():
        check_target(target, dev)


@tvm.testing.uses_gpu
def test_relu():
    verify_relu(10, 128, "float32")
    verify_relu(128, 64, "float16")


# Proposed standard
@pytest.mark.parametrize(
    "m, n, dtype",
    [
        (10, 128, "float32"),
        (128, 64, "float16"),
    ],
)
def test_relu(target, dev, m, n, dtype):
    if dtype == "float16" and target == "cuda" and not have_fp16(dev.compute_version):
        pytest.skip("Skip because %s does not have fp16 support" % target)
    # Run test
```
Though not part of the current proposed changes, this also opens the door for other improvements in the future.
- Using tests as benchmarks for cross-target comparisons, since each function call pertains to a single target.
- Fuzzing the parametrized values, potentially exposing edge cases.
- Profiling time required to run each test. If a small percentage of tests are taking the vast majority of time for the CI to run, these tests may be pulled out into nightly tests, allowing the per-pull-request CI to run faster.
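
For the profiling idea in the last bullet, one possible way to pick candidates for nightly tests is to sort tests by duration and take the slowest ones until they cover a chosen share of the total run time. The sketch below is only an illustration: `slowest_share`, the 80% threshold, and the timings are made-up assumptions, not measurements from the TVM CI. In practice the per-test timings could come from a report such as pytest's `--durations` output.

```python
def slowest_share(durations, share=0.8):
    """Return the slowest tests that together cover `share` of total time.

    `durations` maps a test ID to its run time in seconds. This is a
    hypothetical helper, not part of pytest or tvm.testing.
    """
    total = sum(durations.values())
    selected, covered = [], 0.0
    # Walk tests from slowest to fastest until the threshold is reached.
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        selected.append(name)
        covered += seconds
    return selected


# Invented timings for illustration only.
timings = {
    "test_conv2d[cuda]": 90.0,
    "test_relu[llvm]": 2.0,
    "test_relu[cuda]": 3.0,
    "test_prelu[llvm]": 5.0,
}
candidates = slowest_share(timings)
print(candidates)
```

Because each parametrized test covers a single target and a single set of parameters, a selection like this could move only the expensive combinations to a nightly run instead of whole test functions.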