This is a follow-up to, and complete rewrite of, this post, which I can no longer edit. The rewrite aims to be clearer, and it adds simple working examples of my issue along with some of the investigation I’ve done since.
I’m developing sparse versions of conv2d ops for TVM.
I’ve encountered an issue with optimisation levels and the GPU version. When a network has a ReLU layer after a sparse convolutional layer and the optimisation level is greater than 0, invalid code is generated, and the build fails with:
Did you forget to bind?
Variable `T_relu` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
The issue is that I can’t identify which optimisation pass is actually responsible. If I run at `opt_level=0` but manually enable every documented optimisation pass (as described for `build_config` in the docs), the code works fine. Even the `"OpFusion"` pass, which I would have assumed to be responsible, is okay.
Conversely, if I run at `opt_level=3` and pass every optimisation to the `disabled_pass` argument of `build_config`, I still get the same error.
This suggests that either 1) there is an undocumented optimisation pass, or 2) the `disabled_pass` and `required_pass` arguments to `build_config` are being ignored.
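For anyone who wants to repeat the two experiments, here is a minimal sketch of how I would set them up. The pass table is copied by hand from the `build_config` docs as best I recall it, so treat the names and levels as an assumption to check against your own checkout. `experiment_configs` and `build_with` are hypothetical helper names, and the sketch uses `tvm.transform.PassContext` (which `build_config` wraps) rather than `build_config` itself.

```python
# Pass names and levels as listed in the build_config docs (copied by hand;
# an assumption to verify against your own TVM checkout).
DOCUMENTED_PASS_LEVELS = {
    "SimplifyInference": 0,
    "OpFusion": 1,
    "FoldConstant": 2,
    "FoldScaleAxis": 3,
    "AlterOpLayout": 3,
    "CanonicalizeOps": 3,
    "CanonicalizeCast": 3,
    "EliminateCommonSubexpr": 3,
    "CombineParallelConv2D": 4,
    "CombineParallelDense": 4,
    "CombineParallelBatchMatmul": 4,
    "FastMath": 4,
}


def experiment_configs(pass_levels=DOCUMENTED_PASS_LEVELS):
    """Return PassContext keyword arguments for the two experiments."""
    names = sorted(pass_levels)
    return {
        # Experiment A: opt_level 0, every documented pass forced on.
        "all_required": {"opt_level": 0, "required_pass": names},
        # Experiment B: opt_level 3, every documented pass forced off.
        "all_disabled": {"opt_level": 3, "disabled_pass": names},
    }


def build_with(config, mod, params, target="cuda"):
    """Build a Relay module under one configuration (needs a TVM install)."""
    import tvm
    from tvm import relay

    with tvm.transform.PassContext(**config):
        return relay.build(mod, target=target, params=params)
```

With a Relay module `mod` and its `params` in hand, `build_with(experiment_configs()["all_disabled"], mod, params)` would correspond to the failing case described above, assuming the names in the table are the ones the pass infrastructure actually matches against.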
I made a simple script (a single conv2d layer + ReLU) that demonstrates that the sparse code is correct when running on the CPU, and on the GPU at opt_level 0. However, it reproduces the error with GPU + opt_level=3:
python3 tvm_sparse_test.py --backend cpu --opt_level 3
python3 tvm_sparse_test.py --backend gpu --opt_level 0
python3 tvm_sparse_test.py --backend gpu --opt_level 3 # error
All that is needed to run this code is to build my simplified version of the TVM v0.8 code, which includes an implementation of sparse direct convolution. This is available as the `v0.8-sparse-opt-issue` branch of my fork.
The code for that can be found at `python/tvm/topi/nn/conv2d_sparse.py`. The actual sparse convolution is in the function `csr_direct_convolution`, and the code is the same for both the CPU and GPU versions.
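To make clear what kind of computation is involved, here is a small pure-Python reference for a CSR direct convolution. It is not the code from my branch, only a sketch under stated assumptions: stride 1, no padding, one CSR row per output channel, and a column index that flattens (input channel, kernel row, kernel column). The names `dense_to_csr` and `csr_direct_conv2d` are made up for this illustration.

```python
def dense_to_csr(rows):
    """Convert a list of dense rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csr_direct_conv2d(x, data, indices, indptr, kh, kw):
    """Stride-1, unpadded convolution of x[C][H][W] with CSR weights.

    Assumed layout: row k of the CSR matrix holds output channel k, and
    column index (c * kh + ky) * kw + kx addresses weight[k][c][ky][kx].
    As in most deep learning frameworks, this is a cross-correlation.
    """
    height, width = len(x[0]), len(x[0][0])
    out_h, out_w = height - kh + 1, width - kw + 1
    out = []
    for k in range(len(indptr) - 1):
        plane = [[0] * out_w for _ in range(out_h)]
        # Visit only the non-zero weights of output channel k.
        for p in range(indptr[k], indptr[k + 1]):
            c, rem = divmod(indices[p], kh * kw)
            ky, kx = divmod(rem, kw)
            for oy in range(out_h):
                for ox in range(out_w):
                    plane[oy][ox] += data[p] * x[c][oy + ky][ox + kx]
        out.append(plane)
    return out
```

A reference like this is only useful for checking small cases against the compiled operator; the zero weights are skipped entirely, which is the point of the direct sparse formulation.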
Does anyone have any suggestions as to why I’m experiencing this issue, and how I might test whether hypothesis 1) or 2) is correct?