I am working with custom sparse ops, and though it all works for the LLVM backend, I am having some issues generating code for OpenCL.
The simplest op I’ve made is a sparse GEMM (matrix multiplication), though the issue also affects my other ops.
The sparse part of the computation uses a lightly adapted version of TVM’s csrmm_default
(found in python/tvm/topi/sparse/csrmm.py). I’ve made a reproducible example of generating OpenCL code for sparse matrix multiplication in TVM available as this gist.
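For reference, the sparse GEMM part is constructed roughly like this (a minimal sketch using the stock csrmm_default; the shape values are illustrative placeholders, not the exact ones from the gist):

```python
import tvm
from tvm import te
from tvm.topi.sparse.csrmm import csrmm_default

# CSR representation of an M x K sparse matrix, plus a dense K x N matrix.
# These shapes are illustrative; nnz is the number of non-zero entries.
M, K, N, nnz = 8, 128, 81, 64
data = te.placeholder((nnz,), dtype="float32", name="data")
indices = te.placeholder((nnz,), dtype="int32", name="indices")
indptr = te.placeholder((M + 1,), dtype="int32", name="indptr")
dense = te.placeholder((K, N), dtype="float32", name="dense")

# csrmm_default builds the op with te.extern / tir.ir_builder internally.
out = csrmm_default(data, indices, indptr, dense)
s = te.create_schedule(out.op)

# This compiles without problems for the LLVM backend.
f = tvm.build(s, [data, indices, indptr, dense, out], target="llvm")
```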
When I combine this sparse GEMM computation into a larger TOPI op for GEMM convolution (featuring im2col and padding), it works for LLVM. However, when I try to generate OpenCL code for it, I get an error:
```
Did you forget to bind?
Variable `out` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `out` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `placeholder` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `placeholder` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `placeholder` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `placeholder` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `placeholder` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `placeholder` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `out` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable `placeholder` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "../src/tir/analysis/verify_memory.cc", line 202
RuntimeError: Memory verification failed with the following errors:
PrimFunc([placeholder, placeholder, placeholder, placeholder, out]) attrs={"global_symbol": "fused_nn_conv2d_sparse", "tir.noalias": (bool)1, "target": opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256 -model=unknown} {
  // attr [data_im2col] storage_scope = "global"
  allocate data_im2col[float32 * 10368]
  for (k, 0, 128) {
    for (m, 0, 81) {
      data_im2col[((k*81) + m)] = placeholder[(((((floordiv(k, 16)*64) + (floordiv(m, 9)*8)) + (floordiv(floormod(k, 16), 4)*8)) + floormod(m, 9)) + floormod(k, 4))]
    }
  }
  // attr [0] extern_scope = 0
  parallel (row, 0, 8) {
    // attr [dot] storage_scope = "local"
    allocate dot[float32x81 * 1]
    out[ramp((row*81), 1, 81)] = x81(0f)
    dot[ramp(0, 1, 81)] = x81(0f)
    for (idx, 0, (placeholder[(row + 1)] - placeholder[row])) {
      dot[ramp(0, 1, 81)] = (dot[ramp(0, 1, 81)] + (x81(placeholder[(placeholder[row] + idx)])*data_im2col[ramp((placeholder[(placeholder[row] + idx)]*81), 1, 81)]))
    }
    out[ramp((row*81), 1, 81)] = (out[ramp((row*81), 1, 81)] + dot[ramp(0, 1, 81)])
  }
}
```
Other folks have had similar issues (e.g., this issue here), and @vinx13 suggested binding the axis with a schedule.
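For an ordinary te.compute op, that fix is straightforward. A minimal sketch of the usual pattern (an unrelated toy op, assuming a TVM build with OpenCL enabled):

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
# Split the loop and bind the pieces to the GPU thread environment,
# which is what the memory verifier is asking for.
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))

f = tvm.build(s, [A, B], target="opencl")
```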
However, since this operation uses the tir.ir_builder system rather than the standard TVM Relay operation building, writing such a schedule (e.g., getting the loop axes with s[last].op.axis) fails with:
```
AttributeError: <class 'tvm.te.tensor.ExternOp'> has no attribute axis
```
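Concretely, the failing attempt looks roughly like this (a sketch; `out` stands for the tensor returned by the csrmm-style te.extern op above):

```python
s = te.create_schedule(out.op)
# out.op is a tvm.te.tensor.ExternOp, which exposes no loop axes,
# so there is nothing here to split or bind:
axes = s[out].op.axis  # raises the AttributeError above
```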
The above issue is described in an earlier question, to which @tqchen replied that “extern op cannot be scheduled because we get no control over the internal implementation of the op.”
However, given that these ops are still built within TVM, is there a way to schedule tir.ir_builder operations?
I’ve looked through @ziheng’s RFC on the matter and am still not sure. Is making a schedule that binds an axis even the right way to fix this OpenCL codegen issue?