I’m trying to use tvm to implement my operator kernels. For one step, it isn’t easy to describe the computation by iterating over the output tensor; instead, it’s straightforward to construct the output tensor by iterating over the input tensor (writing the result to specific locations in the output tensor at each step). However, the
tvm.compute(shape, fcompute, name='compute', tag='', attrs=None)
method seems to assume the first shape argument represents the shape of the output tensor.
So, is there any way to describe the construction of a tensor through iteration over the input tensor?
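For reference, here is the gather style that tvm.compute seems to expect (a minimal sketch against the top-level API from the signature above): the lambda’s arguments are output coordinates, so every output element has to be a pure function of the inputs, and there is no obvious place to “write to” an arbitrary output location.

import tvm

# Gather style: i, j are *output* coordinates; each output element is a pure
# function of the inputs (here a matrix product over the reduce axis k).
n = tvm.var("n")
A = tvm.placeholder((n, n), name="A")
k = tvm.reduce_axis((0, n), name="k")
B = tvm.compute((n, n), lambda i, j: tvm.sum(A[i, k] * A[k, j], axis=k), name="B")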
Hey, guys, any ideas?
Writing the kernel the other way (scattering into the result) is not supported atm unless you directly use the hybrid script: https://docs.tvm.ai/langref/hybrid_script.html
Note that expressing kernels in the scatter style will usually result in less efficient schedules on GPUs etc., so trying to formulate them in compute is encouraged. It would be great if you could provide a specific description of the operator kernel you are trying to implement.
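Roughly, a scatter in hybrid script looks like the sketch below. This is only an illustration, assuming the tvm.hybrid.script decorator and the output_tensor intrinsic from that page; the tensors, sizes, and the scatter_add name are made up.

import tvm
from tvm import hybrid

# Minimal scatter sketch in hybrid script (hypothetical example, not from the thread).
# `indices` is assumed to be an int32 tensor of the same length as `data`;
# output_tensor is a hybrid-script intrinsic documented on the linked page.
@hybrid.script
def scatter_add(data, indices):
    out = output_tensor((data.shape[0],), "float32")
    for i in range(data.shape[0]):
        out[i] = 0.0                    # initialize, then scatter into it
    for j in range(data.shape[0]):
        out[indices[j]] = out[indices[j]] + data[j]
    return out

Called with tvm.placeholder tensors, the decorated function should return an ordinary Tensor that can be scheduled like any other op.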
@tqchen Thanks for your reply. Here is an example of what I’m trying to do.
Say we have an input tensor I with layout N*C and a weight tensor W with layout K*C*R*S. The computation can be described like this:
> for n in 0...N
>   for k in 0...K
>     for r in 0...R
>       for s in 0...S
>         // compute output location (h, w)
>         sum = 0
>         for c in 0...C
>           sum += I[n, c] * W[k, c, r, s]
>         output[h, w, k] = sum
The computation can be recognized as performing 2D convolution on a spatially sparse input tensor. It can be described conveniently through traversal over I. If we express it from the view of output, we need to know which n contribute to each output location (h, w). But this requires checking whether each location in the R*S window is valid, which brings extra overhead.
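To illustrate the kind of overhead I mean: writing such a validity check inside compute turns the reduction into a masked sum along the lines of the sketch below (a generic illustration with a hypothetical 0/1 mask tensor, written against the old top-level API), where every reduction step pays for the check even when almost all positions are invalid.

import tvm

# Generic masked reduction (hypothetical mask tensor, for illustration only).
n = tvm.var("n")
A = tvm.placeholder((n,), name="A")
valid = tvm.placeholder((n,), dtype="int32", name="valid")   # hypothetical 0/1 mask
k = tvm.reduce_axis((0, n), name="k")
# The condition is evaluated at every reduction step, even where the mask is 0.
S = tvm.compute(
    (1,),
    lambda i: tvm.sum(
        tvm.if_then_else(valid[k] > 0, A[k], tvm.const(0.0, "float32")), axis=k),
    name="masked_sum")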
Any progress or new ideas on this question?
I think many other similar computations have the same problem.
I have the same problem with the computation for col2im.
Input shape: [N, C, WINDOW_H, WINDOW_W, BLOCK_H, BLOCK_W]
Output shape: [N, C, H, W]
It is hard to express the computation while iterating over the output shape. A much more convenient way would be to iterate over the input shape, indexing the output in the same way as you would index the input in a convolution:
# example arguments
STRIDE_H, STRIDE_W = 2, 2
WINDOW_H, WINDOW_W = 3, 3

# initialize it with zeros
output[...] = 0
for n in 0...N
  for c in 0...C
    for wh in 0...WINDOW_H
      for ww in 0...WINDOW_W
        for h in 0...BLOCK_H
          for w in 0...BLOCK_W
            output[n, c, h*STRIDE_H + wh, w*STRIDE_W + ww] += input[n, c, wh, ww, h, w]
Here BLOCK_H and BLOCK_W are the number of windows you get when convolving over the output with a window of [WINDOW_H, WINDOW_W] and a stride of [STRIDE_H, STRIDE_W]. The accumulation (+=) is needed because overlapping windows contribute to the same output locations.
Is there a way to do this kind of computation with tvm? It is not only the iteration over the input shape but also the type of reduction: it doesn’t seem to fit reduce or scan.
Thanks!
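In case it helps frame the question, the scatter loop above seems to map fairly directly onto the hybrid script mentioned earlier in the thread. Below is only a rough sketch, assuming the tvm.hybrid.script decorator and output_tensor intrinsic from the linked page, with STRIDE_H = STRIDE_W = 2 and the resulting H = W = (5 - 1) * 2 + 3 = 11 hard-coded from the example arguments above so the output shape is a constant.

import tvm
from tvm import hybrid

# Rough sketch only: the stride (2) and output size (11) come from the example
# arguments above and are hard-coded; BLOCK_H / BLOCK_W are read from the input.
@hybrid.script
def col2im(data):
    # data layout: (N, C, WINDOW_H, WINDOW_W, BLOCK_H, BLOCK_W)
    out = output_tensor((data.shape[0], data.shape[1], 11, 11), "float32")
    for n in range(data.shape[0]):
        for c in range(data.shape[1]):
            for y in range(11):
                for x in range(11):
                    out[n, c, y, x] = 0.0          # initialize with zeros
            for wh in range(data.shape[2]):
                for ww in range(data.shape[3]):
                    for h in range(data.shape[4]):
                        for w in range(data.shape[5]):
                            out[n, c, h * 2 + wh, w * 2 + ww] = (
                                out[n, c, h * 2 + wh, w * 2 + ww]
                                + data[n, c, wh, ww, h, w])
    return out

This gives up the scheduling flexibility of compute, as noted earlier in the thread, but it does express the iteration order and the accumulation directly.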