I am trying to write a conv_padded operator: instead of padding the input image, we keep the image size the same and use 'if' conditions to compute the output.
The problem is representing that 'if' condition. I don't want to use ir_builder, because that prevents schedule optimizations like tiling and splitting.
I tried tvm.select, but it runs into infinite recursion:
```python
import tvm

def test_padded_conv():
    data = tvm.placeholder((16,), name='data')
    kernel = tvm.placeholder((3,), name='kernel')
    k = tvm.reduce_axis((0, 3), name='kh')
    # Unpadded conv looks something like this:
    # conv_padded = tvm.compute((16,),
    #                           lambda oh: tvm.sum(data[oh + k - 1] * kernel[k],
    #                                              axis=[k]),
    #                           name="conv_padded")
    conv_padded = tvm.compute(
        (16,),
        lambda oh: tvm.sum(
            tvm.select(oh + k - 1 >= 0, data[oh + k - 1] * kernel[k], 0),
            axis=[k]),
        name="conv_padded")
    s = tvm.create_schedule(conv_padded.op)
    print(tvm.lower(s, [data, kernel], simple_mode=True))
```
The error is:

```
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <object repr() failed> ignored
```
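For reference, here is a plain-Python sketch (not from the original post; `conv1d_padded` is my own name) of the computation the operator is meant to produce: a same-size 1-D convolution where out-of-range taps contribute zero, matching the `oh + k - 1 >= 0` guard above:

```python
def conv1d_padded(data, kernel):
    """Zero-padded 'same' 1-D convolution: taps that fall outside the
    input contribute zero, so output length equals input length."""
    n, kw = len(data), len(kernel)
    out = [0.0] * n
    for oh in range(n):
        for k in range(kw):
            idx = oh + k - 1              # same index expression as the TVM lambda
            if 0 <= idx < n:              # the 'if' condition the operator needs
                out[oh] += data[idx] * kernel[k]
    return out

print(conv1d_padded([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))
# → [3.0, 6.0, 9.0, 7.0]
```

Note the guard also needs `idx < n` at the right boundary, which the repro above omits.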
@masahi In the case of explicit padding, we either make the input matrix larger or create an intermediate matrix with padded zeros. The use case I am interested in is hardware like VTA/FPGA, which has no way of explicitly inserting zeros. Currently, padding for VTA is done on the host. I am interested in writing a kernel that can do padded conv on the FPGA itself.
Yes, but if you do compute_inline() on the padding stage, you don't allocate another, bigger matrix. The padding logic will be embedded inside the convolution's inner loop. Here is an example of its use by the CUDA backend.
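To illustrate what compute_inline() achieves here, a plain-Python sketch (function names are mine, not TVM API): the two-stage version materializes a padded buffer, while the "inlined" version substitutes the padding stage's defining expression into the convolution's inner loop, so no extra buffer is allocated and both produce the same result:

```python
def conv_two_stage(data, kernel):
    """Stage 1: materialize a zero-padded copy (an extra allocation)."""
    n, kw = len(data), len(kernel)
    pad = [0.0] * (n + kw - 1)
    for i in range(n):
        pad[i + 1] = data[i]          # one zero on each side; assumes kw == 3
    # Stage 2: plain convolution over the padded buffer.
    return [sum(pad[oh + k] * kernel[k] for k in range(kw)) for oh in range(n)]

def conv_inlined(data, kernel):
    """Padding stage 'inlined': its guarded-load expression is evaluated
    inside the inner loop, so no padded buffer ever exists."""
    n, kw = len(data), len(kernel)
    load = lambda i: data[i] if 0 <= i < n else 0.0   # the padding expression
    return [sum(load(oh + k - 1) * kernel[k] for k in range(kw)) for oh in range(n)]

data = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 2.0, 3.0]
assert conv_two_stage(data, kernel) == conv_inlined(data, kernel)
```

This is exactly the structure topi uses: a separate pad stage followed by the conv stage, with the pad stage inlined by the schedule.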