sebap
August 14, 2019, 11:05am
1
Is operator fusion supported when using AutoTVM with a GPU?
From the paper it seems that operator fusion happens before AutoTVM (operator fusion is described in Section 3, while AutoTVM is described in Section 5).
There is an answer from @eqy to a question regarding auto-tuning, from which I understand that for CUDA and OpenCL this is a very difficult task. Hence I assume such fusion is not supported.
In the AutoTVM source (e.g. https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d.py#L119), the decorators appear to cover only non-fused operations.
So how does operator fusion actually work in this case?
vinx13
August 14, 2019, 8:01pm
2
AutoTVM tunes non-fused operators such as conv2d or dense. After tuning, we fuse elemwise or broadcast ops (add, relu, etc.) into it.
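For example, the usual flow looks roughly like this (a rough sketch in the style of the tune_relay_cuda tutorial; `mod` and `params` stand for a Relay module you already have, and exact argument names may differ between TVM versions):

```python
import tvm
from tvm import relay, autotvm
from tvm.autotvm.tuner import XGBTuner

target = 'cuda'
log_file = 'conv2d.log'

# 1. Task extraction only picks up the non-fused, tunable ops (conv2d here);
#    the following add/relu do not become part of any AutoTVM task.
tasks = autotvm.task.extract_from_program(mod['main'], target=target,
                                          params=params,
                                          ops=(relay.op.nn.conv2d,))

# 2. Tune each non-fused task separately.
for task in tasks:
    tuner = XGBTuner(task)
    tuner.tune(n_trial=1000,
               measure_option=autotvm.measure_option(
                   builder=autotvm.LocalBuilder(),
                   runner=autotvm.LocalRunner(number=10)),
               callbacks=[autotvm.callback.log_to_file(log_file)])

# 3. Compile with the tuning log. The graph-level fusion pass now fuses the
#    elemwise/broadcast ops (add, relu, ...) into the tuned conv2d kernels.
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```

So the fusion happens in Relay's graph-level FuseOps pass at build time, reusing the schedules that AutoTVM already tuned for the bare conv2d.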
Ajja
August 27, 2019, 3:06pm
3
@vinx13 Could you provide an example showing that it tunes only non-fused operations?
I’m not quite sure about this, because e.g. in the x86 scheduler you can access the relu operation:
```python
data = data_pad.op.input_tensors[0]
n_pad, h_pad, w_pad, c_pad = data_pad.op.axis
pad_fused = s[data_pad].fuse(n_pad, h_pad)
s[data_pad].parallel(pad_fused)
C = conv
n, h, w, c = C.op.axis
ry, rx, rc = C.op.reduce_axis
n_out, h_out, w_out, c_out = output_op.axis
s[C].vectorize(c)
if op != output_op:  # fuse bias + bn + relu into conv
    s[C].compute_at(s[output_op], c_out)
else:
    fused = s[C].fuse(n, h, w)
    s[C].parallel(fused)
scheduled_ops.append(op)

traverse(output_op)
return s
```
The topic was also mentioned here: How to fuse conv2d and following elemwise op?
vinx13
August 27, 2019, 5:13pm
4
This is the task being tuned: https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L171-L179
It doesn’t include relu (of course, you can create your own task with relu included so that you can tune the fused operator).
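For illustration, here is a minimal hand-written template that puts relu into the compute, so the fused conv2d + relu is what gets measured. This is a hypothetical sketch using the low-level @autotvm.template API with an llvm target, toy shapes, and a toy schedule, not the TOPI task from the link above:

```python
import tvm
import topi
from tvm import autotvm

@autotvm.template
def conv2d_relu(N, H, W, CI, CO, KH, KW, stride, padding):
    data = tvm.placeholder((N, CI, H, W), name='data')
    kernel = tvm.placeholder((CO, CI, KH, KW), name='kernel')
    conv = topi.nn.conv2d_nchw(data, kernel, stride, padding,
                               dilation=1, out_dtype='float32')
    out = topi.nn.relu(conv)              # relu is part of the tuned compute

    s = tvm.create_schedule(out.op)
    cfg = autotvm.get_config()

    # one illustrative knob: tile the innermost spatial axis of the output
    n, f, y, x = s[out].op.axis
    cfg.define_split('tile_x', x, num_outputs=2)
    xo, xi = cfg['tile_x'].apply(s, out, x)

    # compute the convolution inside the relu loop nest (fused)
    s[conv].compute_at(s[out], xo)
    s[out].parallel(s[out].fuse(n, f))
    s[out].vectorize(xi)
    return s, [data, kernel, out]

# the task now covers conv2d + relu as a single tunable workload
task = autotvm.task.create(conv2d_relu,
                           args=(1, 56, 56, 64, 64, 3, 3, 1, 1),
                           target='llvm')
```

With a template like this, AutoTVM measures the fused conv2d + relu kernel directly instead of the bare conv2d.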