sebap
August 14, 2019, 11:05am
1
Is operator fusion supported when using AutoTVM with a GPU?
From the paper it seems that operator fusion happens before AutoTVM (operator fusion is described in Section 3, while AutoTVM is described in Section 5).
There is an answer from @eqy to a question regarding auto-tuning, from which I understand that for CUDA and OpenCL this is a very difficult task. Hence I assume such fusion is not supported.
In the AutoTVM source (e.g. https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d.py#L119), the decorators appear to cover only non-fused operations.
So how does operator fusion actually work in this case?
vinx13
August 14, 2019, 8:01pm
2
AutoTVM tunes non-fused operators such as conv2d or dense. After tuning, we fuse elemwise or broadcast ops (add, relu, etc.) into it.
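For example, the usual flow looks roughly like this (a rough sketch in the style of the tune_relay_cuda tutorial; `mod` and `params` stand for a Relay module you already have, and exact argument names may differ between TVM versions):

```python
import tvm
from tvm import relay, autotvm
from tvm.autotvm.tuner import XGBTuner

target = 'cuda'
log_file = 'conv2d.log'

# 1. Task extraction only picks up the non-fused, tunable ops (conv2d here);
#    the following add/relu do not become part of any AutoTVM task.
tasks = autotvm.task.extract_from_program(mod['main'], target=target,
                                          params=params,
                                          ops=(relay.op.nn.conv2d,))

# 2. Tune each non-fused task separately.
for task in tasks:
    tuner = XGBTuner(task)
    tuner.tune(n_trial=1000,
               measure_option=autotvm.measure_option(
                   builder=autotvm.LocalBuilder(),
                   runner=autotvm.LocalRunner(number=10)),
               callbacks=[autotvm.callback.log_to_file(log_file)])

# 3. Compile with the tuning log. The graph-level fusion pass now fuses the
#    elemwise/broadcast ops (add, relu, ...) into the tuned conv2d kernels.
with autotvm.apply_history_best(log_file):
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)
```

So the fusion happens in Relay's graph-level FuseOps pass at build time, reusing the schedules that AutoTVM already tuned for the bare conv2d.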
Ajja
August 27, 2019, 3:06pm
3
@vinx13 Could you provide an example showing that it tunes only non-fused operations?
I’m not quite sure about this, because e.g. in the x86 scheduler you can access the relu operation:
```python
data = data_pad.op.input_tensors[0]
n_pad, h_pad, w_pad, c_pad = data_pad.op.axis
pad_fused = s[data_pad].fuse(n_pad, h_pad)
s[data_pad].parallel(pad_fused)
C = conv
n, h, w, c = C.op.axis
ry, rx, rc = C.op.reduce_axis
n_out, h_out, w_out, c_out = output_op.axis
s[C].vectorize(c)
if op != output_op:  # fuse bias + bn + relu into conv
    s[C].compute_at(s[output_op], c_out)
else:
    fused = s[C].fuse(n, h, w)
    s[C].parallel(fused)
scheduled_ops.append(op)

traverse(output_op)
return s
```
The topic was also mentioned here: How to fuse conv2d and following elemwise op?
vinx13
August 27, 2019, 5:13pm
4
This is the task being tuned: https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L171-L179
It doesn’t include relu (of course, you can create your own task with relu included so that you can tune the fused operator).
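For illustration, here is a minimal hand-written template that puts relu into the compute, so the fused conv2d + relu is what gets measured. This is a hypothetical sketch using the low-level @autotvm.template API with an llvm target, toy shapes, and a toy schedule, not the TOPI task from the link above:

```python
import tvm
import topi
from tvm import autotvm

@autotvm.template
def conv2d_relu(N, H, W, CI, CO, KH, KW, stride, padding):
    data = tvm.placeholder((N, CI, H, W), name='data')
    kernel = tvm.placeholder((CO, CI, KH, KW), name='kernel')
    conv = topi.nn.conv2d_nchw(data, kernel, stride, padding,
                               dilation=1, out_dtype='float32')
    out = topi.nn.relu(conv)              # relu is part of the tuned compute

    s = tvm.create_schedule(out.op)
    cfg = autotvm.get_config()

    # one illustrative knob: tile the innermost spatial axis of the output
    n, f, y, x = s[out].op.axis
    cfg.define_split('tile_x', x, num_outputs=2)
    xo, xi = cfg['tile_x'].apply(s, out, x)

    # compute the convolution inside the relu loop nest (fused)
    s[conv].compute_at(s[out], xo)
    s[out].parallel(s[out].fuse(n, f))
    s[out].vectorize(xi)
    return s, [data, kernel, out]

# the task now covers conv2d + relu as a single tunable workload
task = autotvm.task.create(conv2d_relu,
                           args=(1, 56, 56, 64, 64, 3, 3, 1, 1),
                           target='llvm')
```

With a template like this, AutoTVM measures the fused conv2d + relu kernel directly instead of the bare conv2d.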