Although conv2d_transpose is intrinsically slower than conv2d, the gap between the two ops in NNVM is larger than expected. This is possibly because conv2d_transpose doesn't have a custom schedule (one that looks into the OutputPad for the actual conv and the input pad).
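For context, a transposed convolution can be expressed as a regular convolution over a zero-dilated, padded input with flipped, channel-swapped weights; presumably a generic lowering along those lines ends up convolving over all the inserted zeros, which is where the extra cost would come from without a dedicated schedule. A quick PyTorch check of that equivalence (shapes here are arbitrary, not the benchmark's):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)
w = torch.randn(8, 8, 4, 4)   # conv_transpose2d weight: (in_ch, out_ch, kH, kW)
stride, pad, k = 2, 1, 4

ref = F.conv_transpose2d(x, w, stride=stride, padding=pad)

# Dilate the input: insert (stride - 1) zeros between elements.
n, c, h, wd = x.shape
z = x.new_zeros(n, c, (h - 1) * stride + 1, (wd - 1) * stride + 1)
z[:, :, ::stride, ::stride] = x

# Plain conv2d over the dilated, padded input with spatially flipped,
# channel-swapped weights reproduces the transposed convolution.
out = F.conv2d(z, w.transpose(0, 1).flip(2, 3), stride=1, padding=k - 1 - pad)

print(torch.allclose(ref, out, atol=1e-4))  # True
```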
Here’s some benchmarking code.
sample output:
usec/call
TOPI:
conv2d: 0.002746
conv2d_transpose: 0.043117
15x slowdown
--------------------------
NNVM:
conv2d: 0.002280
conv2d_transpose: 0.538102
conv2d_transpose: 0.062680 (with custom schedule)
236x slowdown (27x with custom schedule)
--------------------------
PyTorch:
conv2d: 0.005772
conv2d_transpose: 0.054538
conv2d_dx: 0.022895
9x slowdown