Optimize conv2D_transpose

The conv2D_transpose layer still performs poorly after auto-tuning (n_trial=1000).

What can I do to optimize it further?