I’m currently testing a model that uses deformable convs. I see this op is supported by TVM, but I’m having terrible performance issues. I’ve been using the runtime debugger to check why the model is so slow, and discovered:
95% of model execution time is due to deformable convs
One op in particular takes 72% of the time, and another takes 15%
Can I tune deformable convs with autotvm? What else can I do to improve model performance?
P.S. The same model without deformable convs runs in 3% of the time of the model with deformable convs.
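For a rough sense of how much headroom there is, the percentages above can be plugged into Amdahl's law. This is only a back-of-the-envelope sketch: the function name `overall_speedup` and the candidate per-op speedups (2x, 4x, ...) are made up for illustration, and only the 95% and 72% + 15% figures come from the debugger numbers quoted above.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-model speedup when only `hot_fraction` of the
    runtime is accelerated by a factor of `hot_speedup`."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


if __name__ == "__main__":
    all_deformable = 0.95        # share of runtime in deformable convs (from the post)
    two_hot_ops = 0.72 + 0.15    # share taken by the two slowest ops (from the post)

    # Hypothetical per-op speedups; the real gain depends on the schedule and hardware.
    for k in (2, 4, 8, 16):
        print(
            f"{k:>2}x faster ops -> "
            f"all deformable convs: {overall_speedup(all_deformable, k):.2f}x, "
            f"two hottest ops only: {overall_speedup(two_hot_ops, k):.2f}x"
        )
```

Since almost all of the time sits in the deformable convs, nearly any improvement to that one op shows up directly in end-to-end latency.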
Is your target CPU? I think the deformable conv schedule is implemented only for CUDA. If you try it on CPU, you get the default schedule, which is a naive, single-threaded for loop.
No, unless you are willing to get your hands dirty. We need a specialization of schedule_deformable_conv2d_nchw for x86 or ARM. Adding multithreading and vectorization is not difficult.
Hmm, I think the schedules for existing ops (especially x86 conv2d) are too complicated to learn from. Maybe you can look at pooling. It’s relatively simple, and it does multithreading + vectorization.
And here is the CUDA deformable conv2d schedule definition. You need to replace ["cuda", "gpu"] with "cpu".