Auto-tuned model speed-up issue

So how can I auto-tune only the “fused..” tasks? And how do I pick the most time-consuming ops, as reported in the profiling log, so that only those get tuned?
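
To make the second question concrete, here is a minimal sketch of the selection step I have in mind, in plain Python. It assumes the per-op times have already been read out of the profiling log into a dict; the op names and timings below are made up for illustration, and `pick_hot_ops` is a hypothetical helper, not part of any library. Mapping the selected names back to actual tuning tasks depends on the framework and is not shown.

```python
from typing import Dict, List


def pick_hot_ops(op_times_us: Dict[str, float],
                 prefix: str = "fused",
                 coverage: float = 0.9) -> List[str]:
    """Pick the slowest ops whose names start with `prefix`.

    Ops are taken in descending order of time until the picked ones
    account for at least `coverage` of the total time spent in all
    ops that match the prefix.
    """
    # Keep only the ops we care about (e.g. the "fused..." ones).
    matching = [(name, t) for name, t in op_times_us.items()
                if name.startswith(prefix)]
    # Slowest first.
    matching.sort(key=lambda item: item[1], reverse=True)

    total = sum(t for _, t in matching)
    picked: List[str] = []
    accumulated = 0.0
    for name, t in matching:
        # Stop once the ops picked so far already reach the target share.
        if total > 0 and accumulated / total >= coverage:
            break
        picked.append(name)
        accumulated += t
    return picked


# Hypothetical per-op times in microseconds (NOT real profiler output).
example_times = {
    "fused_nn_conv2d_add_relu": 820.0,
    "fused_nn_dense_add": 130.0,
    "fused_nn_softmax": 30.0,
    "fused_reshape": 20.0,
    "copy": 50.0,  # ignored: name does not start with "fused"
}

hot = pick_hot_ops(example_times, prefix="fused", coverage=0.9)
print(hot)
```

The idea would then be to tune only the tasks that correspond to the names in `hot` and leave the remaining ops on their default schedules.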