Tips for troubleshooting tuning slowdowns?

To clarify: the kernels are slower after tuning.

More precisely, the measurements above are for end-to-end execution of models with many kernels. Some kernels get faster and some get slower, but the net effect is that the models run slower after autoscheduling than the same models without tuning.

I can look at profile data to find which kernels got slower. However, I'm not sure what steps are typically taken after that.
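One way to turn the profile data into a concrete list to investigate is to diff per-kernel times between the untuned and tuned runs and sort by regression size. This is a minimal sketch. It assumes each profile has already been exported to a plain dict mapping kernel name to total time for one end-to-end run. The `rank_regressions` helper and that dict format are hypothetical and are not part of any profiler's API.

```python
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption, not a real profiler export):
# kernel name -> total time spent in that kernel over one end-to-end run.
Profile = Dict[str, float]


def rank_regressions(baseline: Profile, tuned: Profile) -> Tuple[List[Tuple[str, float]], float]:
    """Return (kernel, tuned - baseline) pairs, biggest slowdown first,
    plus the net end-to-end delta across all kernels.

    A kernel missing from one profile is treated as taking 0 time there,
    so kernels that only appear in one run still show up in the ranking.
    """
    names = set(baseline) | set(tuned)
    deltas = [(name, tuned.get(name, 0.0) - baseline.get(name, 0.0)) for name in names]
    # Largest positive delta (worst regression) first; ties broken by name for stable output.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    net = sum(delta for _, delta in deltas)
    return deltas, net
```

Sorting by absolute time lost, rather than by relative slowdown, puts the kernels that contribute most to the end-to-end regression at the top. A positive `net` confirms that the regressions outweigh the speedups.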