Different TVM versions lead to significantly different runtime latency

With TVM v0.10, the latency of my DBNet model (without auto-tuning) is 1 s. With the code from the main branch, the same model's latency is 50 s.

This happens on both the ROCm and CUDA backends.

I am wondering why this happens.