auto-scheduler gives inconsistent measurement results

I am tuning a YOLOv8 model on an x86 CPU using the auto-scheduler. I ran 46336 trials (roughly 800 × 58), and the estimated total latency reported at the end of tuning is 10,968 ms. However, when I rebuilt the module by applying the best records from the tuning log and ran module.benchmark, the measured latency was significantly higher. Did I make a mistake when loading the log?

A second problem: the auto-scheduler documentation recommends the NHWC layout, so I converted the model to NHWC before tuning, but the output still shows the warning "conv2d NHWC layout is not optimized for x86 with autotvm".
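For reference, this is a minimal sketch of the rebuild step as it appears in the TVM auto-scheduler tutorials, written as a helper so it can be compared against the code used here. The function name `build_with_best_log` and the default `target="llvm"` are hypothetical; the real target should be the exact string used during tuning. Two details matter: `mod` has to go through the same NHWC `ConvertLayout` conversion as during tuning (otherwise the workload keys will not match the records), and the `relay.backend.use_auto_scheduler` flag has to be set. As far as I can tell, the "not optimized for x86 with autotvm" warning is printed when the NHWC conv2d strategy is chosen with that flag off.

```python
def build_with_best_log(mod, params, log_file, target="llvm"):
    """Compile `mod` with the best auto-scheduler records found in `log_file`.

    `mod` must be prepared exactly as it was for tuning (including any NHWC
    ConvertLayout pass); otherwise the records do not match the workloads and
    default schedules are used instead.
    """
    # Imported lazily so the helper can be defined even where TVM is absent.
    import tvm
    from tvm import auto_scheduler, relay
    from tvm.contrib import graph_executor

    # Make the best tuned schedule for each workload available to relay.build.
    with auto_scheduler.ApplyHistoryBest(log_file):
        # Without this config flag, relay.build does not use the
        # auto-scheduler records and falls back to the default strategies.
        with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}
        ):
            lib = relay.build(mod, target=target, params=params)

    dev = tvm.cpu(0)
    module = graph_executor.GraphModule(lib["default"](dev))
    return module, dev
```

Usage would then be `module, dev = build_with_best_log(mod, params, log_file, target)`, followed by `module.set_input(...)` for the model input and `module.benchmark(dev, repeat=3, min_repeat_ms=500)`.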

This is the estimated total latency after 46336 trials
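For context on that number: as I understand the auto-scheduler's task scheduler, the "Estimated total latency" it prints is a weighted sum of the best latency measured for each tuning task, where each weight is how many times that subgraph occurs in the network. The sketch below reproduces only that arithmetic in plain Python; the function name and the three-task numbers are hypothetical and not taken from this run.

```python
def estimated_total_latency_ms(best_costs_s, task_weights):
    """Weighted sum of per-task best latencies (seconds), returned in ms."""
    if len(best_costs_s) != len(task_weights):
        raise ValueError("need exactly one weight per task")
    return 1e3 * sum(c * w for c, w in zip(best_costs_s, task_weights))


# Hypothetical example: three tasks with their best latencies in seconds
# and the number of times each subgraph appears in the model.
best = [0.0012, 0.0005, 0.0020]
weights = [2, 4, 1]
print(f"{estimated_total_latency_ms(best, weights):.3f} ms")
```

Because only the extracted tasks enter this sum and each task is measured in isolation, the figure is an estimate rather than a prediction of what module.benchmark reports for the whole graph.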

This is the measurement result when loading best log

This is the code I use during the tuning process

Thanks a lot for the help!!