I am tuning a YOLOv8 model on an x86 CPU with the auto-scheduler. I ran 46336 (800*58) trials, and the estimated total latency reported at the end of tuning was 10,968 ms. However, when I rebuilt the module by loading the best log and ran module.benchmark, the measured latency was significantly slower. I don't know what mistake I made when loading the best log.

One more problem: the documentation recommends using the NHWC layout with the auto-scheduler, so I converted the model to NHWC before tuning, but the final output still shows “conv2d NHWC layout is not optimized for x86 with autotvm”.
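For comparison, here is a minimal sketch of the rebuild step following the pattern in TVM's auto-scheduler tutorials. It is not the exact script used in this post. The function names `workloads_with_valid_records` and `build_and_benchmark` are hypothetical, the default target string is a placeholder that should match the one used for tuning, and the log layout read by the helper (one JSON object per line, with the workload key at `["i"][0][0]` and the error code at `["r"][1]`) is an assumption about the record format that may differ between TVM versions.

```python
import json


def workloads_with_valid_records(log_file):
    """Return the workload keys that have at least one successful record.

    Assumption: each log line is a JSON object where "i"[0][0] is the
    workload key and "r" is [costs, error_no, all_cost, timestamp],
    with error_no == 0 meaning the measurement succeeded.
    """
    keys = set()
    with open(log_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            if rec["r"][1] == 0:
                keys.add(rec["i"][0][0])
    return keys


def build_and_benchmark(mod, params, log_file, target="llvm"):
    """Compile with the best schedules from log_file and benchmark on CPU."""
    # TVM is imported here so the log helper above works without it.
    import tvm
    from tvm import auto_scheduler, relay
    from tvm.contrib import graph_executor

    # Both contexts matter: ApplyHistoryBest supplies the tuned schedules,
    # and the use_auto_scheduler flag tells Relay to use them when building.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}
        ):
            lib = relay.build(mod, target=target, params=params)

    dev = tvm.cpu(0)
    module = graph_executor.GraphModule(lib["default"](dev))
    return module.benchmark(dev, repeat=3, min_repeat_ms=500)
```

If the number of keys returned by `workloads_with_valid_records` is smaller than the number of tuned tasks (58 here), some operators may have no usable record in the log. A build that runs outside these two contexts, or with a target different from the one used during tuning, would be one possible source of a slower result, though that is only a guess without seeing the rebuild script.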
This is the estimated total latency after 46336 trials:
This is the measurement result when loading the best log:
This is the code I used during the tuning process:
Thanks a lot for the help!!