This plot shows GFLOPS for two tasks over iterations of RPC tuning in Ansor. The red seams mark the points at which the experiment was killed and later resumed by passing a non-null load_log_file argument that points to the records logged by the killed run. Here is the relevant code snippet:
if load_dir is None:
    load_records_str = None
else:
    load_dir = Path(load_dir)
    load_records_str = str((load_dir / 'records.json').resolve())
    import shutil
    shutil.copyfile(load_records_str, save_records_str)
if save_records_str:
    measure_callbacks = [auto_scheduler.RecordToFile(save_records_str)]
else:
    measure_callbacks = []
print("Begin tuning...")
tuner = auto_scheduler.TaskScheduler(tasks, task_weights,
                                     load_log_file=load_records_str,
                                     strategy=scheduling_strategy)
The results sometimes make it look like Ansor starts tuning from scratch when I resume from an existing log file. What could cause this? Does it indicate an unquiesced system, and what explains the variability that correlates with the boundaries where an experiment is stopped and resumed?
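One sanity check that might help narrow this down is to summarize the copied records file before resuming, to confirm that it contains valid measurements for both tasks. The following is a minimal sketch, not part of my actual script. It assumes each line of records.json is a JSON object whose "i" field begins with the workload key and whose "r" field holds the list of measured costs followed by an error code (0 meaning success). That layout is an assumption and may differ between TVM versions; summarize_records is a hypothetical helper name.

```python
import json
from collections import defaultdict


def summarize_records(lines):
    """Summarize tuning records per workload key.

    Assumed record layout (not guaranteed across TVM versions):
      rec["i"][0][0] -> workload key
      rec["r"][0]    -> list of measured costs in seconds
      rec["r"][1]    -> error code, 0 on success

    Returns {workload_key: (num_records, best_mean_cost_or_None)}.
    """
    counts = defaultdict(int)
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        workload_key = rec["i"][0][0]
        costs, error_no = rec["r"][0], rec["r"][1]
        counts[workload_key] += 1
        # Only successful measurements contribute to the best cost.
        if error_no == 0 and costs:
            mean_cost = sum(costs) / len(costs)
            best[workload_key] = min(mean_cost, best.get(workload_key, float("inf")))
    return {key: (counts[key], best.get(key)) for key in counts}
```

With the snippet above, it could be called as `summarize_records(open(load_records_str))` just before constructing the TaskScheduler. If a task's workload key is missing from the summary, or every record for it has a nonzero error code, there would be no usable measurements to resume from for that task, which might explain a restart-like curve.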