Okay, sorry for not having looked enough by myself. Thanks to your example, I found it was quite easy to implement this myself with existing features; it only required a modification of the autotuning script.
So basically I use two temporary files to save the work done:
- [model].log.tmp: checkpoints written after each completed task; can be resumed without loss
- [model].log.task.tmp: work done for the task currently being processed; cannot be resumed without loss (no need to save it in case of exit or failure)
import os
import tempfile

tmp_log = log + '.tmp'

def tune_tasks(...):
    ...
    for i, tsk in enumerate(reversed(tasks)):
        ...
        # in case of transfer learning, load history from the completed-tasks log
        if use_transfer_learning and os.path.isfile(tmp_log):
            tuner_obj.load_history(autotvm.record.load_from_file(tmp_log))
        with tempfile.NamedTemporaryFile() as tmp_task_log_file:
            # tune into the blank temporary file
            tuner_obj.tune(..., callbacks=[..., autotvm.callback.log_to_file(tmp_task_log_file.name)])
            # task completed: append the task log to the checkpoints log
            with open(tmp_log, 'a') as tmp_log_file:
                tmp_log_file.write(tmp_task_log_file.read().decode('utf8'))

    # after tuning all tasks, pick the best records from tmp_log and remove it
    autotvm.record.pick_best(tmp_log, log)
    os.remove(tmp_log)
That way I can resume a tuning session as you described, from the checkpoint file ([model].log.tmp, i.e. tmp_log), without any loss of work.
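In case it helps anyone else, the same checkpoint-and-append pattern can be sketched with plain stdlib file handling, independent of TVM. Here fake_tune and the task names are made-up placeholders standing in for tuner_obj.tune() and real autotvm tasks; only the file-handling pattern mirrors the script above:

```python
import os
import tempfile

def fake_tune(task, log_path):
    # placeholder for tuner_obj.tune(): appends one log record per task
    with open(log_path, 'a') as f:
        f.write('record-for-%s\n' % task)

def tune_with_checkpoints(tasks, tmp_log):
    # resume: skip tasks whose records are already in the checkpoint file
    done = set()
    if os.path.isfile(tmp_log):
        with open(tmp_log) as f:
            done = {line.strip() for line in f}
    for task in tasks:
        if 'record-for-%s' % task in done:
            continue
        # tune into a blank per-task temporary file
        with tempfile.NamedTemporaryFile() as task_log:
            fake_tune(task, task_log.name)
            # task completed: append its records to the checkpoint log
            with open(tmp_log, 'a') as ckpt:
                ckpt.write(task_log.read().decode('utf8'))
```

If the process dies mid-task, only the in-progress temporary file is lost; rerunning tune_with_checkpoints with the same checkpoint path picks up where it left off.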
Thank you.