[MetaSchedule] How to resume tuning?

Hi there,

I tried to resume MetaSchedule tuning, but it doesn’t work.

Here’s the code I used. Can anyone help me?

TVM commit id: 3a337714947a03be54c26b083e6a274c411c3815

import os
import tvm
from tvm import meta_schedule as ms
from tvm.relay import testing


def tune(mod, params, target, work_dir, database):
  tasks, task_weights = ms.relay_integration.extracted_tasks_to_tune_contexts(
      extracted_tasks=ms.relay_integration.extract_tasks(
          mod,
          target,
          params,
      ),
      work_dir=work_dir,
  )
  return ms.relay_integration.tune_tasks(
      tasks=tasks,
      task_weights=task_weights,
      work_dir=work_dir,
      max_trials_global=10,
      num_trials_per_iter=5,
      database=database,
  )


mod, params = testing.mlp.get_workload(1)
target = tvm.target.Target("llvm -num-cores 4")

work_dir = "./log-mlp/"

if not os.path.exists(work_dir):
  print(f"Create directory {work_dir}")
  os.mkdir(work_dir)

print(f"Create JSONDatabase in {work_dir}")
database = ms.database.JSONDatabase(work_dir=work_dir)

assert len(database.get_all_tuning_records()) == 0

print("Tune...")
tune(mod, params, target, work_dir, database)

assert len(database.get_all_tuning_records()) >= 10

print(f"Load JSONDatabase from {work_dir}")
database_loaded = ms.database.JSONDatabase(work_dir=work_dir)

assert len(database_loaded.get_all_tuning_records()) == len(database.get_all_tuning_records())

print("Tune...")
# This should resume tuning from the previous run, but it starts from scratch.
tuned_database = tune(mod, params, target, work_dir, database_loaded)

assert len(tuned_database.get_all_tuning_records()) >= 20

MetaSchedule (like AutoTVM and Ansor) does not support resuming. However, the logs are still kept in the database. For example:

If you tune for 100 rounds the first time, stop, and then tune again for 1000 rounds, there will be 1100 rounds in the database.
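
Even though the search itself restarts, you can still reuse the accumulated records at compile time by pointing compilation at the same database. A minimal sketch, assuming the same mod, params, target, and work_dir as in your script:

# Reload the database holding the records from all previous runs and
# compile with the best schedules found so far; no tuning happens here.
database = ms.database.JSONDatabase(work_dir=work_dir)
lib = ms.relay_integration.compile_relay(
    database=database,
    mod=mod,
    target=target,
    params=params,
)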

Thanks for your response.

Ansor tuning can be resumed. This should be a common use case for typical end users, but sadly MetaSchedule does not implement it. For reference, here is the resume example from the Ansor tutorial:

######################################################################
# A more complicated example is to resume the search.
# In this case, we need to create the search policy and cost model ourselves
# and restore the state of the search policy and cost model from the log file.
# In the example below we resume the state and do 5 more trials.

from tvm import auto_scheduler


def resume_search(task, log_file):
    print("Resume search:")
    # Rebuild the cost model from the measurement records of the previous run.
    cost_model = auto_scheduler.XGBModel()
    cost_model.update_from_file(log_file)
    # Preload previously measured states so they are not measured again.
    search_policy = auto_scheduler.SketchPolicy(
        task, cost_model, init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)]
    )
    measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=5,
        runner=measure_ctx.runner,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    task.tune(tune_option, search_policy=search_policy)

    # Kill the measurement process
    del measure_ctx
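
A hypothetical call site, assuming matmul_add is a workload previously registered with @auto_scheduler.register_workload and "matmul.json" is the log file written by the first tuning run:

from tvm import auto_scheduler

# Recreate the search task for the registered workload, then resume
# tuning against the existing log file.
task = auto_scheduler.SearchTask(
    func=matmul_add, args=(1024, 1024, 1024, "float32"), target="llvm"
)
resume_search(task, "matmul.json")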