[RFC]FamilySeer: A new search method for Auto-scheduler

Motivation:

Auto-scheduler (Ansor) uses code sketch and optimization rules to generate a large search space. The search space defined by Ansor has shown great opportunities and therefore the search quality and the search efficiency are determined by how we search the space.

Ansor utilizes improved cost model and task scheduler to help explore the search space. The cost model analyzes and finds high-performance code transformations in the search space and the task scheduler allocates the time budget to different computation graphs. However, we find serval drawbacks to this approach:

  1. The accuracy of the cost model determines the search quality, but Ansor uses monolithic cost model to predict different computation graphs (subgraphs), resulting in an accuracy loss during tuning.

  2. The task scheduler allocates most of the time budget to subgraphs with most improving potential (i.e., those with the highest latency). This approach works well at the beginning of the autotuning. However, as the potential subgraph gradually reaches its peak performance with adequate time budget, other subgraphs have little time budget to reach its peak performance.

The search process will at the end take a dozen of hours. This motivates us to find better way to explore the search space.

Solution:

We implement FamilySeer. A new search method to improve the search process of Ansor. 1

The overview of FamilySeer is shown above. FamilySeer has two key features:

  1. FamilySeer analyzes the similarity between subgraphs. The similarity can be described as the accuracy among subgraphs. Then we group these subgraphs into a subgraph family. The subgraph inside the family shares the same cost model to have better accuracy.
  2. When the cost model is updated during the search, we can foresee the subgraphs inside the same family using the highly accurate cost model. We use the cost model to find the exact high-performance code transformation instead of evaluate each code transformation, meaning no more costly tuning has to be done.

We evaluate our method and Ansor from both the search efficiency and the search quality.The search efficiency can be described as how much time has been used before reaching an optimal latency, and the search quality is directly related to the latency. We choose serval representative deep learning models like (Resnet, Mobilenet, Bert, Roberta, GPT2 and Vision Transformer) and evaluate them on Both CPU and GPU. We achieve an average speedup of 2.49x and 3.04x on search efficiency, respectively. We also achieve an improvement up to 1.14x on search quality.

More details and experiment results can be found in this paper.

How to use:

The code was based on TVM 0.8(Commit: 64𝑐1𝑏79) so we are trying to move the code to the master. We integrated our method inside Ansor so we can simply change the search_policy parameter to activate our approach.

tuner = auto_scheduler.TaskScheduler(
tasks, task_weights, load_log_file=log_file)
tune_option = auto_scheduler.TuningOptions(
num_measure_trials=900 \* (len(tasks) + 1), # change this to 20000 to achieve the best performance
runner=ctx_list[0].runner,
measure_callbacks=[auto_scheduler.RecordToFile(log_file)],

    )
    tuner.tune(tune_option,search_policy="sketch.xgb.family_op") # Changing the search_policy from “sketch.xgb” to “sketch.xgb.family_op”

Comments are appreciated.

8 Likes

Thanks for the proposal! We are very interested in improving search algorithms and cost model. I was very excited to read about FamilySeer a week ago.

In terms of the subgraph similarity, AFAIK @comaniac and @zxybazh have been working independently on this topic to improve overall search time; On cost model, @merrymercy and I have been working on a project called TenSet to train a more effective cost model, where we do have task embeddings etc :slight_smile:

Looking forward to close collaboration on your PRs

3 Likes

Hi, @shanjunZ . Thank you for the proposal! It is an absolutely important problem and I have a few questions while reading FamilySeer.

  • How many cost models does FamilySeer usually construct?
  • Ideally, we may want to transfer knowledge from the tuning experience across the tuning jobs like we do with TenSet or Lorien. However, since FamilySeer builds multiple cost models each time, it is not clear to me how we do this or how easy it would be.
  • I’m wondering if you tried a single cost model for your Forsee tuning and compared with your multi-cost model approach. If the problem of single cost model is with biased training data, what happens if we use more balanced data? I think we may be able to balance it by adjusting search like you did with Foresee tuning. Would multi-cost model approach still win in this case?
  • How much budget did you use for Section 6.3?

Thank you and I will look forward to learn more about your work :slight_smile:

2 Likes