The AutoTVM module is very useful for reducing inference latency through its auto-tuning process. However, auto-tuning usually takes a long time to finish, especially when the search space is large. In some preliminary studies, we found two limitations in the current auto-tuning pipeline.
- To collect accurate GFLOPS numbers when evaluating a config on the given hardware target, the measurement is repeated a fixed number of times, which can be very time-consuming. We would like to reduce the hardware measurement cost while still obtaining accurate measurements.
- The `ModelBasedTuner` of the AutoTVM module uses a cost model and an optimizer to choose promising configs for the next batch of searching. An epsilon-greedy strategy with a fixed epsilon of 0.05 is used to avoid pure exploitation. However, such a value may make the search overly greedy and sometimes trap it in a local optimum. We would like to dynamically balance the trade-off between exploration and exploitation to escape local optima.
```python
if self.trial_pt >= len(self.trials) - int(0.05 * self.plan_size):
    # If the trial list is empty or the tuner is doing the
    # last 5% of trials (e-greedy), choose a config randomly
    index = np.random.randint(len(self.space))
    while index in self.visited:
        index = np.random.randint(len(self.space))
```
- Use an adaptive evaluator to obtain an accurate estimate of the GFLOPS number at significantly reduced cost. We stop the repeated evaluation of a config on the target early once the collected data is reliable enough.
- Use an uncertainty-aware tuner. Intuitively, when searching the config space, we want the tuner to exploit more when the cost model predicts accurately and to explore more when it does not. The uncertainty of the cost model can therefore be taken into account when dynamically setting the epsilon value (which controls the exploration-vs-exploitation ratio).
- For the adaptive evaluator, we add an argument named `enable_adaptive_evaluator` to the `measure_option`. When it is set to true, the evaluation is conducted in partitioned micro-batches, one by one. When the coefficient of variation among the micro-batches reaches a threshold, the evaluator stops early and returns the mean cost.
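The early-stopping logic can be sketched as follows. This is a minimal illustration of the idea, not the actual implementation: the function name, the `run_micro_batch` callback, the batch count, and the threshold value are all placeholders.

```python
import numpy as np

def adaptive_evaluate(run_micro_batch, max_batches=10, cv_threshold=0.02):
    """Evaluate a config in micro-batches, stopping early once the
    coefficient of variation (std / mean) of the per-batch mean costs
    falls below cv_threshold, i.e. the measurements have stabilized."""
    costs = []
    for _ in range(max_batches):
        costs.append(run_micro_batch())  # mean cost of one micro-batch
        if len(costs) >= 2:
            cv = np.std(costs) / np.mean(costs)
            if cv < cv_threshold:
                break  # measurements are reliable enough; stop early
    return float(np.mean(costs))
```

With a stable workload the loop exits after only a couple of micro-batches, while a noisy workload keeps measuring up to `max_batches`.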
- Add a new tuner named `RFEITuner` in `autotvm.tuner`. The `RFEITuner` inherits from `ModelBasedTuner` and uses a new cost model named `RFEICostModel`. This cost model uses a `RandomForestRegressor` rather than XGBoost, and replaces the cost-estimate part with expected positive improvement.
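A random forest makes this acquisition function cheap to compute, because the spread of the individual trees' predictions gives an uncertainty estimate for free. The sketch below shows one common way to score candidates by expected improvement; the function name and signature are illustrative, not the `RFEICostModel` API.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

def expected_improvement(model, X, best_so_far):
    """Score candidate configs by expected improvement over the best
    observed GFLOPS, using the per-tree prediction spread of a fitted
    RandomForestRegressor as the uncertainty estimate."""
    # Each tree in the forest gives one prediction per candidate.
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    mu = per_tree.mean(axis=0)            # mean prediction
    sigma = per_tree.std(axis=0) + 1e-9   # uncertainty (avoid div by 0)
    z = (mu - best_so_far) / sigma
    # Closed-form EI for a Gaussian predictive distribution.
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)
```

Configs where the model is confident of a high GFLOPS value score well (exploitation), but so do configs where the trees disagree strongly (exploration), which is exactly the balance the tuner needs.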
- Add an argument named `uncertainty_aware`, and a `dynamic_ep` initialized to replace the former fixed epsilon of 0.05. This `dynamic_ep` will be updated dynamically based on the uncertainty from the cost model.
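The exact update rule is described in the AdaTune paper; as a rough illustration only, one simple scheme is to scale the exploration ratio with the cost model's relative prediction error on the most recent batch. The function name and the bounds below are placeholders, not part of the proposed API.

```python
import numpy as np

def update_dynamic_ep(pred_costs, measured_costs, ep_min=0.05, ep_max=0.5):
    """Illustrative update for dynamic_ep: when the cost model predicts
    the measured costs well, epsilon stays near its floor and the tuner
    remains mostly greedy; when predictions are poor, epsilon grows and
    more configs are picked at random (exploration)."""
    pred = np.asarray(pred_costs, dtype=float)
    meas = np.asarray(measured_costs, dtype=float)
    rel_err = np.mean(np.abs(pred - meas) / np.maximum(meas, 1e-9))
    return float(np.clip(rel_err, ep_min, ep_max))
```

In the `next_batch` snippet shown earlier, the returned value would take the place of the hard-coded 0.05 in `int(0.05 * self.plan_size)`.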
- Send the PR.
Our work is almost finished and we will send the PR later. More implementation details can be found in the AdaTune paper (NeurIPS 2020). Any feedback on the design or implementation is welcome.