Is there any way to find the optimal performance when the IR structure is determined?

As we know, the IR structure (i.e., different pipelines) has a great impact on operator performance.

Is there any way to find the optimal performance when the IR structure is determined?

Or can we find a model (built from manual experience or ML) that decides how to tile, in order to get optimal performance?

Maybe this approach could reduce the search space.

Not sure I fully understand your question, but what you’re saying seems like the concept of AutoTVM. Since TOPI schedule is a fixed TE schedule template, which implies a fixed IR structure, AutoTVM searches for the best parameters in the template to achieve the best performance. As a result, compared to the latest auto_scheduler, AutoTVM has a much smaller search space – although it could still be huge.

In my application scenario, a fixed schedule template has different tiling ways and produces many different IRs after optimization. This makes the search space large when using AutoTVM. Maybe because the scenario is complex, the XGBoost model in AutoTVM cannot learn a precise cost model for the offline scenario.

Sorry, I still didn’t quite get it. IIUC, you are working on a TOPI schedule template you made, and that template may have different tiling ways? Most TOPI templates fix the tiling structure and use define_split to search for the best tile sizes, which by default are divisible factors of the axis extent, meaning the lowered IR should never have tail loops. In this case, the lowered IR structure should always be the same.
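To illustrate why divisible factors keep the IR structure fixed, here is a small sketch (not the TVM API itself — `divisible_tile_sizes` is a hypothetical helper) that enumerates the candidates `define_split` considers by default:

```python
# Sketch (not the real TVM API): enumerate the tile sizes that AutoTVM's
# define_split considers by default -- the divisors of the axis extent.
# Because every candidate divides the extent evenly, the lowered loop
# nest never needs a tail loop, so the IR structure stays the same for
# every point in the search space.

def divisible_tile_sizes(extent):
    """Return all tile sizes that divide `extent` evenly."""
    return [t for t in range(1, extent + 1) if extent % t == 0]

# An axis of extent 64 has these candidate tile sizes:
print(divisible_tile_sizes(64))  # [1, 2, 4, 8, 16, 32, 64]
```

Only when a tile size does *not* divide the extent would the lowered IR need an extra tail loop, which is exactly the case the default `define_split` policy avoids.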

Based on the above description, you can imagine that AutoTVM is already finding the optimal performance within a fixed IR structure most of the time. In other words, if we had a way to determine the optimal performance once the IR structure is fixed, we could directly use it as the AutoTVM tuner in most cases. However, AFAIK, there’s no such approach yet.


Sorry, I mean: can we get sub-optimal performance from different tiling ways for a fixed schedule (fixing things like compute_at, compute_inline, and multi-core, without fixing the tiling parameters), and then get optimal performance from different schedule ways with AutoTVM? That is, split the schedule search from the tiling search.

Hi @Augustiu,

Do you mean an auto-tuning approach in two steps:

  1. Fix transformations like compute_at and compute_inline, and tune the tiling.
  2. Fix the tiling from step 1, and iterate over different compute_at, compute_inline, etc.

In this way you would reduce the knobs from knobs_1*knobs_2 to knobs_1+knobs_2.
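The multiplicative-to-additive reduction is easy to sanity-check with a quick count (the knob counts below are made-up illustrative numbers, not from any real template):

```python
# Sketch: compare the size of a joint search space (tune everything at
# once) with a two-step search (tune tiling first, then transformations).
# The knob counts are made-up illustrative numbers.

tiling_knobs = 49      # e.g., 7 candidate sizes for each of 2 split axes
transform_knobs = 8    # e.g., compute_at / compute_inline choices

joint = tiling_knobs * transform_knobs     # one big product space
two_step = tiling_knobs + transform_knobs  # two smaller sequential spaces

print(joint, two_step)  # 392 57
```

The gap grows quickly: with more knob groups, the joint space is the product of all of them, while the sequential approach only pays the sum.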

If this is what you mean, I don’t think it is doable in an automated way, especially because I “think” (but am not sure) that some of those transformations would not be valid for some tiling combinations.

However, I think that setting the early stopping to something less than the number of trials (e.g., early_stopping=trials/10) would achieve the same goal as the two-step approach.
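For clarity, here is a pure-Python sketch of the early-stopping rule that AutoTVM’s tuner applies when `early_stopping` is set: stop once a run of trials fails to improve on the best result seen so far. The cost values are made up for illustration:

```python
# Sketch of early stopping: abort tuning once `patience` consecutive
# trials fail to improve on the best (lowest) cost seen so far.
# The trial costs below are made-up numbers, not real measurements.

def tune_with_early_stopping(costs, patience):
    """Return (best_cost, trials_used) under an early-stopping budget."""
    best = float("inf")
    since_improvement = 0
    used = 0
    for used, cost in enumerate(costs, start=1):
        if cost < best:
            best = cost
            since_improvement = 0
        else:
            since_improvement += 1
        if since_improvement >= patience:
            break
    return best, used

trial_costs = [5.0, 4.0, 4.5, 3.9, 4.2, 4.1, 4.3, 4.4]
print(tune_with_early_stopping(trial_costs, patience=3))  # (3.9, 7)
```

With `patience = trials / 10`, each "phase" of the search effectively gets a small budget before the tuner moves on, which is what makes it behave somewhat like the two-step approach.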

Hope this helps,

I understand it is hard to fix all the tiling combinations, but maybe it is useful for some operators, and we could increase the complexity of the model. Sometimes the precision of XGBoost may not be so good after setting early stopping. I think manual experience is needed to reduce the search space or improve precision. Is there any way?

@giuseros Is there any suggestion?

So, if you want to switch to a sort of manual mode, you can simply remove the knobs related to TE transformations (compute_at, inline, etc.), run the tuner, then remove the tiling knobs, add back the TE transformations, and re-run the tuner.
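This two-pass procedure is essentially coordinate descent over the two knob groups. A toy sketch (the `toy_latency` objective is a made-up stand-in for real kernel measurements, not anything TVM provides):

```python
# Sketch of the "manual mode" above as coordinate descent: tune one knob
# group while the other is frozen, then swap. toy_latency is a made-up
# stand-in for measured kernel latency, not a real TVM measurement.

def toy_latency(tile, transform):
    return (tile - 16) ** 2 + (transform - 3) ** 2 + 1.0

tile_choices = [1, 2, 4, 8, 16, 32]   # tiling knobs
transform_choices = [0, 1, 2, 3, 4]   # compute_at / inline knobs

# Pass 1: freeze transformations, tune tiling only.
transform = transform_choices[0]
tile = min(tile_choices, key=lambda t: toy_latency(t, transform))

# Pass 2: freeze the tiling found above, tune transformations only.
transform = min(transform_choices, key=lambda tr: toy_latency(tile, tr))

print(tile, transform)  # 16 3
```

Note that this only explores len(tile_choices) + len(transform_choices) points instead of the full product, which is the search-space reduction discussed above — at the risk of missing combinations where the two knob groups interact.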

Also, be aware that the new auto-scheduler framework (aka Ansor) has already been released, so maybe you want to have a look at that before spending a large amount of time on improving the auto-tuner. Link here:
