Question about the "pack-sum" loss of XGBoost in Ansor

Ansor uses its XGBoost-based cost model in an advanced manner: a program's predicted score is the sum of the model's outputs over several feature vectors, one per stage of the program. To train the model, a “pack-sum” loss is used.
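To make the "sum of per-stage predictions" idea concrete, here is a minimal sketch. The linear scorer below is only a stand-in for the trained booster, and the names (`predict_program_score`, `stage_features`) are mine, not from the TVM code:

```python
import numpy as np

def predict_program_score(model_fn, stage_features):
    """Program-level score = sum of the model's per-stage outputs
    (the stages of one program form a 'pack')."""
    return sum(model_fn(x) for x in stage_features)

# Stand-in for the trained booster: a fixed linear scorer
# (illustration only; Ansor uses an XGBoost model here).
w = np.array([0.5, 1.0])
model_fn = lambda x: float(w @ x)

# Hypothetical per-stage feature vectors of one program.
stages = [np.array([2.0, 1.0]), np.array([0.0, 3.0])]
score = predict_program_score(model_fn, stages)  # 2.0 + 3.0 = 5.0
```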

Training a cost model in this way seems interesting. Can anyone explain the mechanism in detail? The only thing I found is the source code (tvm/xgb_model.py at main · apache/tvm · GitHub)

To my understanding, this part of Ansor is quite similar to AutoTVM; I’m not sure whether it has been explained in either of the two papers.

Recently there’s also another work about the cost model of Ansor: TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers | OpenReview

cc @merrymercy

You can find some descriptions in section 5.2 of the Ansor paper (https://arxiv.org/pdf/2006.06762.pdf).

I think the comments in the code (tvm/xgb_model.py at 81480287a891f07f3939b98cc57fad39e217f317 · apache/tvm · GitHub) also clearly describe the idea.

I have read the paper and the comments. What I don’t understand is how we train a model when the ground truth is a sum over several rows. There is a custom objective/callback function registered with XGBoost, and I have not figured out how it works.
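The trick is that XGBoost accepts a custom objective that returns a gradient and a Hessian *per row*. With a squared error on the pack sums, the residual is computed at the pack (program) level and then broadcast back to every row in that pack. Below is a minimal numpy sketch of that gradient computation under those assumptions; the names (`pack_ids`, `pack_labels`) and the layout are mine, and the actual TVM implementation adds further details such as per-pack weighting:

```python
import numpy as np

def pack_sum_square_error_grad(row_preds, pack_ids, pack_labels):
    """Per-row gradient/Hessian of the pack-sum squared error.

    row_preds:   raw model prediction for each row (one row per
                 stage / sub-feature vector of a program)
    pack_ids:    for each row, the index of the program ('pack')
                 it belongs to
    pack_labels: one ground-truth score per pack (the whole program)
    """
    # The model's prediction for a program is the sum of its rows.
    pack_preds = np.zeros(len(pack_labels))
    np.add.at(pack_preds, pack_ids, row_preds)

    # For loss (sum_j f_j - y)^2, the derivative w.r.t. each row
    # prediction f_j is 2 * (pack_pred - y): identical for every row
    # of the same pack. The second derivative is the constant 2.
    residual = pack_preds - pack_labels
    grad = 2.0 * residual[pack_ids]
    hess = np.full_like(grad, 2.0)
    return grad, hess

# Two packs: rows 0-1 form program 0, row 2 forms program 1.
grad, hess = pack_sum_square_error_grad(
    row_preds=np.array([1.0, 2.0, 0.5]),
    pack_ids=np.array([0, 0, 1]),
    pack_labels=np.array([4.0, 0.5]),
)
# pack sums are [3.0, 0.5], so residuals are [-1.0, 0.0] and
# grad = [-2.0, -2.0, 0.0], hess = [2.0, 2.0, 2.0]
```

In actual training such a function would be passed as the `obj` argument of `xgb.train`, with the pack structure captured in a closure so the objective can map rows back to programs.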