AutoTVM clarification

Hi, as far as I know, AutoTVM is great for automatic kernel generation; however, some limitations still exist (I might be wrong, please correct me):

  1. To begin with, we need to run online on the hardware target to train a reliable cost model;

  2. For each network model, we need to tune a dedicated cost model and schedule policy (specific to that model; of course we can build a database-like store, such as TopHub, to keep many of them for use at runtime, right?). Then for another model, we need to do the tuning again (still online, even with transfer learning);

    In my opinion, “1.” exists because we need a general method to create a cost model for all kinds of hardware targets, so a learning-based method is a good choice. However, if a hardware vendor already has a “cycle-accurate” cost model of their own, then training a cost model should be unnecessary for that kind of target, right? “2.” exists because the current cost-model training is based on feature engineering. However, according to the AutoTVM paper, it seems TreeGRU is also considered as an option, which as I understand it takes the AST/IR as input to train the cost model. If we train this cost model with as many inputs (models) as possible, then the cost model can be reused for this target, as a DNN is supposed to do. Am I right?
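The point about vendor-provided cost models can be sketched in plain Python: if the target already ships a deterministic (e.g., cycle-accurate) cost function, a tuner's search loop can query it directly and rank every candidate offline, with no statistical model and no on-device measurement. Everything below (the config knobs and the `cycle_accurate_cost` formula) is hypothetical illustration, not a TVM API:

```python
import itertools

# Hypothetical config knobs for a tiled matmul kernel.
TILE_SIZES = [4, 8, 16, 32]
UNROLL = [1, 2, 4]

def cycle_accurate_cost(tile, unroll):
    """Stand-in for a vendor-provided cycle-accurate model.
    Returns predicted cycles for a config; the formula here is made up:
    it penalizes tiles that don't fit a pretend 16-wide vector unit
    and charges a small cost per unroll step."""
    vector_penalty = 0 if tile % 16 == 0 else 50
    return 1000 // (tile * unroll) + vector_penalty + 5 * unroll

def exhaustive_search():
    # With a deterministic cost model, no online training is needed:
    # just rank every candidate config by predicted cycles.
    return min(itertools.product(TILE_SIZES, UNROLL),
               key=lambda cfg: cycle_accurate_cost(*cfg))

if __name__ == "__main__":
    print(exhaustive_search())  # best (tile, unroll) under the toy model
```

The learned cost model in AutoTVM exists precisely because most targets do not have such a function; when one exists, the search degenerates to ranking, as above.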

How to optimize GEMM on CPU shows the basic steps of what AutoTVM does.

AutoTVM's search for optimizations depends on search policies (cache line, SIMD, etc.) and on the NN operator's parameters (input shape, output shape, kernel shape, etc.).

Operators with the same parameters can share optimized results across network models. But in practice, operators with the same parameters are rare.
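The reuse point above can be illustrated with a toy record store keyed by a workload signature (operator name plus shapes) — hypothetical code in the spirit of TopHub, not TVM's actual log format:

```python
# Toy "tophub"-style store: best config keyed by a workload signature.
# Two networks containing an identical conv2d share one entry.
best_configs = {}

def workload_key(op, input_shape, kernel_shape):
    return (op, tuple(input_shape), tuple(kernel_shape))

def record_best(op, input_shape, kernel_shape, config):
    best_configs[workload_key(op, input_shape, kernel_shape)] = config

def lookup(op, input_shape, kernel_shape):
    # Returns a previously tuned config, or None if this workload is new
    # (a new workload would need its own tuning run).
    return best_configs.get(workload_key(op, input_shape, kernel_shape))

# Model A tunes a conv2d; Model B with the same shapes reuses it for free.
record_best("conv2d", (1, 64, 56, 56), (64, 64, 3, 3), {"tile": 16})
print(lookup("conv2d", (1, 64, 56, 56), (64, 64, 3, 3)))   # reused config
print(lookup("conv2d", (1, 128, 28, 28), (128, 128, 3, 3)))  # None: must tune
```

This also shows why sharing is rare in practice: the key is exact, so any shape difference between models produces a miss.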

That's so kind of you! So this means that even though TVM has "transfer learning" for AutoTVM, an almost brand-new autotuning run will be needed when tuning a new NN model on the same hardware target, right? And "transfer learning" can just speed up or resume a tuning task, right?

There is no 'transfer learning'; the same operator parameters just result in the same calculation:

same data blocking, same vectorization, etc.

Ok. Actually, TVM has an API called "load_history(*.tmp)" that is described as "transfer learning"; see the transfer_learning TVM example.

You are right, there is 'transfer learning'. This is the first time I have learned of the 'load_history' API.

@comaniac @zhiics, am I right?

When you use the load_history of XGBTuner, the cost model in the XGBTuner is trained on the loaded tuning logs so that it can find better configs sooner; otherwise it starts with random configs and uses their measurement results to train a new cost model.
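The warm-start effect described above can be sketched, in spirit, in plain Python: pre-train a trivial "cost model" from logged (config, time) records, then rank new candidates by prediction instead of sampling at random. This is a conceptual toy, not the XGBoost model or the real `load_history` API:

```python
import random

def fit_cost_model(history):
    """Trivial 'cost model': average measured seconds per tile size.
    Stand-in for the model that load_history would pre-train from logs."""
    sums, counts = {}, {}
    for cfg, seconds in history:
        sums[cfg["tile"]] = sums.get(cfg["tile"], 0.0) + seconds
        counts[cfg["tile"]] = counts.get(cfg["tile"], 0) + 1
    return {t: sums[t] / counts[t] for t in sums}

def next_trials(candidates, model, k):
    """With a pre-trained model, try the k most promising configs first;
    with no history (empty model), fall back to random exploration."""
    if not model:
        return random.sample(candidates, k)
    return sorted(candidates, key=lambda c: model.get(c["tile"], float("inf")))[:k]

# Logged trials from a previous tuning session (the "transfer learning" input).
history = [({"tile": 8}, 0.9), ({"tile": 16}, 0.4), ({"tile": 32}, 0.7)]
model = fit_cost_model(history)
candidates = [{"tile": t} for t in (8, 16, 32)]
print(next_trials(candidates, model, 1))  # the historically fastest tile first
```

The real tuner still measures its picks on the device and keeps refining the model; the history only changes where the search starts.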

Thanks. So theoretically, if the saved tuning log has enough information (tuning trials), I can train a cost model offline (i.e., without a real hardware target)? Because from the user side, people don't want to tune every model/op (a time cost) before deploying to a production environment on the same hardware target.

In that case, it’s actually better to directly save the best config of every op you have tuned. Otherwise your users still need to use the trained cost model to tune the ops for a few trials on the hardware device. You could refer to the idea in this presentation: https://youtu.be/Aw9Z07n5jJQ
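The suggestion above — ship the best configs themselves rather than the cost model — amounts to a deploy-time table lookup with a fallback, in the spirit of TVM's apply_history_best. The names and config format below are hypothetical:

```python
# Shipped with the product: best config per workload, found during tuning.
SHIPPED_BEST = {
    ("conv2d", (1, 64, 56, 56)): {"tile": 16, "unroll": 2},
    ("dense", (1, 1024)): {"tile": 32, "unroll": 1},
}

# Safe default schedule for workloads that were never tuned.
DEFAULT_CONFIG = {"tile": 8, "unroll": 1}

def config_for(op, shape):
    """Deploy-time lookup: no tuning, no cost model, no device measurements.
    Unknown workloads fall back to the default schedule."""
    return SHIPPED_BEST.get((op, tuple(shape)), DEFAULT_CONFIG)

print(config_for("conv2d", (1, 64, 56, 56)))  # shipped tuned config
print(config_for("softmax", (1, 1000)))       # falls back to default
```

The trade-off is visible in the fallback branch: a shipped table covers only the workloads you tuned, whereas a shipped cost model could rank configs for new workloads, but only after a few on-device trials.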

Thanks so much, that's exactly what I wanted to see!