[AutoTVM] Strange error from tuning tensorcore schedules

Hi, when I tune tensorcore schedules, I often hit the error below. It happens when feature_len is None. Since this is a non-recoverable error and I don't see an obvious workaround, I'm completely blocked from tuning some of my models.

  File "/home/masa/projects/dev/tvm/python/tvm/autotvm/tuner/xgboost_tuner.py", line 105, in tune
    super(XGBTuner, self).tune(*args, **kwargs)
  File "/home/masa/projects/dev/tvm/python/tvm/autotvm/tuner/tuner.py", line 169, in tune
    self.update(inputs, results)
  File "/home/masa/projects/dev/tvm/python/tvm/autotvm/tuner/model_based_tuner.py", line 291, in update
    maximums = self.model_optimizer.find_maximums(
  File "/home/masa/projects/dev/tvm/python/tvm/autotvm/tuner/sa_model_optimizer.py", line 89, in find_maximums
    scores = model.predict(points)
  File "/home/masa/projects/dev/tvm/python/tvm/autotvm/tuner/xgboost_cost_model.py", line 295, in predict
    feas = self._get_feature(xs)
  File "/home/masa/projects/dev/tvm/python/tvm/autotvm/tuner/xgboost_cost_model.py", line 338, in _get_feature
    ret = np.empty((len(indexes), feature_len), dtype=np.float32)
TypeError: 'NoneType' object cannot be interpreted as an integer

Apparently, tuning tensorcore schedules often ends up generating invalid schedules, and in the worst case I hit this error.
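
For reference, the relevant part of _get_feature looks roughly like this (paraphrased from the xgboost_cost_model.py file in the traceback; the exact code may differ in other versions):

    # feature_len is taken from the first cached feature that is not None, so it
    # stays None when feature extraction failed for every candidate in the batch.
    feature_len = None
    for idx in indexes:
        if fea_cache[idx] is not None:
            feature_len = fea_cache[idx].shape[-1]
            break

    # With feature_len still None, np.empty raises the TypeError above.
    ret = np.empty((len(indexes), feature_len), dtype=np.float32)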

Has anybody seen this? Is there a workaround?

@merrymercy @comaniac @jcf94 @jwfromm @Hzfengsy @Meteorix

I have never hit this error, although I don't have much experience tuning TensorCore with AutoTVM. It seems like what you mentioned: the schedule is somewhat invalid, so feature extraction fails.

Apart from diving into the root cause in the feature extraction, the quickest workaround I can think of is something like this:

        # Fall back to a feature length of 1 when every cached feature is None,
        # so np.empty never receives None as a dimension.
        feature_len = 1  # Use 1 as the default.
        for idx in indexes:
            if fea_cache[idx] is not None:
                feature_len = fea_cache[idx].shape[-1]
                break

        ret = np.empty((len(indexes), feature_len), dtype=np.float32)
        for i, ii in enumerate(indexes):
            t = fea_cache[ii]
            # Candidates whose feature extraction failed get an all-zero row.
            ret[i, :] = t if t is not None else 0
        return ret
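
With that default, a batch where feature extraction failed for every candidate just produces an all-zero feature matrix instead of crashing, so the tuner can keep going rather than aborting.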

cc @merrymercy

I have seen that error before and think @comaniac is right about the cause. I usually work around it by changing the cost model's feature type from itervar to knob, i.e. XGBTuner(..., feature_type="knob"). That seems to avoid the issue and doesn't impact performance as far as I can tell.
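
For concreteness, a minimal sketch of what I mean (task is whatever tuning task you already have; the measure options and log file name are just placeholders, and the only relevant change is feature_type="knob"):

    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    # `task` is assumed to already exist, e.g. from autotvm.task.extract_from_program.
    # Use "knob" features for the cost model instead of the default "itervar".
    tuner = XGBTuner(task, feature_type="knob")

    tuner.tune(
        n_trial=1000,
        measure_option=autotvm.measure_option(
            builder=autotvm.LocalBuilder(),
            runner=autotvm.LocalRunner(number=10),
        ),
        callbacks=[autotvm.callback.log_to_file("tensorcore_tune.log")],
    )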

Thanks @jwfromm, I'll try your suggestion. I wonder how different feature types affect tuning results.