XGB tuner throwing errors when tuning resnet-50 on Adreno GPU

Hardware Accelerator : Adreno GPU

OS : QNX 7.1

TVM : Built from source

I’m trying to tune a ResNet-50 model for an Adreno GPU using the XGB tuner, and I’m running into the following errors:

  1. When tuning with 128 trials, I initially got a "division by zero" error. While debugging, I found that some outputs produced by the XGBoost model were close to zero. I worked around this by adding some checks. Is adding checks the correct way to resolve the division-by-zero error, or is there a better approach?
  2. Tuning with a large number of trials throws the following error: "xgboost.core.XGBoostError: [02:04:57] /workspace/src/learner.cc:1257: Check failed: learner_model_param.num_feature >= p_fmat->Info().num_col". How can I resolve this error?
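For the second error, the failed check compares the number of features the model was trained with (`num_feature`) against the number of columns in the matrix passed in (`num_col`), so it fires when a feature matrix is wider than the one used for training. A minimal diagnostic sketch follows, assuming the feature vectors can be dumped as plain Python lists before they reach XGBoost. The helper names `feature_width_report` and `rows_wider_than` are hypothetical and not part of TVM or XGBoost.

```python
from collections import Counter


def feature_width_report(feature_rows):
    """Count how many rows have each feature-vector length."""
    return Counter(len(row) for row in feature_rows)


def rows_wider_than(feature_rows, trained_width):
    """Return indices of rows with more columns than the model was trained on."""
    return [i for i, row in enumerate(feature_rows) if len(row) > trained_width]


# Toy data (made up): two rows of width 3 and one row of width 4.
rows = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9, 1.0]]

report = feature_width_report(rows)
offenders = rows_wider_than(rows, trained_width=3)

print(report)
print(offenders)
```

If the report shows more than one width, the rows listed in `offenders` are the ones that would trip the check. This only locates the mismatch; why the feature length changes between trials would still need to be investigated.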

@srkreddy1238 @sanirudh @kparzysz

The division by zero is caused by the following schedule:

Why is the cost for the first kernel 0? Is it because some kernels can’t be scheduled on the hardware, or because the hardware doesn’t support such schedules?
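A zero measured cost turns into a division by zero as soon as throughput is computed as work divided by time. Below is a minimal sketch of the kind of check that avoids this, assuming a zero, negative, or non-finite mean cost should be treated as an unusable measurement and scored as 0. The function `throughput` and its `eps` threshold are hypothetical and do not reflect TVM's own implementation.

```python
import math


def throughput(flop, costs, eps=1e-12):
    """Return FLOP/s for one measurement, or 0.0 when the timing is unusable."""
    if not costs:
        return 0.0  # no timing samples at all
    mean_cost = sum(costs) / len(costs)
    # A zero, negative, or non-finite cost cannot be a real kernel time.
    if not math.isfinite(mean_cost) or mean_cost <= eps:
        return 0.0
    return flop / mean_cost


valid = throughput(1e9, [0.5, 0.5])    # normal measurement
zero_cost = throughput(1e9, [0.0, 0.0])  # kernel reported a cost of 0
print(valid, zero_cost)
```

Scoring such measurements as 0 keeps the tuner running, but it hides the underlying question of why the kernel reports a cost of 0 in the first place, so the check is a workaround and not an explanation.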

Hi @VarunGupta, I am struggling to cross-compile TVM for QNX and am running into C++ library compatibility issues. Could you please share the steps for building TVM for QNX?