AutoTVM dense tuning crashes for larger matrix sizes

Hello,

I am trying to auto-tune dense fp32 operators on ARM CPUs (Raspberry Pi 3 and Pi 4, Cortex-A53 and Cortex-A72). This works very well until the square matrices exceed a size of N_{max}=1024 on the Pi 3 and N_{max}=2048 on the Pi 4. The error message points to a TimeoutError [2], so I increased the timeout in the measure options [1] to exaggeratedly large values, such as 100M. However, this does not seem to have any effect. Is there another timeout constant I have to set? Or what else do I have to change to auto-tune larger dense layers?

Thank you very much for your help!

[1] measure options:

'measure_option': autotvm.measure_option(
    builder=autotvm.LocalBuilder(
        build_func='default',
        n_parallel=None),                  # default=None
    runner=autotvm.RPCRunner(
        device_key,
        host=device_config.rpc_tracker_config.ip,
        port=device_config.rpc_tracker_config.port,
        number=rpc_number,                 # default 5
        timeout=100000000,                 # default 10
    ),
),

[2] TimeoutError, last lines of the error message:

DEBUG:autotvm:No: 1525  GFLOPS: 0.00/0.00       result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4859357)        [('tile_y', [-1, 16, 128]), ('tile_x', [-1, 128, 2]), ('tile_k', [-1, 16])],None,25885
DEBUG:autotvm:No: 1526  GFLOPS: 0.00/0.00       result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4860249)        [('tile_y', [-1, 256, 4]), ('tile_x', [-1, 2, 2]), ('tile_k', [-1, 4])],None,13213
DEBUG:autotvm:No: 1527  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.63412)     [('tile_y', [-1, 8, 256]), ('tile_x', [-1, 2, 2]), ('tile_k', [-1, 32])],None,31505
DEBUG:autotvm:No: 1528  GFLOPS: 0.00/0.00       result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4861183)        [('tile_y', [-1, 2, 1]), ('tile_x', [-1, 4, 2]), ('tile_k', [-1, 128])],None,43681
DEBUG:autotvm:No: 1529  GFLOPS: 0.00/0.00       result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.5059056)        [('tile_y', [-1, 4, 128]), ('tile_x', [-1, 4, 2]), ('tile_k', [-1, 4])],None,13325
DEBUG:autotvm:No: 1530  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.6650503)   [('tile_y', [-1, 8, 128]), ('tile_x', [-1, 2, 256]), ('tile_k', [-1, 16])],None,29784
DEBUG:autotvm:No: 1531  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.6837363)   [('tile_y', [-1, 2, 128]), ('tile_x', [-1, 1, 256]), ('tile_k', [-1, 4])],None,17536
DEBUG:autotvm:No: 1532  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7042453)   [('tile_y', [-1, 8, 32]), ('tile_x', [-1, 64, 16]), ('tile_k', [-1, 1024])],None,64637
DEBUG:autotvm:No: 1533  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7204044)   [('tile_y', [-1, 4, 256]), ('tile_x', [-1, 1, 4]), ('tile_k', [-1, 32])],None,32284
DEBUG:autotvm:No: 1534  GFLOPS: 0.00/0.00       result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.5216203)        [('tile_y', [-1, 256, 1]), ('tile_x', [-1, 128, 16]), ('tile_k', [-1, 512])],None,58586
DEBUG:autotvm:No: 1535  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7342663)   [('tile_y', [-1, 2, 2]), ('tile_x', [-1, 2, 64]), ('tile_k', [-1, 1024])],None,65377
DEBUG:autotvm:No: 1536  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7526162)   [('tile_y', [-1, 64, 8]), ('tile_x', [-1, 8, 256]), ('tile_k', [-1, 512])],None,60333
DEBUG:autotvm:Early stopped. Best iter: 0.
DEBUG:autotvm:XGB load 0 entries from history log file
Traceback (most recent call last):
  File "nn-autotuning.py", line 210, in <module>
    main( sys.argv[1:] )
  File "nn-autotuning.py", line 196, in main
    autotune_bunchOfTinyNets( device_config, botn, timestamp )
  File "nn-autotuning.py", line 91, in autotune_bunchOfTinyNets
    timestamp )
  File "nn-autotuning.py", line 65, in autotune
    autotuner.tune( device_config, ops, network )
  File "./autotuner.py", line 173, in tune
    tune_tasks(tasks, **tuning_opt)
  File "./autotuner.py", line 139, in tune_tasks
    autotvm.callback.log_to_file(tmp_log_file)])
  File "./xgboost_tuner.py", line 103, in tune
    super(XGBTuner, self).tune(*args, **kwargs)
  File "./tuner.py", line 111, in tune
    measure_batch = create_measure_batch(self.task, measure_option)
  File "./measure.py", line 257, in create_measure_batch
    attach_objects = runner.set_task(task)
  File "./measure_methods.py", line 252, in set_task
    "Cannot get remote devices from the tracker. "
RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.
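
To see which timeout actually fires, the DEBUG lines can be tallied by error code and reported cost. The following is only a minimal sketch: it assumes the numbering of autotvm's MeasureErrorNo (6 = BUILD_TIMEOUT, 7 = RUN_TIMEOUT) applies to the TVM version in use, and the names `summarize` and `SAMPLE` are made up for illustration. The sample lines are abridged copies of the log above.

```python
import re
from collections import Counter

# Error codes as in autotvm's MeasureErrorNo (assumed to match the TVM version in use).
ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",
    8: "UNKNOWN_ERROR",
}

# Matches the "error_no=..., all_cost=..." part of a MeasureResult in the log.
RESULT_RE = re.compile(r"error_no=(\d+), all_cost=([\d.e+-]+)")


def summarize(log_text):
    """Count (error name, all_cost) pairs found in autotvm DEBUG log text."""
    counts = Counter()
    for match in RESULT_RE.finditer(log_text):
        error_no = int(match.group(1))
        all_cost = float(match.group(2))
        counts[(ERROR_NAMES.get(error_no, "UNKNOWN"), all_cost)] += 1
    return counts


# Abridged lines taken from the log excerpt in [2].
SAMPLE = """\
DEBUG:autotvm:No: 1526 result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4860249)
DEBUG:autotvm:No: 1527 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.63412)
DEBUG:autotvm:No: 1528 result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4861183)
DEBUG:autotvm:No: 1530 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.6650503)
"""

if __name__ == "__main__":
    for (name, cost), n in sorted(summarize(SAMPLE).items()):
        print(f"{name:15s} all_cost={cost:g} count={n}")
```

In the excerpt above, every error_no=6 entry reports all_cost=10, while every error_no=7 entry reports all_cost=100000000. The value 10 equals the default timeout of autotvm.LocalBuilder, not the runner timeout I changed, so the builder timeout may be the one that matters here.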

This will be fixed by the upcoming pull request 6924, which resolves issue 6922.