Hello,
I am trying to auto-tune dense fp32 operators on ARM CPUs (Raspberry Pi 3 and Pi 4, i.e. Cortex-A53 and Cortex-A72). This works very well until the square matrices exceed a size of N_{max}=1024 on the Pi 3 and N_{max}=2048 on the Pi 4. The error message points to a TimeoutError[2], so I increased the timeout in the measure options[1] to exaggeratedly large values, such as 100M. However, this does not seem to have any effect. Is there another timeout constant I have to set? Or what else do I have to change to auto-tune larger dense layers?
Thank you very much for your help!
[1] measure options:
'measure_option': autotvm.measure_option(
    builder=autotvm.LocalBuilder(
        build_func='default',
        n_parallel=None),  # default=None
    runner=autotvm.RPCRunner(
        device_key,
        host=device_config.rpc_tracker_config.ip,
        port=device_config.rpc_tracker_config.port,
        number=rpc_number,  # default 5
        timeout=100000000,  # default 10
    ),
),
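One detail that may be worth checking in the options above: the timeout is only passed to the runner. To my understanding, autotvm.LocalBuilder accepts its own timeout argument (in seconds, default 10), which would match the all_cost=10 entries in the log in [2]. The following is a minimal sketch under that assumption. The timeout values and the helper name make_measure_option are placeholders, not settings taken from this setup.

```python
# Sketch: keep the build-stage and run-stage timeouts separate.
# Assumption: LocalBuilder takes a `timeout` keyword, independent of RPCRunner's.
BUILD_TIMEOUT_S = 120  # placeholder value for host-side compilation
RUN_TIMEOUT_S = 600    # placeholder value for the run on the remote device

builder_kwargs = dict(build_func="default", n_parallel=None, timeout=BUILD_TIMEOUT_S)
runner_kwargs = dict(number=5, timeout=RUN_TIMEOUT_S)


def make_measure_option(device_key, host, port):
    """Hypothetical helper that builds the measure option from the kwargs above."""
    # Imported lazily so the kwargs can be inspected without TVM installed.
    from tvm import autotvm

    return autotvm.measure_option(
        builder=autotvm.LocalBuilder(**builder_kwargs),
        runner=autotvm.RPCRunner(device_key, host=host, port=port, **runner_kwargs),
    )
```

Whether raising the builder timeout is enough depends on where the time is actually spent, so it is best treated as a first experiment rather than a confirmed fix.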
[2] TimeoutError, last lines of the error message:
DEBUG:autotvm:No: 1525 GFLOPS: 0.00/0.00 result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4859357) [('tile_y', [-1, 16, 128]), ('tile_x', [-1, 128, 2]), ('tile_k', [-1, 16])],None,25885
DEBUG:autotvm:No: 1526 GFLOPS: 0.00/0.00 result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4860249) [('tile_y', [-1, 256, 4]), ('tile_x', [-1, 2, 2]), ('tile_k', [-1, 4])],None,13213
DEBUG:autotvm:No: 1527 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.63412) [('tile_y', [-1, 8, 256]), ('tile_x', [-1, 2, 2]), ('tile_k', [-1, 32])],None,31505
DEBUG:autotvm:No: 1528 GFLOPS: 0.00/0.00 result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4861183) [('tile_y', [-1, 2, 1]), ('tile_x', [-1, 4, 2]), ('tile_k', [-1, 128])],None,43681
DEBUG:autotvm:No: 1529 GFLOPS: 0.00/0.00 result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.5059056) [('tile_y', [-1, 4, 128]), ('tile_x', [-1, 4, 2]), ('tile_k', [-1, 4])],None,13325
DEBUG:autotvm:No: 1530 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.6650503) [('tile_y', [-1, 8, 128]), ('tile_x', [-1, 2, 256]), ('tile_k', [-1, 16])],None,29784
DEBUG:autotvm:No: 1531 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.6837363) [('tile_y', [-1, 2, 128]), ('tile_x', [-1, 1, 256]), ('tile_k', [-1, 4])],None,17536
DEBUG:autotvm:No: 1532 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7042453) [('tile_y', [-1, 8, 32]), ('tile_x', [-1, 64, 16]), ('tile_k', [-1, 1024])],None,64637
DEBUG:autotvm:No: 1533 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7204044) [('tile_y', [-1, 4, 256]), ('tile_x', [-1, 1, 4]), ('tile_k', [-1, 32])],None,32284
DEBUG:autotvm:No: 1534 GFLOPS: 0.00/0.00 result: MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.5216203) [('tile_y', [-1, 256, 1]), ('tile_x', [-1, 128, 16]), ('tile_k', [-1, 512])],None,58586
DEBUG:autotvm:No: 1535 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7342663) [('tile_y', [-1, 2, 2]), ('tile_x', [-1, 2, 64]), ('tile_k', [-1, 1024])],None,65377
DEBUG:autotvm:No: 1536 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.7526162) [('tile_y', [-1, 64, 8]), ('tile_x', [-1, 8, 256]), ('tile_k', [-1, 512])],None,60333
DEBUG:autotvm:Early stopped. Best iter: 0.
DEBUG:autotvm:XGB load 0 entries from history log file
Traceback (most recent call last):
  File "nn-autotuning.py", line 210, in <module>
    main( sys.argv[1:] )
  File "nn-autotuning.py", line 196, in main
    autotune_bunchOfTinyNets( device_config, botn, timestamp )
  File "nn-autotuning.py", line 91, in autotune_bunchOfTinyNets
    timestamp )
  File "nn-autotuning.py", line 65, in autotune
    autotuner.tune( device_config, ops, network )
  File "./autotuner.py", line 173, in tune
    tune_tasks(tasks, **tuning_opt)
  File "./autotuner.py", line 139, in tune_tasks
    autotvm.callback.log_to_file(tmp_log_file)])
  File "./xgboost_tuner.py", line 103, in tune
    super(XGBTuner, self).tune(*args, **kwargs)
  File "./tuner.py", line 111, in tune
    measure_batch = create_measure_batch(self.task, measure_option)
  File "./measure.py", line 257, in create_measure_batch
    attach_objects = runner.set_task(task)
  File "./measure_methods.py", line 252, in set_task
    "Cannot get remote devices from the tracker. "
RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.
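The log above mixes two error codes: entries with error_no=6 report all_cost=10, while entries with error_no=7 report all_cost=100000000. A small script can tally them to show which stage times out. The code-to-name mapping below is an assumption based on autotvm's MeasureErrorNo enum as I know it and should be verified against the installed TVM version. The function name summarize is hypothetical, and the sample lines are shortened copies of entries from the log.

```python
import re
from collections import Counter

# Assumed mapping (autotvm MeasureErrorNo); verify against your TVM version.
ERROR_NAMES = {
    0: "NO_ERROR", 1: "INSTANTIATION_ERROR", 2: "COMPILE_HOST",
    3: "COMPILE_DEVICE", 4: "RUNTIME_DEVICE", 5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT", 7: "RUN_TIMEOUT", 8: "UNKNOWN_ERROR",
}

PATTERN = re.compile(r"error_no=(\d+), all_cost=([\d.]+)")


def summarize(log_lines):
    """Count log entries per (error name, all_cost) pair."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            code = int(match.group(1))
            cost = float(match.group(2))
            counts[(ERROR_NAMES.get(code, "UNKNOWN_CODE"), cost)] += 1
    return counts


sample = [
    "MeasureResult(costs=('',), error_no=7, all_cost=100000000, timestamp=1604753434.4860249)",
    "MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.63412)",
    "MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1604753313.6650503)",
]

for (name, cost), n in sorted(summarize(sample).items()):
    print(f"{name:15s} all_cost={cost:g} count={n}")
```

If the mapping holds, the all_cost=10 entries would be build timeouts at the 10 s default rather than run timeouts, which would explain why changing only the runner timeout has no visible effect on them.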