Related to https://github.com/apache/incubator-tvm/issues/6894
The bitserial dense operation for ARM CPUs cannot be auto-tuned for larger matrix sizes. For small sizes (N = 64, 125, 256, 512) tuning works, but for N = 768, 1024, 2048 it crashes with a broadcast shape error. I reproduced this with the TVM v0.7 branch and the current head (80ca598) of the main branch.
error message:
list of tasks:
task = bitserial_dense.arm_cpu
tasks = [Task(func_name=bitserial_dense.arm_cpu, args=(('TENSOR', (768, 768), 'uint8'), ('TENSOR', (768, 768), 'uint8'), 1, 1, 'uint8', 'int16', 1), kwargs={}, workload=('bitserial_dense.arm_cpu', ('TENSOR', (768, 768), 'uint8'), ('TENSOR', (768, 768), 'uint8'), 1, 1, 'uint8', 'int16', 1))]
Tuning...
[Task 1/ 1] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/108) | 0.00 s
Traceback (most recent call last):
  File "autotuning.py", line 239, in <module>
    main( sys.argv[1:] )
  File "autotuning.py", line 174, in main
    autotune_bunchOfTinyNets( device_config, botn, timestamp )
  File "autotuning.py", line 91, in autotune_bunchOfTinyNets
    timestamp )
  File "autotuning.py", line 65, in autotune
    autotuner.tune( device_config, ops, network )
  File "./autotuner.py", line 173, in tune
    tune_tasks(tasks, **tuning_opt)
  File "./autotuner.py", line 139, in tune_tasks
    autotvm.callback.log_to_file(tmp_log_file)])
  File "tvm/python/tvm/autotvm/tuner/xgboost_tuner.py", line 103, in tune
    super(XGBTuner, self).tune(*args, **kwargs)
  File "tvm/python/tvm/autotvm/tuner/tuner.py", line 169, in tune
    self.update(inputs, results)
  File "tvm/python/tvm/autotvm/tuner/model_based_tuner.py", line 270, in update
    self.cost_model.fit(self.xs, self.ys, self.plan_size)
  File "tvm/python/tvm/autotvm/tuner/xgboost_cost_model.py", line 184, in fit
    x_train = self._get_feature(xs)
  File "tvm/python/tvm/autotvm/tuner/xgboost_cost_model.py", line 348, in _get_feature
    ret[i, :] = t if t is not None else 0
ValueError: could not broadcast input array from shape (1854) into shape (1278)
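
The final ValueError suggests that the feature vectors extracted for different configurations do not all have the same length (1854 vs. 1278). The NumPy-only sketch below mimics the failing assignment outside of TVM. It is not the TVM implementation: `stack_features` is a hypothetical stand-in, sizing the rows from the first non-None vector is my assumption, and the two lengths are simply taken from the error message.

```python
import numpy as np


def stack_features(features):
    """Stack per-config feature vectors into one matrix (hypothetical stand-in)."""
    # Assumption: the row width is taken from the first vector that is not None.
    feature_len = next(f.shape[-1] for f in features if f is not None)
    ret = np.empty((len(features), feature_len), dtype=np.float32)
    for i, t in enumerate(features):
        # Same statement as the one at the bottom of the traceback.
        ret[i, :] = t if t is not None else 0
    return ret


# All vectors have the same length: stacking works, None becomes a zero row.
same = [np.zeros(1278), None, np.ones(1278)]
ok = stack_features(same)

# One vector is longer than the others: the row assignment cannot broadcast.
mixed = [np.zeros(1278), np.ones(1854)]
try:
    stack_features(mixed)
    error = None
except ValueError as exc:
    error = str(exc)

print(ok.shape)
print(error)
```

If this is indeed what happens, the question is why feature extraction yields vectors of different lengths for different configurations of the same task at the larger matrix sizes.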