On my first attempt to get the AutoScheduler working, tuning appears to get stuck when I run the script the tutorial provides.
I downloaded tune_network_x86.py from the tutorial "Auto-scheduling a Neural Network for x86 CPU" (tvm 0.8.dev0 documentation) to run on my Intel machine, which supports AVX2.
I uncommented the line # run_tuning() and executed the Python file, but it seems to be stuck at task 2 (the reported Speed also remains at 0). See the snippet of AutoScheduler’s output below.
I then changed network = "resnet-50" to network = "mobilenet" and executed the Python file again, and again the AutoScheduler remained stuck at task 2.
Any help fixing the issue, or otherwise getting the AutoScheduler to work, is appreciated.
Seemingly related post: [AutoTuning] How to debug when all trials are failing on GPU
I am running an up-to-date Ubuntu and installed the TVM nightly build on June 24.
==================================================
No: 198 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:RunTimeoutError, error_msg:, all_cost:23.01, Tstamp:1628088581.33)
==================================================
Placeholder: placeholder, placeholder, placeholder
parallel p.0 (0,49)
for ci.0 (0,2)
for eps (0,6)
for nu (0,6)
for ci ((ci.outer*64),64)
data_pad = ...
input_tile = ...
for ci.1 (0,64)
unroll eps (0,6)
unroll nu (0,6)
unroll r_a (0,6)
unroll r_b (0,6)
data_pack = ...
bgemm.local auto_unroll: 16
for nu_c.1 (0,6)
for p_c.1 (0,7)
for co_c.1 (0,16)
for ci.0 (0,128)
for eps_c.2 (0,6)
for co_c.2 (0,8)
for p_c.3 (0,7)
bgemm.local = ...
for eps.1 (0,6)
for nu.1 (0,6)
for p.1 (0,49)
for co.1 (0,128)
bgemm = ...
inverse auto_unroll: 64
parallel p.0@co.0@p.1@ (0,196)
for co.1 (0,32)
unroll vh (0,4)
unroll vw (0,4)
unroll r_a (0,6)
unroll r_b (0,6)
inverse = ...
parallel ax0@ax1@ (0,28)
for w (0,28)
for co (0,128)
conv2d_winograd = ...
for ax2 (0,28)
for ax3 (0,128)
T_relu = ...
[16:49:41] /workspace/tvm/src/auto_scheduler/measure.cc:299: Warning: Too many errors happened during tuning. Switching to debug mode.
Time elapsed for measurement: 127.06 s
----------------------------------------------------------------------
------------------------------ [ Train cost model ]
----------------------------------------------------------------------
Time elapsed for training: 0.57 s
----------------------------------------------------------------------
------------------------------ [ Task Scheduler ]
----------------------------------------------------------------------
| ID | Latency (ms) | Speed (GFLOPS) | Trials |
-------------------------------------------------
| 0 | 0.017 | 0.23 | 6 |
| 1 | 0.512 | 8.00 | 6 |
| 2 | 0.020 | -0.00 | 6 |
| 3 | - | - | 6 |
| 4 | - | - | 6 |
| 5 | - | - | 6 |
| 6 | - | - | 6 |
| 7 | - | - | 6 |
| 8 | - | - | 6 |
| 9 | - | - | 12 |
| 10 | - | - | 12 |
| 11 | - | - | 12 |
| 12 | - | - | 6 |
| 13 | - | - | 6 |
| 14 | - | - | 12 |
| 15 | - | - | 6 |
| 16 | - | - | 6 |
| 17 | - | - | 6 |
| 18 | - | - | 6 |
| 19 | - | - | 6 |
| 20 | - | - | 6 |
| 21 | - | - | 6 |
| 22 | - | - | 6 |
| 23 | - | - | 6 |
| 24 | - | - | 6 |
| 25 | - | - | 6 |
| 26 | - | - | 6 |
| 27 | - | - | 6 |
| 28 | - | - | 6 |
-------------------------------------------------
Estimated total latency: - ms Trials: 198 Used time : 5081 s Next ID: 4
----------------------------------------------------------------------
------------------------------ [ Search ]
----------------------------------------------------------------------
Sample Initial Population #s: 1618 fail_ct: 0 Time elapsed: 2.33
GA Iter: 0 Max score: 1.2978 Min score: 1.2408 #Pop: 12 #M+: 0 #M-: 0
GA Iter: 4 Max score: 1.4138 Min score: 1.3630 #Pop: 12 #M+: 1392 #M-: 38
EvolutionarySearch #s: 12 Time elapsed: 11.76
----------------------------------------------------------------------
------------------------------ [ Measure ]
----------------------------------------------------------------------
Get 6 programs to measure:
...T...T.T.T*T*T
==================================================
No: 199 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:BuildTimeoutError, error_msg:, all_cost:15.00, Tstamp:1628088655.35)
==================================================
Placeholder: placeholder, placeholder, placeholder
Conv2dOutput auto_unroll: 64
for yy.1 (0,7)
for xx.1 (0,7)
for ff.1 (0,8)
for i1 (yy.outer.outer.inner,3)
for i2 (xx.outer.outer.inner,3)
for i3 (0,512)
PaddedInput = ...
for ry.0 (0,3)
for rc.0 (0,256)
for ff.2 (0,64)
for rx.1 (0,3)
for rc.1 (0,2)
Conv2dOutput = ...
for ax1.1 (0,7)
for ax2.1 (0,7)
vectorize ax3.1 (0,512)
T_relu = ...
==================================================
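To see at a glance which failure types dominate a long run, the MeasureResult lines in the debug output can be tallied by error type. The following is a minimal sketch using only the Python standard library. The names tally_errors and SAMPLE_LOG are made up for illustration and are not part of TVM, and the regular expression assumes the line format shown in the output above.

```python
import re
from collections import Counter

# Assumption: each measured program is reported on a line containing
# "MeasureResult(error_type:<Name>, ...", as in the output pasted above.
ERROR_TYPE_RE = re.compile(r"MeasureResult\(error_type:(\w+)")

# Two lines copied from the output above, used here as sample input.
SAMPLE_LOG = """\
No: 198 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:RunTimeoutError, error_msg:, all_cost:23.01, Tstamp:1628088581.33)
No: 199 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:BuildTimeoutError, error_msg:, all_cost:15.00, Tstamp:1628088655.35)
"""


def tally_errors(log_text):
    """Count how often each error_type value appears in the log text."""
    counts = Counter()
    for match in ERROR_TYPE_RE.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Print one line per error type, most frequent first.
    for error_type, count in tally_errors(SAMPLE_LOG).most_common():
        print(f"{error_type}: {count}")
```

To use it on a full run, the captured console output could be read from a file and passed to tally_errors in place of SAMPLE_LOG. A tally dominated by timeouts, as in the two trials shown here, would point toward the build or run time limits rather than toward invalid schedules.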