AutoScheduler stuck on the file the tutorial provides

On my first attempt to get the AutoScheduler working, it seems to be stuck on the example file the tutorial provides.
I downloaded tune_network_x86.py from the "Auto-scheduling a Neural Network for x86 CPU" page of the tvm 0.8.dev0 documentation to run on my Intel machine (which supports AVX2).
I uncommented the # run_tuning() line and executed the Python file, but it seems to be stuck at task 2 (the reported Speed also remains at 0). See the snippet of the AutoScheduler's output below.
I then changed network = "resnet-50" to network = "mobilenet" and executed the Python file again, and again the AutoScheduler got stuck at task 2. The relevant parts of the script are sketched below.
Help fixing the issue, or otherwise getting the AutoScheduler to work, is appreciated.
A seemingly related post: [AutoTuning] How to debug when all trials are failing on GPU. I am running an up-to-date Ubuntu and installed TVM-nightly on June 24.
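
For reference, here is roughly what I changed, reconstructed from my copy of the script (tasks, task_weights, and log_file are defined earlier in tune_network_x86.py):

    from tvm import auto_scheduler

    network = "resnet-50"  # changed to "mobilenet" for the second run

    def run_tuning():
        print("Begin tuning...")
        tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
        tune_option = auto_scheduler.TuningOptions(
            num_measure_trials=200,  # the tutorial's default trial budget
            runner=auto_scheduler.LocalRunner(repeat=10, enable_cpu_cache_flush=True),
            measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
        )
        tuner.tune(tune_option)

    run_tuning()  # this call ships commented out; I uncommented it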

==================================================
No: 198 GFLOPS: 0.00 / 0.00     results: MeasureResult(error_type:RunTimeoutError, error_msg:, all_cost:23.01, Tstamp:1628088581.33)
==================================================
Placeholder: placeholder, placeholder, placeholder
parallel p.0 (0,49)
  for ci.0 (0,2)
    for eps (0,6)
      for nu (0,6)
        for ci ((ci.outer*64),64)
          data_pad = ...
          input_tile = ...
    for ci.1 (0,64)
      unroll eps (0,6)
        unroll nu (0,6)
          unroll r_a (0,6)
            unroll r_b (0,6)
              data_pack = ...
bgemm.local auto_unroll: 16
for nu_c.1 (0,6)
  for p_c.1 (0,7)
    for co_c.1 (0,16)
      for ci.0 (0,128)
        for eps_c.2 (0,6)
          for co_c.2 (0,8)
            for p_c.3 (0,7)
              bgemm.local = ...
for eps.1 (0,6)
  for nu.1 (0,6)
    for p.1 (0,49)
      for co.1 (0,128)
        bgemm = ...
inverse auto_unroll: 64
parallel p.0@co.0@p.1@ (0,196)
  for co.1 (0,32)
    unroll vh (0,4)
      unroll vw (0,4)
        unroll r_a (0,6)
          unroll r_b (0,6)
            inverse = ...
parallel ax0@ax1@ (0,28)
  for w (0,28)
    for co (0,128)
      conv2d_winograd = ...
  for ax2 (0,28)
    for ax3 (0,128)
      T_relu = ...

[16:49:41] /workspace/tvm/src/auto_scheduler/measure.cc:299: Warning: Too many errors happened during tuning. Switching to debug mode.

Time elapsed for measurement: 127.06 s
----------------------------------------------------------------------
------------------------------  [ Train cost model ]
----------------------------------------------------------------------
Time elapsed for training: 0.57 s
----------------------------------------------------------------------
------------------------------  [ Task Scheduler ]
----------------------------------------------------------------------
|  ID  | Latency (ms) | Speed (GFLOPS) | Trials |
-------------------------------------------------
|    0 |        0.017 |           0.23 |      6 |
|    1 |        0.512 |           8.00 |      6 |
|    2 |        0.020 |          -0.00 |      6 |
|    3 |            - |              - |      6 |
|    4 |            - |              - |      6 |
|    5 |            - |              - |      6 |
|    6 |            - |              - |      6 |
|    7 |            - |              - |      6 |
|    8 |            - |              - |      6 |
|    9 |            - |              - |     12 |
|   10 |            - |              - |     12 |
|   11 |            - |              - |     12 |
|   12 |            - |              - |      6 |
|   13 |            - |              - |      6 |
|   14 |            - |              - |     12 |
|   15 |            - |              - |      6 |
|   16 |            - |              - |      6 |
|   17 |            - |              - |      6 |
|   18 |            - |              - |      6 |
|   19 |            - |              - |      6 |
|   20 |            - |              - |      6 |
|   21 |            - |              - |      6 |
|   22 |            - |              - |      6 |
|   23 |            - |              - |      6 |
|   24 |            - |              - |      6 |
|   25 |            - |              - |      6 |
|   26 |            - |              - |      6 |
|   27 |            - |              - |      6 |
|   28 |            - |              - |      6 |
-------------------------------------------------
Estimated total latency: - ms   Trials: 198     Used time : 5081 s      Next ID: 4
----------------------------------------------------------------------
------------------------------  [ Search ]
----------------------------------------------------------------------
Sample Initial Population       #s: 1618        fail_ct: 0      Time elapsed: 2.33
GA Iter: 0      Max score: 1.2978       Min score: 1.2408       #Pop: 12        #M+: 0  #M-: 0
GA Iter: 4      Max score: 1.4138       Min score: 1.3630       #Pop: 12        #M+: 1392       #M-: 38
EvolutionarySearch              #s: 12  Time elapsed: 11.76
----------------------------------------------------------------------
------------------------------  [ Measure ]
----------------------------------------------------------------------
Get 6 programs to measure:
...T...T.T.T*T*T
==================================================
No: 199 GFLOPS: 0.00 / 0.00     results: MeasureResult(error_type:BuildTimeoutError, error_msg:, all_cost:15.00, Tstamp:1628088655.35)
==================================================
Placeholder: placeholder, placeholder, placeholder
Conv2dOutput auto_unroll: 64
for yy.1 (0,7)
  for xx.1 (0,7)
    for ff.1 (0,8)
      for i1 (yy.outer.outer.inner,3)
        for i2 (xx.outer.outer.inner,3)
          for i3 (0,512)
            PaddedInput = ...
      for ry.0 (0,3)
        for rc.0 (0,256)
          for ff.2 (0,64)
            for rx.1 (0,3)
              for rc.1 (0,2)
                Conv2dOutput = ...
for ax1.1 (0,7)
  for ax2.1 (0,7)
    vectorize ax3.1 (0,512)
      T_relu = ...

==================================================

Your log shows "error_type:RunTimeoutError", so the first thing to try is enlarging the timeout to see whether that helps.

I'm running on a fairly powerful machine (an Intel(R) Xeon(R) CPU E3-1271), which suggests it would also time out on the slower machines of other users.
Still, I have now set timeout=20 on the LocalRunner (the default is 10); this is my only change to the code, as sketched below.
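
Concretely, the runner argument inside run_tuning() now reads (a sketch; the other arguments are unchanged from the tutorial):

    from tvm import auto_scheduler

    # LocalRunner's per-measurement timeout defaults to 10 s;
    # raised to 20 s to get past the RunTimeoutError above.
    runner = auto_scheduler.LocalRunner(
        timeout=20,
        repeat=10,
        enable_cpu_cache_flush=True,
    )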
The AutoScheduler is now no longer stuck on task 2, but it still reports Speed = 0 for that task; see the output below.
Secondly, task 9 now seems to be stuck on "Get 6 programs to measure". It has been running for approximately 20 hours, whereas the other tasks only required several minutes each.
I hit Ctrl+C and appended the subsequent output and stack trace to this post.
Guidance on solving these issues is appreciated.

Terminal output for the first issue (Speed = 0 for task 2):

----------------------------------------------------------------------
------------------------------  [ Search ]
----------------------------------------------------------------------
Generate Sketches               #s: 1
Sample Iter: 5  #Pop: 8 #Target: 50     fail_ct: 7723   Time elapsed: 2.97
#Target has been reduced to 25 due to too many failures or duplications
Sample Iter: 10 #Pop: 8 #Target: 25     fail_ct: 15402  Time elapsed: 5.75
#Target has been reduced to 12 due to too many failures or duplications
Sample Iter: 15 #Pop: 8 #Target: 12     fail_ct: 23111  Time elapsed: 8.62
#Target has been reduced to 6 due to too many failures or duplications
Sample Initial Population       #s: 8   fail_ct: 24671  Time elapsed: 9.30
GA Iter: 0      Max score: 0.9553       Min score: 0.0362       #Pop: 8 #M+: 0  #M-: 0
GA Iter: 4      Max score: 0.9999       Min score: 0.9896       #Pop: 12        #M+: 363        #M-: 6932
EvolutionarySearch              #s: 12  Time elapsed: 2.32
----------------------------------------------------------------------
------------------------------  [ Measure ]
----------------------------------------------------------------------
Get 6 programs to measure:
......******
Time elapsed for measurement: 107.95 s
----------------------------------------------------------------------
------------------------------  [ Train cost model ]
----------------------------------------------------------------------
Time elapsed for training: 0.14 s
----------------------------------------------------------------------
------------------------------  [ Task Scheduler ]
----------------------------------------------------------------------
|  ID  | Latency (ms) | Speed (GFLOPS) | Trials |
-------------------------------------------------
|    0 |        0.010 |           0.42 |      6 |
|    1 |        0.528 |           7.76 |      6 |
|    2 |        0.018 |          -0.00 |      6 |
|    3 |            - |              - |      0 |
|    4 |            - |              - |      0 |
|    5 |            - |              - |      0 |
|    6 |            - |              - |      0 |
|    7 |            - |              - |      0 |
|    8 |            - |              - |      0 |
|    9 |            - |              - |      0 |
|   10 |            - |              - |      0 |
|   11 |            - |              - |      0 |
|   12 |            - |              - |      0 |
|   13 |            - |              - |      0 |
|   14 |            - |              - |      0 |
|   15 |            - |              - |      0 |
|   16 |            - |              - |      0 |
|   17 |            - |              - |      0 |
|   18 |            - |              - |      0 |
|   19 |            - |              - |      0 |
|   20 |            - |              - |      0 |
|   21 |            - |              - |      0 |
|   22 |            - |              - |      0 |
|   23 |            - |              - |      0 |
|   24 |            - |              - |      0 |
|   25 |            - |              - |      0 |
|   26 |            - |              - |      0 |
|   27 |            - |              - |      0 |
|   28 |            - |              - |      0 |
-------------------------------------------------
Estimated total latency: - ms   Trials: 18      Used time : 348 s       Next ID: 3

Terminal output for the second issue (task 9 stuck):

----------------------------------------------------------------------
------------------------------  [ Task Scheduler ]
----------------------------------------------------------------------
|  ID  | Latency (ms) | Speed (GFLOPS) | Trials |
-------------------------------------------------
|    0 |        0.010 |           0.42 |      6 |
|    1 |        0.528 |           7.76 |      6 |
|    2 |        0.018 |          -0.00 |      6 |
|    3 |        1.724 |          59.83 |      6 |
|    4 |        3.384 |          68.35 |      6 |
|    5 |        1.611 |          63.80 |      6 |
|    6 |        1.887 |          54.52 |      6 |
|    7 |        0.767 |          67.03 |      6 |
|    8 |        1.979 |          52.23 |      6 |
|    9 |            - |              - |      0 |
|   10 |            - |              - |      0 |
|   11 |            - |              - |      0 |
|   12 |            - |              - |      0 |
|   13 |            - |              - |      0 |
|   14 |            - |              - |      0 |
|   15 |            - |              - |      0 |
|   16 |            - |              - |      0 |
|   17 |            - |              - |      0 |
|   18 |            - |              - |      0 |
|   19 |            - |              - |      0 |
|   20 |            - |              - |      0 |
|   21 |            - |              - |      0 |
|   22 |            - |              - |      0 |
|   23 |            - |              - |      0 |
|   24 |            - |              - |      0 |
|   25 |            - |              - |      0 |
|   26 |            - |              - |      0 |
|   27 |            - |              - |      0 |
|   28 |            - |              - |      0 |
-------------------------------------------------
Estimated total latency: - ms   Trials: 54      Used time : 75952 s     Next ID: 9
----------------------------------------------------------------------
------------------------------  [ Search ]
----------------------------------------------------------------------
Generate Sketches               #s: 3
Sample Initial Population       #s: 1715        fail_ct: 70     Time elapsed: 7.32
GA Iter: 0      Max score: 0.9996       Min score: 0.9956       #Pop: 12        #M+: 0  #M-: 0
[16:18:35] /workspace/tvm/src/auto_scheduler/compute_dag.cc:1371: Warning: InferBound fails on the state:
Placeholder: placeholder, placeholder, placeholder
data_pack auto_unroll: 512
parallel p.0@ci.0@ (0,64)
  for eps (None)
    for nu (None)
      for p (None)
        vectorize ci (None)
          input_tile = ...
  for p.1 (0,16)
    for i0 (None)
      for i1 (None)
        for i2 (None)
          vectorize i3 (None)
            data_pad = ...
    for ci.1 (0,4)
      unroll eps (0,6)
        unroll nu (0,6)
          unroll r_a (0,6)
            unroll r_b (0,6)
              data_pack = ...
bgemm auto_unroll: 512
parallel eps.0@nu.0@p.0@co.0@eps.1@nu.1@p.1@ (0,48)
  for co.1 (0,64)
    for ci.0 (0,128)
      for nu.2 (0,2)
        for p.2 (0,2)
          for co.2 (0,2)
            for ci.1 (0,2)
              for nu.3 (0,3)
                for p.3 (0,2)
                  bgemm = ...
inverse auto_unroll: 64
parallel p.0@co.0@ (0,256)
  for p.1 (0,2)
    for co.1 (0,8)
      unroll vh (0,4)
        unroll vw (0,4)
          unroll r_a (0,6)
            unroll r_b (0,6)
              inverse = ...
parallel ax0@ax1@ax2@ (0,196)
  for n (None)
    for h (None)
      for w (None)
        for co (None)
          conv2d_winograd = ...
  for ax3 (0,256)
    T_relu = ...

with: [16:18:35] /workspace/tvm/src/te/schedule/bound.cc:175:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (found_attach || stage_attach.size() == 0) is false: Invalid Schedule, cannot find the producer compute(data_pad, body=[tir.if_then_else(((((i1 >= 1) && (i1 < 15)) && (i2 >= 1)) && (i2 < 15)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)], axis=[iter_var(i0, range(min=0, ext=1)), iter_var(i1, range(min=0, ext=18)), iter_var(i2, range(min=0, ext=18)), iter_var(i3, range(min=0, ext=256))], reduce_axis=[], tag=injective,pad, attrs={}) along the loop nest specified by compute_at of consumer compute(input_tile, body=[data_pad[floordiv(p, 16), ((floormod(floordiv(p, 4), 4)*4) + eps), ((floormod(p, 4)*4) + nu), ci]], axis=[iter_var(eps, range(min=0, ext=6)), iter_var(nu, range(min=0, ext=6)), iter_var(p, range(min=0, ext=16)), iter_var(ci, range(min=0, ext=256))], reduce_axis=[], tag=, attrs={})
Stack trace:
  0: tvm::te::InferRootBound(tvm::te::Stage const&, tvm::te::GraphContext const&, std::unordered_map<tvm::tir::IterVar, tvm::Range, std::hash<tvm::tir::IterVar>, std::equal_to<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > >*)
  1: tvm::te::InferBound(tvm::te::Schedule const&)
  2: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::auto_scheduler::State const&) const
  3: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::runtime::Array<tvm::auto_scheduler::State, void> const&) const::{lambda(int)#1}::operator()(int) const
  4: _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_1
  5: _ZNSt13__future_base13_State_baseV29_M_do_setEPSt8functionIFSt10unique_ptrINS_12_Result_baseEN
  6: __pthread_once_slow
        at ./nptl/pthread_once.c:116
  7: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJ
  8: std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::packaged_task<void (std::vector<int, std::allocator<int> > const&, std::function<void (int)> const&)>, std::vector<int, std::allocator<int> >, std::function<void (int)> > > >::_M_run()
  9: execute_native_thread_routine
  10: start_thread
        at ./nptl/pthread_create.c:473
  11: clone
  12: 0xffffffffffffffff



[16:18:56] /workspace/tvm/src/auto_scheduler/compute_dag.cc:1371: Warning: InferBound fails on the state:
Placeholder: placeholder, placeholder, placeholder
data_pack auto_unroll: 16
parallel p.0@ci.0@ (0,64)
  for eps (None)
    for nu (None)
      for p (None)
        vectorize ci (None)
          input_tile = ...
  for p.1 (0,16)
    for i0 (None)
      for i1 (None)
        for i2 (None)
          vectorize i3 (None)
            data_pad = ...
    for ci.1 (0,4)
      unroll eps (0,6)
        unroll nu (0,6)
          unroll r_a (0,6)
            unroll r_b (0,6)
              data_pack = ...
parallel eps.0@nu.0@p.0@co.0@ (0,16)
  for eps_c.0 (None)
    for nu_c.0 (None)
      for p_c.0 (None)
        for co_c.0 (None)
          for eps_c.1 (None)
            for nu_c.1 (None)
              for p_c.1 (None)
                for co_c.1 (None)
                  for ci.0 (None)
                    for eps_c.2 (None)
                      for nu_c.2 (None)
                        for p_c.2 (None)
                          for co_c.2 (None)
                            for ci.1 (None)
                              for eps_c.3 (None)
                                for nu_c.3 (None)
                                  for p_c.3 (None)
                                    vectorize co_c.3 (None)
                                      bgemm.local = ...
  for eps.1 (0,6)
    for nu.1 (0,6)
      for p.1 (0,4)
        for co.1 (0,64)
          bgemm = ...
inverse auto_unroll: 16
parallel p.0@co.0@ (0,256)
  for co.1 (0,16)
    unroll vh (0,4)
      unroll vw (0,4)
        unroll r_a (0,6)
          unroll r_b (0,6)
            inverse = ...
parallel n@h@w@ (None)
  for co (None)
    conv2d_winograd = ...
parallel ax0@ax1@ax2@ (0,196)
  for ax3 (0,256)
    T_relu = ...

with: [16:18:56] /workspace/tvm/src/te/schedule/bound.cc:175:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (found_attach || stage_attach.size() == 0) is false: Invalid Schedule, cannot find the producer compute(data_pad, body=[tir.if_then_else(((((i1 >= 1) && (i1 < 15)) && (i2 >= 1)) && (i2 < 15)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)], axis=[iter_var(i0, range(min=0, ext=1)), iter_var(i1, range(min=0, ext=18)), iter_var(i2, range(min=0, ext=18)), iter_var(i3, range(min=0, ext=256))], reduce_axis=[], tag=injective,pad, attrs={}) along the loop nest specified by compute_at of consumer compute(input_tile, body=[data_pad[floordiv(p, 16), ((floormod(floordiv(p, 4), 4)*4) + eps), ((floormod(p, 4)*4) + nu), ci]], axis=[iter_var(eps, range(min=0, ext=6)), iter_var(nu, range(min=0, ext=6)), iter_var(p, range(min=0, ext=16)), iter_var(ci, range(min=0, ext=256))], reduce_axis=[], tag=, attrs={})
Stack trace:
  0: tvm::te::InferRootBound(tvm::te::Stage const&, tvm::te::GraphContext const&, std::unordered_map<tvm::tir::IterVar, tvm::Range, std::hash<tvm::tir::IterVar>, std::equal_to<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > >*)
  1: tvm::te::InferBound(tvm::te::Schedule const&)
  2: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::auto_scheduler::State const&) const
  3: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::runtime::Array<tvm::auto_scheduler::State, void> const&) const::{lambda(int)#1}::operator()(int) const
  4: _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_1
  5: _ZNSt13__future_base13_State_baseV29_M_do_setEPSt8functionIFSt10unique_ptrINS_12_Result_baseEN
  6: __pthread_once_slow
        at ./nptl/pthread_once.c:116
  7: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJ
  8: std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::packaged_task<void (std::vector<int, std::allocator<int> > const&, std::function<void (int)> const&)>, std::vector<int, std::allocator<int> >, std::function<void (int)> > > >::_M_run()
  9: execute_native_thread_routine
  10: start_thread
        at ./nptl/pthread_create.c:473
  11: clone
  12: 0xffffffffffffffff

This is expected (although it does look confusing…). Although the speed shown for task 2 is -0.00, it has a valid latency (0.018 ms). The reason -0.00 GFLOPS is displayed is that task 2 performs very few FLOPs, so GFLOPS = FLOPs / latency comes out so close to zero that the table rounds it to (-)0.00. As long as you can see a valid latency, you're good with this task.
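
As a made-up illustration of that formula (the FLOP count below is hypothetical; only the 0.018 ms latency comes from the table):

    # Hypothetical numbers: why a task with very few FLOPs prints as (-)0.00 GFLOPS.
    flops = 50              # assume the task performs only ~50 floating-point ops
    latency_s = 0.018e-3    # the 0.018 ms latency reported for task 2
    gflops = flops / latency_s / 1e9
    print(f"{gflops:.2f} GFLOPS")  # prints "0.00 GFLOPS"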

OK, that clears up the first issue, thanks.

The second issue still remains: the AutoScheduler is stuck at task 9. I looked more closely at the terminal output: it starts with "Warning: InferBound fails on the state" and subsequently reports "Check failed: (found_attach || stage_attach.size() == 0) is false: Invalid Schedule". How should I deal with this?