Vm_mod_fused_nn_dense* on Android


I’m getting errors when autoscheduling both a non-standard model (audio processing) and a standard model (resnet-50) for Android.

Autoscheduler problematic outputs:

my model:
|    3 |              vm_mod_fused_nn_dense_5 |        - |       - |     64 |
|    4 |              vm_mod_fused_nn_dense_3 |        - |       - |    512 |

|    4 |              vm_mod_fused_nn_dense_add |      - |       - |      6 |

Errors during this task:

my model:
No: 387 GFLOPS: 0.00 / 0.00     results: MeasureResult(error_type:InstantiationError, error_msg:Traceback (most recent call last):
  File "/usr/tvm/python/tvm/auto_scheduler/measure.py", line 619, in _local_build_worker
    sch, args = task.compute_dag.apply_steps_from_state(
  File "/usr/tvm/python/tvm/auto_scheduler/compute_dag.py", line 154, in ap
  0: tvm::auto_scheduler::IndexRewriter::VisitExpr_(tvm::tir::ProducerLoadNode const*)
  File "/usr/tvm/src/auto_scheduler/compute_dag.cc", line 764
InternalError: Check failed: (name_it != name_to_arg.end()) is false: 
, all_cost:0.40, Tstamp:1704357378.81)
Placeholder: p0, p1
parallel i0.0@i1.0@i0.1@i1.1@ (0,32)
  for k.0 (0,125)
    for i0.2 (0,13)
      for i1.2 (0,2)
        for k.1 (0,2)
          for i0.3 (0,5)
            vectorize i1.3 (0,2)
              T_matmul_NT = ...

[09:07:21] /usr/tvm/src/auto_scheduler/compute_dag.cc:1377: Warning: InferBound fails on the state:
Placeholder: p0, p1, p2
parallel n.0@co.0@h.0@w.0@vh.0@vw.0@vc.0@ (0,14)
  conv.local auto_unroll: 512
  for n_c.0 (None)
    for co_c.0 (None)
      for h_c.0 (None)
        for w_c.0 (None)
          for vh_c.0 (None)
            for vw_c.0 (None)
              for vc_c.0 (None)
                for n_c.1 (None)
                  for co_c.1 (None)
                    for n (None)
                      for h (None)
                        for w (None)
                          for ci (None)
                            for vh (None)
                              vectorize vw (None)
                                data_vec = ...
                    for h_c.1 (None)
                      for w_c.1 (None)
                        for i0 (None)
                          for i1 (None)
                            for i2 (None)
                              vectorize i3 (None)
                                PadInput = ...
                        for vh_c.1 (None)
                          for vw_c.1 (None)
                            for vc_c.1 (None)
                              for ci.0 (None)
                                for kh.0 (None)
                                  for kw.0 (None)
                                    for n_c.2 (None)
                                      for co_c.2 (None)
                                        for h_c.2 (None)
                                          for w_c.2 (None)
                                            for vh_c.2 (None)
                                              for vw_c.2 (None)
                                                for vc_c.2 (None)
                                                  for ci.1 (None)
                                                    for kh.1 (None)
                                                      for kw.1 (None)
                                                        for n_c.3 (None)
                                                          for co_c.3 (None)
                                                            for h_c.3 (None)
                                                              for w_c.3 (None)
                                                                for vh_c.3 (None)
                                                                  for vw_c.3 (None)
                                                                    vectorize vc_c.3 (None)
                                                                      conv.local = ...
  for co.1 (0,16)
    for vw.1 (0,7)
      vectorize vc.1 (0,16)
        conv = ...
parallel n@co@ (0,512)
  for h (0,7)
    vectorize w (0,7)
      output_unpack = ...
parallel ax0@ax1@ (0,512)
  for ax2 (0,7)
    vectorize ax3 (0,7)
      T_relu = ...

with: [09:07:21] /usr/tvm/src/te/schedule/bound.cc:175: InternalError: Check failed: (found_attach || stage_attach.size() == 0) is false: Invalid Schedule, cannot find the producer compute(PadInput, body=[T.if_then_else(i2 >= 1 and i2 < 15 and i3 >= 1 and i3 < 15, p0[i0, i1, i2 - 1, i3 - 1], T.float32(0))], axis=[T.iter_var(i0, T.Range(0, 1), "DataPar", ""), T.iter_var(i1, T.Range(0, 512), "DataPar", ""), T.iter_var(i2, T.Range(0, 16), "DataPar", ""), T.iter_var(i3, T.Range(0, 16), "DataPar", "")], reduce_axis=[], tag=injective,pad, attrs={}) along the loop nest specified by compute_at of consumer compute(data_vec, body=[PadInput[n, ci, h * 2 + vh, w * 7 * 2 + vw]], axis=[T.iter_var(n, T.Range(0, 1), "DataPar", ""), T.iter_var(h, T.Range(0, 7), "DataPar", ""), T.iter_var(w, T.Range(0, 1), "DataPar", ""), T.iter_var(ci, T.Range(0, 512), "DataPar", ""), T.iter_var(vh, T.Range(0, 4), "DataPar", ""), T.iter_var(vw, T.Range(0, 16), "DataPar", "")], reduce_axis=[], tag=, attrs={})
Stack trace:
  0: tvm::te::InferRootBound(tvm::te::Stage const&, tvm::te::GraphContext const&, std::unordered_map<tvm::tir::IterVar, tvm::Range, std::hash<tvm::tir::IterVar>, std::equal_to<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > >*)
  1: tvm::te::InferBound(tvm::te::Schedule const&)
  2: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::auto_scheduler::State const&) const
  3: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::runtime::Array<tvm::auto_scheduler::State, void> const&) const::{lambda(int)#1}::operator()(int) const
  4: _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_11_Task_stateIZN3tvm7support12parallel_
  5: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)
  6: 0x00007f68bb4d9ee7
  7: std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::packaged_task<void (std::vector<int, std::allocator<int> > const&, std::function<void (int)> const&)>, std::vector<int, std::allocator<int> >, std::function<void (int)> > > >::_M_run()
  8: 0x00007f6837e7c252
  9: 0x00007f68bb4d4ac2
  10: 0x00007f68bb56665f
  11: 0xffffffffffffffff

When autoscheduling both models, I used the NDK compiler armv7a-linux-androideabi28-clang++ (from the android_demo Docker image) and the following target: llvm -device=arm_cpu -mcpu=cortex-a73 -mtriple=armv7a-linux-android -mattr=+neon.
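For readers trying to reproduce this, here is a minimal sketch of how the setup described above might map onto the auto_scheduler tuning options. It assumes the device is reached through an RPC tracker; the helper name make_tuning_options and its parameters (device key, tracker host and port, trial count, compiler path) are hypothetical and not taken from the original post.

```python
import os

# Target string quoted in the post above.
TARGET = (
    "llvm -device=arm_cpu -mcpu=cortex-a73 "
    "-mtriple=armv7a-linux-android -mattr=+neon"
)


def make_tuning_options(log_file, device_key, host, port, n_trials, ndk_cc):
    """Build TuningOptions that cross-compile with the NDK and measure over RPC.

    Hypothetical helper: every argument is a placeholder for your own setup.
    """
    # Imported lazily so this module still loads on machines without TVM.
    from tvm import auto_scheduler

    # tvm.contrib.ndk picks up the cross compiler from TVM_NDK_CC,
    # e.g. the path to armv7a-linux-androideabi28-clang++.
    os.environ["TVM_NDK_CC"] = ndk_cc

    return auto_scheduler.TuningOptions(
        num_measure_trials=n_trials,
        # Compile candidate schedules into a shared library with the NDK.
        builder=auto_scheduler.LocalBuilder(build_func="ndk"),
        # Run the measurements on the Android device registered under device_key.
        runner=auto_scheduler.RPCRunner(device_key, host=host, port=port, timeout=30),
        # Append every measurement record to log_file.
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
```

The returned object would then be passed to the task scheduler's tune call. Whether this matches the original poster's exact configuration is not known.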

Edit: the runtime and RPC server were compiled with:

            cmake ..

Hello, I also encountered the same problem. How did you solve it?

Hello @MrJungle1, I didn’t solve the problem. I moved on to other problems :confused:

This is so bad. I’m stuck here.

Rule of thumb: you have to be a hacker to use Apache TVM in complex scenarios. And as a hacker, you have to get to know how the internals work. Then you have the knowledge to solve errors and problems inside TVM.

Hahaha. Moreover, you asked this question a long time ago and no one from the community has responded.

I have asked many questions on the forum. Most of them I answered myself :stuck_out_tongue:

Hello, has your problem been resolved?

TVM does not support schedule tuning for dense operators, but you can modify your model by changing the dense operator to the batch_matmul operator.
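To illustrate why the suggested swap keeps the results the same, here is a small pure-Python sketch (no TVM required). It assumes the usual conventions: dense takes data of shape (M, K) and a weight of shape (N, K), and batch_matmul multiplies (B, M, K) by (B, N, K) with the second operand transposed. The function names are illustrative only. In Relay, the same idea would presumably be expressed by adding a leading batch axis of size 1 to both operands and squeezing it from the result.

```python
def dense(x, w):
    """x: M x K, w: N x K -> M x N, i.e. x multiplied by the transpose of w."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in w] for row in x]


def batch_matmul_nt(xs, ws):
    """xs: B x M x K, ws: B x N x K -> B x M x N (second operand transposed)."""
    assert len(xs) == len(ws), "batch sizes must match"
    out = []
    for x, w in zip(xs, ws):
        k_dim = len(w[0])
        out.append(
            [
                [sum(x[i][k] * w[j][k] for k in range(k_dim)) for j in range(len(w))]
                for i in range(len(x))
            ]
        )
    return out


def dense_via_batch_matmul(x, w):
    """Express dense as a batch_matmul with a batch of size 1."""
    # Wrap both operands in a batch axis, multiply, then drop the batch axis.
    return batch_matmul_nt([x], [w])[0]


x = [[1, 2, 3], [4, 5, 6]]                        # M=2, K=3
w = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [2, 0, 1]]  # N=4, K=3
print(dense(x, w) == dense_via_batch_matmul(x, w))
```

This only shows that the two formulations compute the same values. It does not confirm that the rewritten operator avoids the InstantiationError reported above.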

I made a PR fixing this issue last year, and apparently nobody has any interest in merging it ¯\_(ツ)_/¯