Hello,
I'm experiencing errors during auto-scheduling of both a non-standard model (audio processing) and a standard model (ResNet-50) when targeting Android.
The problematic entries in the auto-scheduler's progress table (these tasks never get a valid latency/speed measurement):
My model:

| ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials |
|----|------------------|--------------|----------------|--------|
| 3 | vm_mod_fused_nn_dense_5 | - | - | 64 |
| 4 | vm_mod_fused_nn_dense_3 | - | - | 512 |
ResNet-50:

| ID | Task Description | Latency (ms) | Speed (GFLOPS) | Trials |
|----|------------------|--------------|----------------|--------|
| 4 | vm_mod_fused_nn_dense_add | - | - | 6 |
Errors reported while tuning these tasks:
My model:

```
No: 387 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:InstantiationError, error_msg:Traceback (most recent call last):
File "/usr/tvm/python/tvm/auto_scheduler/measure.py", line 619, in _local_build_worker
sch, args = task.compute_dag.apply_steps_from_state(
File "/usr/tvm/python/tvm/auto_scheduler/compute_dag.py", line 154, in ap
...
rERKS2_EE10InitVTableEvENUlRKNS_7ru
0: tvm::auto_scheduler::IndexRewriter::VisitExpr_(tvm::tir::ProducerLoadNode const*)
File "/usr/tvm/src/auto_scheduler/compute_dag.cc", line 764
InternalError: Check failed: (name_it != name_to_arg.end()) is false:
, all_cost:0.40, Tstamp:1704357378.81)
==================================================
Placeholder: p0, p1
parallel i0.0@i1.0@i0.1@i1.1@ (0,32)
for k.0 (0,125)
for i0.2 (0,13)
for i1.2 (0,2)
for k.1 (0,2)
for i0.3 (0,5)
vectorize i1.3 (0,2)
T_matmul_NT = ...
==================================================
```

ResNet-50:

```
[09:07:21] /usr/tvm/src/auto_scheduler/compute_dag.cc:1377: Warning: InferBound fails on the state:
Placeholder: p0, p1, p2
parallel n.0@co.0@h.0@w.0@vh.0@vw.0@vc.0@ (0,14)
conv.local auto_unroll: 512
for n_c.0 (None)
for co_c.0 (None)
for h_c.0 (None)
for w_c.0 (None)
for vh_c.0 (None)
for vw_c.0 (None)
for vc_c.0 (None)
for n_c.1 (None)
for co_c.1 (None)
for n (None)
for h (None)
for w (None)
for ci (None)
for vh (None)
vectorize vw (None)
data_vec = ...
for h_c.1 (None)
for w_c.1 (None)
for i0 (None)
for i1 (None)
for i2 (None)
vectorize i3 (None)
PadInput = ...
for vh_c.1 (None)
for vw_c.1 (None)
for vc_c.1 (None)
for ci.0 (None)
for kh.0 (None)
for kw.0 (None)
for n_c.2 (None)
for co_c.2 (None)
for h_c.2 (None)
for w_c.2 (None)
for vh_c.2 (None)
for vw_c.2 (None)
for vc_c.2 (None)
for ci.1 (None)
for kh.1 (None)
for kw.1 (None)
for n_c.3 (None)
for co_c.3 (None)
for h_c.3 (None)
for w_c.3 (None)
for vh_c.3 (None)
for vw_c.3 (None)
vectorize vc_c.3 (None)
conv.local = ...
for co.1 (0,16)
for vw.1 (0,7)
vectorize vc.1 (0,16)
conv = ...
parallel n@co@ (0,512)
for h (0,7)
vectorize w (0,7)
output_unpack = ...
parallel ax0@ax1@ (0,512)
for ax2 (0,7)
vectorize ax3 (0,7)
T_relu = ...
with: [09:07:21] /usr/tvm/src/te/schedule/bound.cc:175: InternalError: Check failed: (found_attach || stage_attach.size() == 0) is false: Invalid Schedule, cannot find the producer compute(PadInput, body=[T.if_then_else(i2 >= 1 and i2 < 15 and i3 >= 1 and i3 < 15, p0[i0, i1, i2 - 1, i3 - 1], T.float32(0))], axis=[T.iter_var(i0, T.Range(0, 1), "DataPar", ""), T.iter_var(i1, T.Range(0, 512), "DataPar", ""), T.iter_var(i2, T.Range(0, 16), "DataPar", ""), T.iter_var(i3, T.Range(0, 16), "DataPar", "")], reduce_axis=[], tag=injective,pad, attrs={}) along the loop nest specified by compute_at of consumer compute(data_vec, body=[PadInput[n, ci, h * 2 + vh, w * 7 * 2 + vw]], axis=[T.iter_var(n, T.Range(0, 1), "DataPar", ""), T.iter_var(h, T.Range(0, 7), "DataPar", ""), T.iter_var(w, T.Range(0, 1), "DataPar", ""), T.iter_var(ci, T.Range(0, 512), "DataPar", ""), T.iter_var(vh, T.Range(0, 4), "DataPar", ""), T.iter_var(vw, T.Range(0, 16), "DataPar", "")], reduce_axis=[], tag=, attrs={})
Stack trace:
0: tvm::te::InferRootBound(tvm::te::Stage const&, tvm::te::GraphContext const&, std::unordered_map<tvm::tir::IterVar, tvm::Range, std::hash<tvm::tir::IterVar>, std::equal_to<tvm::tir::IterVar>, std::allocator<std::pair<tvm::tir::IterVar const, tvm::Range> > >*)
1: tvm::te::InferBound(tvm::te::Schedule const&)
2: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::auto_scheduler::State const&) const
3: tvm::auto_scheduler::ComputeDAG::InferBound(tvm::runtime::Array<tvm::auto_scheduler::State, void> const&) const::{lambda(int)#1}::operator()(int) const
4: _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EZNS1_11_Task_stateIZN3tvm7support12parallel_
5: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)
6: 0x00007f68bb4d9ee7
7: std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::packaged_task<void (std::vector<int, std::allocator<int> > const&, std::function<void (int)> const&)>, std::vector<int, std::allocator<int> >, std::function<void (int)> > > >::_M_run()
8: 0x00007f6837e7c252
9: 0x00007f68bb4d4ac2
10: 0x00007f68bb56665f
11: 0xffffffffffffffff
```

When auto-scheduling both models I used the NDK compiler `armv7a-linux-androideabi28-clang++` from the NDK (from the android_demo Docker image), with the following target: `llvm -device=arm_cpu -mcpu=cortex-a73 -mtriple=armv7a-linux-android -mattr=+neon`.
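For completeness, the tuning was driven by a script along these lines. This is a minimal sketch, not my exact setup: the placeholder dense model, the tracker host/port, the device key `android`, the log file name, and the trial count are all illustrative assumptions.

```python
import os
import numpy as np
import tvm
from tvm import auto_scheduler, relay

# Placeholder model: a single dense layer standing in for the real networks.
data = relay.var("data", shape=(1, 128), dtype="float32")
weight = relay.var("weight", shape=(64, 128), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(data, weight))
params = {"weight": tvm.nd.array(np.random.rand(64, 128).astype("float32"))}

# Target and NDK compiler exactly as used in the failing runs.
target = tvm.target.Target(
    "llvm -device=arm_cpu -mcpu=cortex-a73 -mtriple=armv7a-linux-android -mattr=+neon"
)
os.environ["TVM_NDK_CC"] = "armv7a-linux-androideabi28-clang++"

tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,  # illustrative; the real runs used more trials
    builder=auto_scheduler.LocalBuilder(build_func="ndk"),  # cross-compile via TVM_NDK_CC
    runner=auto_scheduler.RPCRunner(
        key="android",     # assumed RPC tracker device key
        host="127.0.0.1",  # assumed tracker host
        port=9190,         # assumed tracker port
        timeout=60,
    ),
    measure_callbacks=[auto_scheduler.RecordToFile("tuning.json")],
)
auto_scheduler.TaskScheduler(tasks, task_weights).tune(tune_option)
```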
Edit: the runtime and the RPC server were compiled with:
```bash
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_TOOLCHAIN_FILE=/opt/android-sdk-linux/ndk/21.3.6528147/build/cmake/android.toolchain.cmake \
  -DCMAKE_CXX_FLAGS_RELEASE=-O3 \
  -DANDROID_ABI=armeabi-v7a \
  -DANDROID_PLATFORM=android-28 \
  -DUSE_RPC=ON \
  -DUSE_CPP_RPC=ON \
  -DUSE_PROFILER=OFF \
  -DUSE_GRAPH_EXECUTOR=ON
```
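After tuning, the compiled module is deployed to the device roughly like this (again a sketch, continuing from the snippet above and reusing its `mod`, `target`, and `params`; the tracker address, device key, and file names are assumptions):

```python
import tvm
from tvm import auto_scheduler, relay, rpc
from tvm.contrib import graph_executor, ndk, utils

# Apply the tuned schedules recorded earlier (log file name is an assumption).
with auto_scheduler.ApplyHistoryBest("tuning.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)

# Cross-compile a shared library with the NDK and push it to the RPC server.
tmp = utils.tempdir()
lib_path = tmp.relpath("net.so")
lib.export_library(lib_path, ndk.create_shared)

tracker = rpc.connect_tracker("127.0.0.1", 9190)  # assumed tracker address
remote = tracker.request("android")               # assumed device key
remote.upload(lib_path)
rlib = remote.load_module("net.so")
module = graph_executor.GraphModule(rlib["default"](remote.cpu()))
module.run()
```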