Hi there,
I am trying to autotune my model using android_rpc on my Android phone (Snapdragon 845).
My backend is OpenCL. Here is my OpenCL device info:
- PlatformName: QUALCOMM Snapdragon™
- Device: QUALCOMM Adreno™
1.1 Hardware version: OpenCL 2.0 Adreno™ 630
1.2 Software version: OpenCL 2.0 QUALCOMM build: commit #78d547b changeid #I4ca2995ce0 Date: 04/11/18 Wed Local Branch: Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.UM.6.3.R1.08.00.00.301.091 Compiler E031.35.02.06
1.3 OpenCL C version: OpenCL C 2.0 Adreno™ 630
1.4 Parallel compute units: 2
I can successfully compile and run my model through android_rpc, but inference is very slow (~4 s), so I am trying to use autotuning.
However, after a few initial trials I always see "0.00/ 0.00 GFLOPS". With debug logging on, I see three types of errors (error_no=1, 4, and 7), like the following:
DEBUG:autotvm:No: 4 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(RuntimeError(‘Except caught from RPC call: [14:34:04] /tvm/apps/android_rpc/app/src/main/jni/…/…/…/…/…/…/include/…/src/runtime/module_util.cc:53: Check failed: ret == 0 (-1 vs. 0) [14:34:04]
/tvm/apps/android_rpc/app/src/main/jni/…/…/…/…/…/…/include/…/src/runtime/opencl/opencl_module.cc:216: OpenCL build error for device=0x6fb89f2068Pass’,),), error_no=4, all_cost=2.9775819778442383, timestamp=1550039644.957301) [(‘tile_b’, [16, 1, 1, 1]), (‘tile_y’, [4, 8, 4, 2]), (‘tile_x’, [2, 14, 4, 21]), (‘tile_rc’, [256, 1]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 1)],winograd,None,6412458
DEBUG:autotvm:No: 13 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(’’,), error_no=7, all_cost=5, timestamp=1550039770.987435) [(‘tile_b’, [16, 1, 1, 1]), (‘tile_y’, [8, 1, 1, 32]), (‘tile_x’, [6, 4, 7, 14]), (‘tile_rc’, [256, 1]), (‘auto_unroll_max_step’, 128), (‘unroll_explicit’, 0)],winograd,None,2241835
DEBUG:autotvm:No: 24 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(InstantiationError(‘Skipped because of invalid gpu kernel’,),), error_no=1, all_cost=0.045018911361694336, timestamp=1550039771.325883) [(‘tile_b’, [16, 1, 1, 1]), (‘tile_y’, [8, 4, 8, 1]), (‘tile_x’, [1, 168, 7, 2]), (‘tile_rc’, [4, 64]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 0)],winograd,None,1445426
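For anyone reading the log above, here is a minimal sketch for tallying the `error_no` values from a saved debug log. The name table is my reading of autotvm's `MeasureErrorNo` class and is an assumption, so please check it against your TVM version. The helper name `tally_errors` and the shortened sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed mapping, based on my reading of autotvm's MeasureErrorNo.
# Verify against the TVM version you are running.
ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",  # config rejected before building
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",       # failure while running on the device
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",          # measurement exceeded the runner timeout
    8: "UNKNOWN_ERROR",
}

ERROR_NO_RE = re.compile(r"error_no=(\d+)")


def tally_errors(log_lines):
    """Count how often each error_no appears in autotvm debug log lines."""
    counts = Counter()
    for line in log_lines:
        match = ERROR_NO_RE.search(line)
        if match is None:
            continue  # line carries no measurement result
        code = int(match.group(1))
        counts[ERROR_NAMES.get(code, "UNRECOGNIZED_%d" % code)] += 1
    return counts


if __name__ == "__main__":
    # Shortened stand-ins for the three log lines quoted in this post.
    sample = [
        "DEBUG:autotvm:No: 4 ... error_no=4, all_cost=2.97 ...",
        "DEBUG:autotvm:No: 13 ... error_no=7, all_cost=5 ...",
        "DEBUG:autotvm:No: 24 ... error_no=1, all_cost=0.045 ...",
    ]
    for name, count in sorted(tally_errors(sample).items()):
        print(name, count)
```

Running this over the full tuning log should show which of the three failure types dominates, which may help narrow down where to look first.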
My target and target host settings are:
target = 'opencl'
target_host = 'llvm -target=arm64-linux-android'
Any help?
Additional info:
If I comment out the tuning step and go directly to the compile, upload, and evaluate stage, the time cost is measured correctly.
If I change the target to the similar 'opencl -device mali', the code also runs correctly without autotuning.
With that target, tuning still fails, but the only error left is error_no=7:
DEBUG:autotvm:No: 24 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(’’,), error_no=7, all_cost=5, timestamp=1550056627.234274) [(‘tile_bna’, 2), (‘tile_bnb’, 2), (‘tile_t1’, [256, 1]), (‘tile_t2’, [128, 2]), (‘c_unroll’, [32, 8]), (‘yt’, 32)],winograd,None,37006
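One observation on the error_no=7 lines: they all report all_cost=5, and as far as I can tell error_no=7 is autotvm's run timeout, so the runner timeout may simply be 5 seconds, which is not much for a model that takes ~4 s end to end. Below is a rough sketch of measurement settings with a longer timeout. The device key, host, port, and the 30-second value are hypothetical placeholders, not my actual settings, and I have not confirmed that this resolves the problem.

```python
# Hypothetical runner settings; key, host, port, and timeout are placeholders.
runner_kwargs = {
    "key": "android",   # device key registered with the RPC tracker (assumed)
    "host": "0.0.0.0",  # tracker host (assumed)
    "port": 9190,       # tracker port (assumed)
    "number": 10,       # runs averaged per measurement
    "timeout": 30,      # seconds allowed per measurement, raised from 5
}


def build_measure_option(kwargs):
    """Build autotvm measure options; needs a TVM install that ships autotvm."""
    from tvm import autotvm  # imported lazily so the settings can be inspected without TVM

    return autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="ndk"),
        runner=autotvm.RPCRunner(**kwargs),
    )


if __name__ == "__main__":
    try:
        measure_option = build_measure_option(runner_kwargs)
        print("measure_option built with timeout", runner_kwargs["timeout"])
    except ImportError:
        print("TVM autotvm not available; settings would be:", runner_kwargs)
```

The resulting `measure_option` would be passed to the tuner's `tune(...)` call in place of the current one. If the timeouts persist even with a generous limit, the cause is probably elsewhere, for example the RPC app restarting mid-measurement.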
I have also noticed several times that the Android RPC app goes back to its main page and then returns to the stop_rpc page.