I was trying to speed up my model on an Android device, but it failed at the “Evaluating” step.
First, I tested the tune_relay_arm.py script following the instructions. (I set “n_trial” to 1 because I just wanted to check quickly whether it works.) It ran to completion:
Tuning...
[Task 1/28] Current/Best: 2.21/ 2.21 GFLOPS | Progress: (1/1) | 1.15 s Done.
[Task 2/28] Current/Best: 3.09/ 3.09 GFLOPS | Progress: (1/1) | 1.09 s Done.
[Task 3/28] Current/Best: 16.69/ 16.69 GFLOPS | Progress: (1/1) | 1.13 s Done.
[Task 4/28] Current/Best: 1.59/ 1.59 GFLOPS | Progress: (1/1) | 1.45 s Done.
[Task 5/28] Current/Best: 6.57/ 6.57 GFLOPS | Progress: (1/1) | 1.00 s Done.
[Task 6/28] Current/Best: 3.51/ 3.51 GFLOPS | Progress: (1/1) | 1.09 s Done.
[Task 7/28] Current/Best: 15.27/ 15.27 GFLOPS | Progress: (1/1) | 0.82 s Done.
[Task 8/28] Current/Best: 3.67/ 3.67 GFLOPS | Progress: (1/1) | 1.08 s Done.
[Task 9/28] Current/Best: 11.11/ 11.11 GFLOPS | Progress: (1/1) | 1.26 s Done.
[Task 10/28] Current/Best: 16.60/ 16.60 GFLOPS | Progress: (1/1) | 1.11 s Done.
[Task 11/28] Current/Best: 25.53/ 25.53 GFLOPS | Progress: (1/1) | 2.65 s Done.
[Task 12/28] Current/Best: 18.67/ 18.67 GFLOPS | Progress: (1/1) | 3.23 s Done.
[Task 13/28] Current/Best: 13.43/ 13.43 GFLOPS | Progress: (1/1) | 1.67 s Done.
[Task 14/28] Current/Best: 23.99/ 23.99 GFLOPS | Progress: (1/1) | 0.88 s Done.
[Task 15/28] Current/Best: 5.25/ 5.25 GFLOPS | Progress: (1/1) | 1.71 s Done.
[Task 16/28] Current/Best: 23.17/ 23.17 GFLOPS | Progress: (1/1) | 2.63 s Done.
[Task 17/28] Current/Best: 26.99/ 26.99 GFLOPS | Progress: (1/1) | 1.11 s Done.
[Task 18/28] Current/Best: 11.99/ 11.99 GFLOPS | Progress: (1/1) | 1.41 s Done.
[Task 19/28] Current/Best: 9.74/ 9.74 GFLOPS | Progress: (1/1) | 0.95 s Done.
[Task 20/28] Current/Best: 11.01/ 11.01 GFLOPS | Progress: (1/1) | 2.07 s Done.
[Task 21/28] Current/Best: 22.91/ 22.91 GFLOPS | Progress: (1/1) | 2.91 s Done.
[Task 22/28] Current/Best: 6.02/ 6.02 GFLOPS | Progress: (1/1) | 2.94 s Done.
[Task 23/28] Current/Best: 5.32/ 5.32 GFLOPS | Progress: (1/1) | 2.58 s Done.
[Task 24/28] Current/Best: 12.52/ 12.52 GFLOPS | Progress: (1/1) | 1.05 s Done.
[Task 25/28] Current/Best: 5.78/ 5.78 GFLOPS | Progress: (1/1) | 2.29 s Done.
[Task 26/28] Current/Best: 19.51/ 19.51 GFLOPS | Progress: (1/1) | 3.51 s Done.
[Task 27/28] Current/Best: 8.36/ 8.36 GFLOPS | Progress: (1/1) | 1.45 s Done.
[Task 28/28] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (1/1) | 10.12 s Done.
Compile...
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -mtriple=aarch64-linux-android, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Upload...
Evaluate inference time cost...
Mean inference time (std dev): 165.64 ms (2.00 ms)
Then I redefined the function get_network(name, batch_size) as follows:
import onnx
from tvm import relay

def get_network(network_path):
    input_shape = (1, 1, 125, 125)
    output_shape = (1, 1, 484, 484)
    input_name = "input.1"  # This is the name of the network's input
    shape_dict = {input_name: input_shape}
    dtype_dict = {input_name: "float32"}
    try:
        onnx_model = onnx.load(network_path)
        # print(onnx_model)
    except Exception as err:
        # Keep the original loading error attached to the ValueError
        raise ValueError("Unsupported network: " + network_path) from err
    mod, params = relay.frontend.from_onnx(onnx_model, shape=shape_dict, dtype=dtype_dict)  # , freeze_params=False)
    return mod, params, input_shape, output_shape
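Since the input name "input.1" and the shape (1, 1, 125, 125) are hard-coded above, one sanity check is to read them back from the ONNX graph itself. The following is only a minimal sketch: the helper name graph_input_shapes is hypothetical, and it assumes the model exposes the usual graph.input and graph.initializer fields of an ONNX ModelProto. It is not part of the tutorial script.

```python
def graph_input_shapes(onnx_model):
    """Return {input_name: shape} for the real inputs of an ONNX model.

    Hypothetical helper. Entries of graph.input that are also listed in
    graph.initializer are weights, not runtime inputs, so they are skipped.
    """
    initializer_names = {init.name for init in onnx_model.graph.initializer}
    shapes = {}
    for inp in onnx_model.graph.input:
        if inp.name in initializer_names:
            continue
        dims = inp.type.tensor_type.shape.dim
        # A fixed dimension carries a positive dim_value; a symbolic one
        # carries a name in dim_param. Anything else is reported as None.
        shapes[inp.name] = tuple(
            d.dim_value if d.dim_value > 0 else (d.dim_param or None)
            for d in dims
        )
    return shapes
```

With the model loaded as in get_network, graph_input_shapes(onnx.load(network_path)) should list the same name and shape that are passed to relay.frontend.from_onnx. The same input name would presumably also have to be used wherever the evaluation code sets the input.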
After that, the following error occurred:
Extract tasks...
Tuning...
[Task 1/ 9] Current/Best: 3.79/ 3.79 GFLOPS | Progress: (1/1) | 1.41 s Done.
[Task 2/ 9] Current/Best: 1.93/ 1.93 GFLOPS | Progress: (1/1) | 0.99 s Done.
[Task 3/ 9] Current/Best: 0.98/ 0.98 GFLOPS | Progress: (1/1) | 1.32 s Done.
[Task 4/ 9] Current/Best: 0.82/ 0.82 GFLOPS | Progress: (1/1) | 1.51 s Done.
[Task 5/ 9] Current/Best: 0.87/ 0.87 GFLOPS | Progress: (1/1) | 1.81 s Done.
[Task 6/ 9] Current/Best: 16.13/ 16.13 GFLOPS | Progress: (1/1) | 0.88 s Done.
[Task 7/ 9] Current/Best: 0.89/ 0.89 GFLOPS | Progress: (1/1) | 1.05 s Done.
[Task 8/ 9] Current/Best: 3.90/ 3.90 GFLOPS | Progress: (1/1) | 0.97 s Done.
[Task 9/ 9] Current/Best: 2.96/ 2.96 GFLOPS | Progress: (1/1) | 0.85 s Done.
Compile...
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -mtriple=aarch64-linux-android, workload=('conv2d_transpose_nchw.arm_cpu', ('TENSOR', (1, 32, 121, 121), 'float32'), ('TENSOR', (32, 1, 9, 9), 'float32'), (4, 4), (3, 3, 3, 3), 'float32', (1, 1)). A fallback configuration is used, which may bring great performance regression.
Upload...
(1, 1, 125, 125)
Evaluate inference time cost...
Traceback (most recent call last):
File "Autotuningplay.py", line 184, in <module>
tune_and_evaluate(tuning_option)
File "Autotuningplay.py", line 173, in tune_and_evaluate
prof_res = np.array(ftimer().results) * 1000 # convert to millisecond
File "/home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/python/tvm/runtime/module.py", line 226, in evaluator
blob = feval(*args)
File "/home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (5) /home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/build/libtvm.so(TVMFuncCall+0x65) [0x7ff8882659a5]
[bt] (4) /home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/build/libtvm.so(+0x115a063) [0x7ff8882df063]
[bt] (3) /home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x3c5) [0x7ff8882e2255]
[bt] (2) /home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x57) [0x7ff8882d4657]
[bt] (1) /home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x2a3) [0x7ff8882cc1e3]
[bt] (0) /home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/build/libtvm.so(+0x1144182) [0x7ff8882c9182]
File "/home/wenjun/Documents/Dev/apache-tvm-src-v0.7.0.rc0-incubating/src/runtime/rpc/rpc_endpoint.cc", line 807
TVMError: Check failed: code == RPCCode::kReturn: code=1
Is there any difference between “mod, params” generated by relay.frontend and those generated by relay.testing?