I’m trying to run the Gluon resnet18 model on Android/Linux ARMv7 using the Relay VM runtime.
After compilation I got two files: model.ro and model.so.
The model runner initializes the model with the following code (the last statement fails):
std::string model_lib_path = "./model.so";
std::string model_ro_path = "./model.ro";
std::string code_data = LoadFileToString(model_ro_path, std::ios::in | std::ios::binary);
tvm::runtime::Module model_lib = tvm::runtime::Module::LoadFromFile(model_lib_path);
tvm::runtime::Module vm_executable_ = tvm::runtime::vm::Executable::Load(code_data, model_lib);
auto vm = tvm::runtime::make_object<tvm::runtime::vm::VirtualMachine>();
vm->LoadExecutable(static_cast<tvm::runtime::vm::Executable*>(
const_cast<tvm::runtime::Object*>(vm_executable_.get())));
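LoadFileToString is not shown in the snippet above. A minimal sketch of such a helper, assuming it simply reads the whole file into a std::string using the given open mode (the body is an illustration, not the original implementation):

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read an entire file into a string.
// Binary mode matters here because model.ro is not text.
std::string LoadFileToString(const std::string& path, std::ios::openmode mode) {
  std::ifstream in(path, mode);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  std::ostringstream contents;
  contents << in.rdbuf();  // copy the whole stream, embedded zero bytes included
  return contents.str();
}
```

Comparing the returned string's size with the file size on disk is a quick way to rule out a truncated read before suspecting the executable format.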
Error:
terminating with uncaught exception of type dmlc::Error: [17:24:02] /home/pivovaa/workplace/tvm/src/runtime/vm/vm.cc:281: Check failed: pf != nullptr: Cannot find function in module:
/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "terminating with uncaught exception of type dmlc::Error: [17:24:02] /home/pivovaa/workplace/tvm/src/runtime/vm/vm.cc:281: Check failed: pf != nullptr: Cannot find function in module: " failed
Aborted
Error Line:
src/runtime/vm/vm.cc:281
At vm.cc:281, the model compiled for Android/Linux armv7a shows the following debug info: exec_->primitive_map size is 1, the packed_name string size is 1, and the packed_name[0] char code is 1.
# armv7 Android/Linux
0:
If I compile and run the model on x86_64 Linux, exec_->primitive_map has 39 items, e.g.:
# x86_64 Linux
0: name:fused_add_layout_transform
1: name:fused_nn_contrib_conv2d_NCHWc_add_nn_relu
2: name:fused_nn_max_pool2d
...
38: name:fused_nn_dense_add
ARM64 Android also works fine.
However, exec_->primitive_map has only 29 items there (not 39 as on x86_64):
# ARM64 Android
0: name:fused_add_11
1: name:fused_nn_conv2d_add_nn_relu
2: name:fused_nn_max_pool2d
...
28: name:fused_nn_dense_add
Not sure why there are no primitive functions compiled for armv7a. You can first check how many cached funcs have been lowered at:
    exec_->const_device_type.push_back(i);
  }

  // update global function map
  for (auto gv : context_.global_map) {
    exec_->global_map.insert({gv.first->name_hint, gv.second});
  }

  // update primitive function map
  size_t primitive_index = 0;
  for (const auto& cfunc : context_.cached_funcs) {
    exec_->primitive_map.insert({cfunc->func_name, primitive_index++});
  }
}

transform::Sequential MemoryOpt(tvm::Target host_target, TargetsMap targets) {
  Array<Pass> pass_seqs;
  // Manifest the allocations.
  pass_seqs.push_back(transform::ManifestAlloc(host_target, targets));
  // Compute away possibly introduced constant computation.
Debug info at compiler.cc:L973
# Android/Linux armv7
# target: llvm -device=arm_cpu -mtriple=armv7a-linux-androideabi22 -mfloat-abi=soft
# lib.export_library(path_lib, cc="armv7a-linux-androideabi22-clang++", options=["-static-libstdc++"])
update global function map ------------------------------------
main:0
update primitive function map ---------------------------------
fused_add_11:0
fused_nn_conv2d_add_nn_relu:1
fused_nn_max_pool2d:2
fused_add_nn_relu:3
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu:4
fused_nn_contrib_conv2d_winograd_without_weight_transform_add:5
fused_multiply_add_nn_relu:6
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu:7
fused_nn_conv2d_add_nn_relu_1:8
fused_nn_conv2d:9
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_1:10
fused_add_nn_relu_1:11
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1:12
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1:13
fused_nn_conv2d_add_nn_relu_2:14
fused_nn_conv2d_1:15
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_2:16
fused_add_nn_relu_2:17
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_2:18
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_2:19
fused_nn_conv2d_add_nn_relu_3:20
fused_nn_conv2d_2:21
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_3:22
fused_add_nn_relu_3:23
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_3:24
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_multiply_add_nn_re_11882905421691233276_:25
fused_nn_global_avg_pool2d:26
fused_nn_batch_flatten_nn_batch_flatten:27
fused_nn_dense_add:28
Compilation done
# Resulting files size:
model.ro 159,999,677
model.so 645,924
# Android ARM64
# target: llvm -device=arm_cpu -mtriple=arm64-linux-android22
# lib.export_library(path_lib, cc="aarch64-linux-android22-clang++", options=["-static-libstdc++"])
# global function map and primitive function map are the same
# Resulting files size:
model.ro 159,999,677
model.so 506,160
Actually, it has nothing to do with Android.
I get the same issue on Linux ARMv7, for example on a Raspberry Pi 3 Model B.
# Linux ARMv7 Raspberry Pi 3 Model B
# exec_->primitive_map
0: name:, size: 1, (int) name.at[0]: 1
During compilation, relay/backend/vm/compiler.cc:973 has the correct list of primitive functions (29 items).
But during execution on the ARMv7 device, runtime/vm/vm.cc:281 sees a list with 1 item, and that item is a string containing the single byte 1.
Some Debug info from runtime/vm/executable.cc:482
Model - gluon/mobilenetv2_0.5
Compiled for ARMv7 rasp3b
# ARMv7
# Files:
model.ro - 7,862,354
model.so - 267,088
code string size: 7,862,354
header: 15142753616656602397
version: 0.8.dev0
globals.size(): 1
global[0]: main
constant.size: 219
constant[0]: int64
constant[1]: int64
constant[2]: float32
constant[3]: float32
constant[4]: int64
constant[5]: int64
primitive_names.size: 0
code.num_funcs: 0
The same model compiled for Linux x86_64
# x86_64
# Files:
model.ro - 7,866,095
model.so - 355,424
code string size: 7,866,095
header: 15142753616656602397
version: 0.8.dev0
globals.size(): 1
global[0]: main
constant.size: 225
constant[0]: int64
constant[1]: int64
constant[2]: int64
constant[3]: int64
constant[4]: float32
constant[5]: float32
primitive_names.size: 37
primitive_name[0]fused_layout_transform_42
primitive_name[1]fused_nn_contrib_conv2d_NCHWc_add_clip
primitive_name[2]fused_nn_contrib_conv2d_NCHWc_add_clip_1
primitive_name[3]fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip
primitive_name[4]fused_nn_contrib_conv2d_NCHWc_add
primitive_name[5]fused_nn_contrib_conv2d_NCHWc_add_clip_2
code.num_funcs: 1
func[0].num_instructions: 403
I think I found it. Executable::SaveConstantSection saves the device mapping, which is std::vector<size_t> const_device_type. size_t is a platform-dependent type: 8 bytes on 64-bit platforms and 4 bytes on 32-bit platforms.
The device mapping vector is saved right before the primitive names vector. That is probably why reading from the stream on a 32-bit edge device goes out of sync after the ConstantSection, so the primitive names cannot be read.
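To illustrate the suspected mechanism, here is a small sketch of a writer that stores 8-byte elements and a reader that assumes either 8-byte or 4-byte elements. It is a simplified model, not TVM's actual serialization format: the layout (a uint64_t length, the elements, then a uint64_t count standing in for the primitive names section) and the names Put, Get, WriteSections and ReadNumPrimitives are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void Put(std::string* buf, const T& value) {
  buf->append(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Read a trivially copyable value at *pos and advance the cursor.
template <typename T>
T Get(const std::string& buf, size_t* pos) {
  assert(*pos + sizeof(T) <= buf.size());
  T value;
  std::memcpy(&value, buf.data() + *pos, sizeof(T));
  *pos += sizeof(T);
  return value;
}

// Writer on a 64-bit host: uint64_t stands in for an 8-byte size_t.
std::string WriteSections(const std::vector<uint64_t>& device_types,
                          uint64_t num_primitives) {
  std::string buf;
  Put<uint64_t>(&buf, device_types.size());
  for (uint64_t d : device_types) Put<uint64_t>(&buf, d);
  Put<uint64_t>(&buf, num_primitives);  // the section that follows
  return buf;
}

// Reader: ElemT stands in for the reader's size_t (4 or 8 bytes).
template <typename ElemT>
uint64_t ReadNumPrimitives(const std::string& buf) {
  size_t pos = 0;
  const uint64_t n = Get<uint64_t>(buf, &pos);
  for (uint64_t i = 0; i < n; ++i) Get<ElemT>(buf, &pos);  // skip the mapping
  return Get<uint64_t>(buf, &pos);  // wrong offset if ElemT is too narrow
}
```

With a buffer from WriteSections({1, 1, 1}, 29), ReadNumPrimitives<uint64_t> returns 29, while ReadNumPrimitives<uint32_t> consumes only half of the mapping bytes and interprets leftover mapping data as the count. Serializing such fields with a fixed-width type avoids the mismatch.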
OK, reading primitive_names and reading the number of functions was fixed by using int64_t (Index).
Now loading fails at instr.Load(strm), runtime/vm/serialize_utils.h#L147:
Check failed: loaded_hash == hash (175247657126 vs. 3448965286) : Found mismatch in hash for opcode: 11
OK, the hash code mismatch is caused by the size_t type as well!
dmlc/common.h defines HashCombine, which takes and returns size_t. serialize_utils.h defines VectorHash(), which uses HashCombine(). VectorHash is used by VMInstructionSerializer.Load().
The following example demonstrates the VectorHash(11, {int64_t(0), int64_t(1)}) case and shows that the second HashCombine gives different results on 64-bit and 32-bit platforms:
int64_t v0 = 0;
int64_t v1 = 1;
size_t key = 11;
key = HashCombine(key, v0);
std::cout << key <<std::endl;
key = HashCombine(key, v1);
std::cout << key <<std::endl;
Result:
# 64-bit platforms
2654436464
175247657126
# 32-bit platforms
2654436464
3448965286
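Both sets of numbers can be reproduced on a single machine by parameterizing the key width. The sketch below assumes the boost-style combine formula key ^ (hash(value) + 0x9e3779b9 + (key << 6) + (key >> 2)) and assumes that hashing an integer yields the integer itself, as common standard library implementations of std::hash do. HashCombineAs and VectorHashAs are names made up for this illustration, not the dmlc or TVM functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// KeyT plays the role of size_t: uint64_t on 64-bit, uint32_t on 32-bit.
template <typename KeyT>
KeyT HashCombineAs(KeyT key, int64_t value) {
  // Assumption: the integer hash is the identity, truncated to the key width.
  const KeyT hashed = static_cast<KeyT>(value);
  return key ^ (hashed + static_cast<KeyT>(0x9e3779b9) + (key << 6) + (key >> 2));
}

// Fold every value into the key, one HashCombineAs call per element.
template <typename KeyT>
KeyT VectorHashAs(KeyT key, const std::vector<int64_t>& values) {
  for (int64_t v : values) key = HashCombineAs<KeyT>(key, v);
  return key;
}
```

VectorHashAs<uint64_t>(11, {0, 1}) gives 175247657126 and VectorHashAs<uint32_t>(11, {0, 1}) gives 3448965286. The first step agrees in both widths because nothing overflows 32 bits yet; the second step differs because key << 6 no longer fits in 32 bits and the upper bits are dropped.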
OK, after switching the hashing functions to a uint64_t key, the model works on ARMv7:
uint64_t HashCombine(uint64_t key, const T& value)
uint64_t VectorHash(uint64_t key, const std::vector<T>& values)
I will open PRs.