RelayVM on ARMv7 fails - Cannot find function in module

I'm trying to run the Gluon resnet18 model on Android/Linux ARMv7 using the Relay VM runtime.

After compilation I got two files: model.ro and model.so.

The model runner initializes the model with the following code (the last statement fails):

  std::string model_lib_path = "./model.so";
  std::string model_ro_path = "./model.ro";

  std::string code_data = LoadFileToString(model_ro_path, std::ios::in | std::ios::binary);

  tvm::runtime::Module model_lib = tvm::runtime::Module::LoadFromFile(model_lib_path);

  tvm::runtime::Module vm_executable_ = tvm::runtime::vm::Executable::Load(code_data, model_lib);
  auto vm = tvm::runtime::make_object<tvm::runtime::vm::VirtualMachine>();

  vm->LoadExecutable(static_cast<tvm::runtime::vm::Executable*>(
      const_cast<tvm::runtime::Object*>(vm_executable_.get())));

Error:

terminating with uncaught exception of type dmlc::Error: [17:24:02] /home/pivovaa/workplace/tvm/src/runtime/vm/vm.cc:281: Check failed: pf != nullptr: Cannot find function in module: 
/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "terminating with uncaught exception of type dmlc::Error: [17:24:02] /home/pivovaa/workplace/tvm/src/runtime/vm/vm.cc:281: Check failed: pf != nullptr: Cannot find function in module: " failed
Aborted

Error line: src/runtime/vm/vm.cc:281

For the model compiled for Android/Linux armv7a, debug output at vm.cc:281 shows:

exec_->primitive_map size is 1.

packed_name string size is 1.

packed_name[0] char code is 1.

# armv7 Android/Linux
0: 

If I compile and run the model on x86_64 Linux, exec_->primitive_map has 39 items, e.g.:

# x86_64 Linux
0: name:fused_add_layout_transform
1: name:fused_nn_contrib_conv2d_NCHWc_add_nn_relu
2: name:fused_nn_max_pool2d
...
38: name:fused_nn_dense_add

ARM64 Android also works fine. However, exec_->primitive_map has only 29 items there (not 39 as on x86_64):

# ARM64 Android
0: name:fused_add_11
1: name:fused_nn_conv2d_add_nn_relu
2: name:fused_nn_max_pool2d
...
28: name:fused_nn_dense_add

@jroesch @zhiics @rkimball @haichen What do you think?

Not sure why there are no primitive functions compiled for armv7a. You can first check how many cached funcs have been lowered at

Debug info at compiler.cc:L973

# Android/Linux armv7
# target: llvm -device=arm_cpu -mtriple=armv7a-linux-androideabi22 -mfloat-abi=soft
# lib.export_library(path_lib, cc="armv7a-linux-androideabi22-clang++", options=["-static-libstdc++"])
update global function map ------------------------------------
main:0
update primitive function map ---------------------------------
fused_add_11:0
fused_nn_conv2d_add_nn_relu:1
fused_nn_max_pool2d:2
fused_add_nn_relu:3
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu:4
fused_nn_contrib_conv2d_winograd_without_weight_transform_add:5
fused_multiply_add_nn_relu:6
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu:7
fused_nn_conv2d_add_nn_relu_1:8
fused_nn_conv2d:9
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_1:10
fused_add_nn_relu_1:11
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1:12
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1:13
fused_nn_conv2d_add_nn_relu_2:14
fused_nn_conv2d_1:15
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_2:16
fused_add_nn_relu_2:17
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_2:18
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_2:19
fused_nn_conv2d_add_nn_relu_3:20
fused_nn_conv2d_2:21
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_3:22
fused_add_nn_relu_3:23
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_3:24
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_multiply_add_nn_re_11882905421691233276_:25
fused_nn_global_avg_pool2d:26
fused_nn_batch_flatten_nn_batch_flatten:27
fused_nn_dense_add:28
Compilation done

# Resulting files size:
model.ro 159,999,677
model.so     645,924

# Android ARM64
# target: llvm -device=arm_cpu -mtriple=arm64-linux-android22
# lib.export_library(path_lib, cc="aarch64-linux-android22-clang++", options=["-static-libstdc++"])

# global function map and primitive function map are the same

# Resulting files size:
model.ro 159,999,677
model.so     506,160

Actually, it has nothing to do with Android. I got the same issue on Linux ARMv7, for example on a Raspberry Pi 3 Model B.

# Linux ARMv7 Raspberry Pi 3 Model B
# exec_->primitive_map
0: name:, size: 1, (int) name.at[0]: 1

During compilation, relay/backend/vm/compiler.cc:973 has the correct list of primitive functions (29 items).

But during execution on the ARMv7 device, runtime/vm/vm.cc:281 has a list with 1 item, and that item is a string containing the single byte 1.

Some debug info from runtime/vm/executable.cc:482

Model - gluon/mobilenetv2_0.5

Compiled for ARMv7 rasp3b

# ARMv7
# Files:
model.ro - 7,862,354
model.so -   267,088

code string size: 7,862,354

header: 15142753616656602397
version: 0.8.dev0

globals.size(): 1
global[0]: main

constant.size: 219
constant[0]: int64
constant[1]: int64
constant[2]: float32
constant[3]: float32
constant[4]: int64
constant[5]: int64

primitive_names.size: 0

code.num_funcs: 0

The same model compiled for Linux x86_64

# x86_64
# Files:
model.ro - 7,866,095
model.so -   355,424

code string size: 7,866,095

header: 15142753616656602397
version: 0.8.dev0

globals.size(): 1
global[0]: main

constant.size: 225
constant[0]: int64
constant[1]: int64
constant[2]: int64
constant[3]: int64
constant[4]: float32
constant[5]: float32

primitive_names.size: 37
primitive_name[0]fused_layout_transform_42
primitive_name[1]fused_nn_contrib_conv2d_NCHWc_add_clip
primitive_name[2]fused_nn_contrib_conv2d_NCHWc_add_clip_1
primitive_name[3]fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip
primitive_name[4]fused_nn_contrib_conv2d_NCHWc_add
primitive_name[5]fused_nn_contrib_conv2d_NCHWc_add_clip_2

code.num_funcs: 1
func[0].num_instructions: 403

I think I found it. Executable::SaveConstantSection saves the device mapping, which is std::vector<size_t> const_device_type.

size_t is a platform-dependent type:

  • 8 bytes on 64-bit platforms
  • 4 bytes on 32-bit platforms.

The device mapping vector is saved right before the primitive names vector.

That is probably why reading the stream on a 32-bit edge device goes wrong after the constant section: the reader consumes 4 bytes per element where the 64-bit writer emitted 8, so the primitive names cannot be read.

OK, reading primitive_names and the number of functions was fixed by using int64_t (Index).

Now loading fails at instr.Load(strm):

runtime/vm/serialize_utils.h#L147 Check failed: loaded_hash == hash (175247657126 vs. 3448965286) : Found mismatch in hash for opcode: 11

OK, the hash mismatch is caused by the size_t type as well!

dmlc/common.h defines HashCombine, which takes and returns size_t.

serialize_utils.h defines VectorHash(), which uses HashCombine().

VectorHash is used by VMInstructionSerializer.Load().

The following example demonstrates the VectorHash(11, {int64_t(0), int64_t(1)}) case and shows that the second HashCombine gives different results on 64-bit and 32-bit platforms.

  int64_t v0 = 0;
  int64_t v1 = 1;
  size_t key = 11;
  key = HashCombine(key, v0);
  std::cout << key << std::endl;
  key = HashCombine(key, v1);
  std::cout << key << std::endl;

Result:

# 64-bit platforms
2654436464
175247657126

# 32-bit platforms
2654436464
3448965286

OK, after switching the hashing functions to a uint64_t key, the model works on ARMv7:

  • uint64_t HashCombine(uint64_t key, const T& value)
  • uint64_t VectorHash(uint64_t key, const std::vector<T>& values)

Will open PRs