RelayVM on ARMv7 fails - Cannot find function in module

apivovarov · March 5, 2021, 9:53pm

I’m trying to run a Gluon resnet18 model on Android/Linux armv7 using the Relay VM runtime.

After compilation I got two files: model.ro and model.so.

The model runner initializes the model with the following code (the last statement fails):

  std::string model_lib_path = "./model.so";
  std::string model_ro_path = "./model.ro";

  std::string code_data = LoadFileToString(model_ro_path, std::ios::in | std::ios::binary);

  tvm::runtime::Module model_lib = tvm::runtime::Module::LoadFromFile(model_lib_path);

  tvm::runtime::Module vm_executable_ = tvm::runtime::vm::Executable::Load(code_data, model_lib);
  auto vm = tvm::runtime::make_object<tvm::runtime::vm::VirtualMachine>();

  vm->LoadExecutable(static_cast<tvm::runtime::vm::Executable*>(
      const_cast<tvm::runtime::Object*>(vm_executable_.get())));
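The snippet above calls a LoadFileToString helper that the post does not show. A minimal sketch of what such a helper might look like, assuming it simply reads the whole file into a std::string with the given open mode (the body and the default argument are assumptions, not the poster's code):

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical implementation of the LoadFileToString helper used above.
// The original post does not include it, so this body is an assumption.
std::string LoadFileToString(const std::string& path,
                             std::ios::openmode mode = std::ios::in) {
  assert(!path.empty());  // precondition: a file name must be given
  std::ifstream file(path, mode);
  if (!file) {
    throw std::runtime_error("Cannot open file: " + path);
  }
  std::ostringstream contents;
  contents << file.rdbuf();  // copy the whole stream, including '\0' bytes
  return contents.str();
}
```

Opening with std::ios::binary, as the runner does for model.ro, matters on platforms that translate line endings in text mode.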

Error:

terminating with uncaught exception of type dmlc::Error: [17:24:02] /home/pivovaa/workplace/tvm/src/runtime/vm/vm.cc:281: Check failed: pf != nullptr: Cannot find function in module: 
/buildbot/src/android/ndk-release-r21/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:72: abort_message: assertion "terminating with uncaught exception of type dmlc::Error: [17:24:02] /home/pivovaa/workplace/tvm/src/runtime/vm/vm.cc:281: Check failed: pf != nullptr: Cannot find function in module: " failed
Aborted

Error Line: src/runtime/vm/vm.cc:281

apivovarov · March 5, 2021, 9:53pm

For the model compiled for Android/Linux armv7a, debugging at vm.cc:281 shows the following:

exec_->primitive_map size is 1.

packed_name string size is 1.

packed_name[0] char code is 1.

# armv7 Android/Linux
0:

If I compile and run the model on x86_64 Linux, then exec_->primitive_map has 39 items, e.g.:

# x86_64 Linux
0: name:fused_add_layout_transform
1: name:fused_nn_contrib_conv2d_NCHWc_add_nn_relu
2: name:fused_nn_max_pool2d
...
38: name:fused_nn_dense_add

ARM64 Android also works fine. However, exec_->primitive_map has only 29 items there (not 39 as on x86_64):

# ARM64 Android
0: name:fused_add_11
1: name:fused_nn_conv2d_add_nn_relu
2: name:fused_nn_max_pool2d
...
28: name:fused_nn_dense_add

apivovarov · March 5, 2021, 7:31pm

@jroesch @zhiics @rkimball @haichen What do you think?

haichen · March 5, 2021, 8:00pm

Not sure why there are no primitive functions compiled for armv7a. You can first check how many cached funcs have been lowered at

https://github.com/apache/tvm/blob/v0.7/src/relay/backend/vm/compiler.cc#L973

    exec_->const_device_type.push_back(i);
  }
  // update global function map
  for (auto gv : context_.global_map) {
    exec_->global_map.insert({gv.first->name_hint, gv.second});
  }
  // update primitive function map
  size_t primitive_index = 0;
  for (const auto& cfunc : context_.cached_funcs) {
    exec_->primitive_map.insert({cfunc->func_name, primitive_index++});
  }
}
transform::Sequential MemoryOpt(tvm::Target host_target, TargetsMap targets) {
  Array<Pass> pass_seqs;
  // Manifest the allocations.
  pass_seqs.push_back(transform::ManifestAlloc(host_target, targets));
  // Compute away possibly introduced constant computation.

apivovarov · March 5, 2021, 9:54pm

Debug info at compiler.cc:L973

# Android/Linux armv7
# target: llvm -device=arm_cpu -mtriple=armv7a-linux-androideabi22 -mfloat-abi=soft
# lib.export_library(path_lib, cc="armv7a-linux-androideabi22-clang++", options=["-static-libstdc++"])
update global function map ------------------------------------
main:0
update primitive function map ---------------------------------
fused_add_11:0
fused_nn_conv2d_add_nn_relu:1
fused_nn_max_pool2d:2
fused_add_nn_relu:3
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu:4
fused_nn_contrib_conv2d_winograd_without_weight_transform_add:5
fused_multiply_add_nn_relu:6
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu:7
fused_nn_conv2d_add_nn_relu_1:8
fused_nn_conv2d:9
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_1:10
fused_add_nn_relu_1:11
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1:12
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1:13
fused_nn_conv2d_add_nn_relu_2:14
fused_nn_conv2d_1:15
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_2:16
fused_add_nn_relu_2:17
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_2:18
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_2:19
fused_nn_conv2d_add_nn_relu_3:20
fused_nn_conv2d_2:21
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_3:22
fused_add_nn_relu_3:23
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_3:24
fused_nn_contrib_conv2d_winograd_without_weight_transform_add_multiply_add_nn_re_11882905421691233276_:25
fused_nn_global_avg_pool2d:26
fused_nn_batch_flatten_nn_batch_flatten:27
fused_nn_dense_add:28
Compilation done

# Resulting files size:
model.ro 159,999,677
model.so     645,924

# Android ARM64
# target: llvm -device=arm_cpu -mtriple=arm64-linux-android22
# lib.export_library(path_lib, cc="aarch64-linux-android22-clang++", options=["-static-libstdc++"])

# global function map and primitive function map are the same

# Resulting files size:
model.ro 159,999,677
model.so     506,160

apivovarov · March 5, 2021, 9:03pm

Actually, it has nothing to do with Android. I get the same issue on Linux ARMv7, for example on a Raspberry Pi 3 Model B.

# Linux ARMv7 Raspberry Pi 3 Model B
# exec_->primitive_map
0: name:, size: 1, (int) name.at[0]: 1

During compilation, relay/backend/vm/compiler.cc:973 has the correct list of primitive functions (29 items).

But during execution on the ARMv7 device, runtime/vm/vm.cc:281 sees a list with 1 item, and that item is a string containing the single byte 1.

apivovarov · March 6, 2021, 1:41am

Some Debug info from runtime/vm/executable.cc:482

Model - gluon/mobilenetv2_0.5

Compiled for ARMv7 rasp3b

# ARMv7
# Files:
model.ro - 7,862,354
model.so -   267,088

code string size: 7,862,354

header: 15142753616656602397
version: 0.8.dev0

globals.size(): 1
global[0]: main

constant.size: 219
constant[0]: int64
constant[1]: int64
constant[2]: float32
constant[3]: float32
constant[4]: int64
constant[5]: int64

primitive_names.size: 0

code.num_funcs: 0

The same model compiled for Linux x86_64

# x86_64
# Files:
model.ro - 7,866,095
model.so -   355,424

code string size: 7,866,095

header: 15142753616656602397
version: 0.8.dev0

globals.size(): 1
global[0]: main

constant.size: 225
constant[0]: int64
constant[1]: int64
constant[2]: int64
constant[3]: int64
constant[4]: float32
constant[5]: float32

primitive_names.size: 37
primitive_name[0]fused_layout_transform_42
primitive_name[1]fused_nn_contrib_conv2d_NCHWc_add_clip
primitive_name[2]fused_nn_contrib_conv2d_NCHWc_add_clip_1
primitive_name[3]fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip
primitive_name[4]fused_nn_contrib_conv2d_NCHWc_add
primitive_name[5]fused_nn_contrib_conv2d_NCHWc_add_clip_2

code.num_funcs: 1
func[0].num_instructions: 403

apivovarov · March 6, 2021, 3:05am

I think I found it. Executable::SaveConstantSection saves the device mapping, which is std::vector<size_t> const_device_type.

size_t is a platform-dependent type:

8 bytes on 64-bit platforms
4 bytes on 32-bit platforms

The device mapping vector is saved right before the primitive names vector.

That is probably why reading from the stream on a 32-bit edge device does not give the expected result after the constant section. If the file was written with 8-byte elements and the device reads 4-byte elements, everything after the device mapping is read from the wrong offset, so the primitive names cannot be read.
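The effect can be reproduced on a single machine. The sketch below assumes a layout of "64-bit length, then raw elements at their native width" for a serialized vector; that layout, and every helper name here (WriteRaw, ReadRaw, WriteVector, ReadVector, WriteOn64BitHost, ReadNextSectionCount), is hypothetical and not the TVM or dmlc API. ElemT stands in for the reader's size_t.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helpers (not TVM/dmlc APIs): raw values in a byte buffer.
template <typename T>
void WriteRaw(std::string* buf, const T& value) {
  buf->append(reinterpret_cast<const char*>(&value), sizeof(T));
}

template <typename T>
T ReadRaw(const std::string& buf, size_t* pos) {
  assert(*pos + sizeof(T) <= buf.size());  // do not read past the end
  T value;
  std::memcpy(&value, buf.data() + *pos, sizeof(T));
  *pos += sizeof(T);
  return value;
}

// Assumed layout: 64-bit length, then each element at its native width.
template <typename T>
void WriteVector(std::string* buf, const std::vector<T>& values) {
  WriteRaw<uint64_t>(buf, values.size());
  for (const T& v : values) WriteRaw<T>(buf, v);
}

template <typename T>
std::vector<T> ReadVector(const std::string& buf, size_t* pos) {
  uint64_t n = ReadRaw<uint64_t>(buf, pos);
  std::vector<T> values;
  for (uint64_t i = 0; i < n; ++i) values.push_back(ReadRaw<T>(buf, pos));
  return values;
}

// Emulates a 64-bit writer: three 8-byte device types, followed by the
// count that opens the next section (37 is just an example value).
std::string WriteOn64BitHost() {
  std::string buf;
  WriteVector<uint64_t>(&buf, {1, 1, 1});
  WriteRaw<uint64_t>(&buf, 37);
  return buf;
}

// Reads the device types with ElemT as the element width, then returns
// whatever the reader finds where it expects the next section's count.
template <typename ElemT>
uint64_t ReadNextSectionCount(const std::string& buf) {
  size_t pos = 0;
  ReadVector<ElemT>(buf, &pos);
  return ReadRaw<uint64_t>(buf, &pos);
}
```

ReadNextSectionCount<uint64_t>(WriteOn64BitHost()) returns 37, while ReadNextSectionCount<uint32_t>(WriteOn64BitHost()) consumes only 12 of the 24 element bytes and returns a value assembled from leftover element bytes. Serializing the elements with a fixed-width type on both sides keeps the stream aligned.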

apivovarov · March 6, 2021, 3:40am

OK, reading primitive_names and the number of functions was fixed by using int64_t (Index).

Now loading fails at instr.Load(strm):

runtime/vm/serialize_utils.h#L147 Check failed: loaded_hash == hash (175247657126 vs. 3448965286) : Found mismatch in hash for opcode: 11

apivovarov · March 6, 2021, 5:18am

OK, the hash code mismatch is caused by the size_t type as well!

dmlc/common.h defines HashCombine, which takes and returns size_t.

serialize_utils.h defines VectorHash(), which uses HashCombine().

VectorHash is used by VMInstructionSerializer::Load().

The following example demonstrates the VectorHash(11, {int64_t(0), int64_t(1)}) case and shows that the second HashCombine gives different results on 64-bit and 32-bit platforms.

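  int64_t v0 = 0;
  int64_t v1 = 1;
  size_t key = 11;  // opcode 11 from the error above, used as the initial key
  key = HashCombine(key, v0);
  std::cout << key << std::endl;  // same value on both platforms
  key = HashCombine(key, v1);
  std::cout << key << std::endl;  // differs between 64-bit and 32-bit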

Result:

# 64-bit platforms
2654436464
175247657126

# 32-bit platforms
2654436464
3448965286
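Both result columns can be reproduced on one machine by making the key width explicit. This is a sketch under assumptions: the mixing formula is the common boost-style hash_combine, which I assume matches dmlc's HashCombine (it does reproduce the numbers above), and the element hash is taken to be the identity, as std::hash is for small integers in common standard libraries. HashCombineAs, VectorHashAs and ReproduceThreadNumbers are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Boost-style hash combine with the key width as a template parameter.
// KeyT = uint64_t emulates size_t on a 64-bit platform,
// KeyT = uint32_t emulates size_t on a 32-bit platform.
// Assumption: hashing a small integer yields the integer itself.
template <typename KeyT>
KeyT HashCombineAs(KeyT key, int64_t value) {
  KeyT h = static_cast<KeyT>(value);
  return key ^ (h + static_cast<KeyT>(0x9e3779b9) + (key << 6) + (key >> 2));
}

// Folds every value into the key, in order.
template <typename KeyT>
KeyT VectorHashAs(KeyT key, const std::vector<int64_t>& values) {
  for (int64_t v : values) key = HashCombineAs<KeyT>(key, v);
  return key;
}

// Checks the values printed in the post for key 11 and fields {0, 1}.
void ReproduceThreadNumbers() {
  // First combine: nothing overflows 32 bits yet, so both widths agree.
  assert(HashCombineAs<uint64_t>(11, 0) == 2654436464ULL);
  assert(HashCombineAs<uint32_t>(11, 0) == 2654436464U);
  // Second combine: (key << 6) no longer fits in 32 bits, so the
  // 32-bit key keeps only the low 32 bits of the 64-bit result.
  assert(VectorHashAs<uint64_t>(11, {0, 1}) == 175247657126ULL);
  assert(VectorHashAs<uint32_t>(11, {0, 1}) == 3448965286U);
}
```

The 32-bit value is exactly the 64-bit value truncated to its low 32 bits (175247657126 mod 2^32 = 3448965286), which is why a hash stored by a 64-bit writer can never match the one recomputed with a 32-bit size_t.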

apivovarov · March 6, 2021, 5:40am

OK, after switching the hashing functions to use a uint64_t key, the model works on ARMv7.

uint64_t HashCombine(uint64_t key, const T& value)
uint64_t VectorHash(uint64_t key, const std::vector<T>& values)

Will open PRs

apivovarov · March 6, 2021, 6:37am