Compile PyTorch model for iOS

Hello everyone!

I need to run a PyTorch model in an iOS app and I would like to give TVM a shot. For now I'm at the model compilation stage.

I have successfully compiled it for macOS using TVMC (Compiling and Optimizing a Model with TVMC — tvm 0.8.dev0 documentation), but I had trouble coming up with a target that would compile the model for iOS. I looked through the target list and didn't find one that suits my needs.

So I decided to go a different way: I found a Python script for converting PyTorch models. I made slight modifications, and now it looks like this:

import os
import shutil

import torch
import tvm
from tvm import relay

from fan_model import FAN  # user-defined model class
from utils_inference_2 import prepare_torch_input  # user-defined preprocessing helper

model_folder_path = "path to my tvm model"
model_name = "model name"


model = FAN(2)
model_path = "path to pytorch model"
checkpoint = torch.load(model_path,map_location = 'cpu')['state_dict']
model = torch.nn.DataParallel(model)
model.load_state_dict(checkpoint)

model = model.eval()


# We grab the TorchScripted model via tracing
input_shape = [1, 3, 256, 256]
input_data = torch.randn(input_shape)
scripted_model = torch.jit.trace(model, input_data).eval()


# Preprocess the image and convert to tensor


img = prepare_torch_input("path to image")  # create the model input

######################################################################
# Import the graph to Relay
# -------------------------
# Convert PyTorch graph to Relay graph. The input name can be arbitrary.
input_name = "input"
shape_list = [(input_name, img.shape)]
mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)



######################################################################
# Relay Build
# -----------
# Compile the graph to llvm target with given input specification.

arch = "arm64"
target = tvm.target.Target("llvm -mtriple=%s-apple-darwin" % arch, host="llvm")

dev = tvm.cpu(0)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

if os.path.exists(model_folder_path) and os.path.isdir(model_folder_path):
    shutil.rmtree(model_folder_path)

os.mkdir(model_folder_path)

libpath = model_folder_path + "/" + model_name + ".so"
lib.export_library(libpath, cc="/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang")

graph = lib.graph_json
graph_json_path = model_folder_path + "/" + model_name + ".json"
with open(graph_json_path, 'w') as fo:
    fo.write(graph)

param_path = model_folder_path + "/" + model_name + ".params"
with open(param_path, 'wb') as fo:
    fo.write(relay.save_param_dict(params))

This script successfully builds the model, but when I put the result into my iOS Xcode project, I get an error:

Building for iOS, but the linked item ‘model.so’ was built for macOS.

I understand that I'm doing something wrong with the target. I went through all the TVM docs, but I didn't find any articles about how to form correct targets for model compilation. I'd be happy if someone could nudge me in the right direction.

Hi @L1onKing, I think you can do cross-compilation, and this iOS RPC app might be helpful: tvm/apps/ios_rpc at main · apache/tvm · GitHub.

Hi @yuchenj, thank you for your response, I'm very grateful. With your guidance I figured out that the TVM repo actually contains Python code for model compilation. What confused me was that the iOS-compiled model is a .dylib, while in the tutorials I only saw .so files.

So I have managed to alter my script so that it produces three files: a .dylib, a .json and a .params file. Could you please tell me, am I on the right track?

I also have a question: I couldn't find any information about building the TVM runtime library for iOS. Since I have compiled the model, I need to compile the TVM runtime library for my target device in order to deploy it. Is that correct?

Thank you again for your help, I really appreciate it.

@echuraev can probably help with iOS-related issues.

Hello @L1onKing!

My colleague is currently working on upstreaming some patches for iOS support and the iOS RPC app, so I'll ask him; maybe he will be able to extend my answer with some additional details.

First, let's start with your script for building the model. You should do export_library differently:

  1. Import xcode utility: from tvm.contrib import xcode
  2. Use xcode.create_dylib in export_library:
    arch = "arm64"
    sdk = "iphoneos"
    libpath = model_folder_path + "/" + model_name + ".dylib"
    lib.export_library(libpath, xcode.create_dylib, arch=arch, sdk=sdk)
    
  3. It is not necessary to dump your graph_json and params to separate files; the library exported by export_library already embeds them.
  4. Set an @rpath-based install name on the library. Run in the terminal: install_name_tool -id @rpath/<model_name>.dylib <model_name>.dylib

After these changes your model should be ready to run on a device.

Second, we need to compile the TVM runtime for iOS:

  1. Use the following cmake flags:

    -DUSE_GRAPH_RUNTIME_DEBUG=ON # if you are interested in per-layer performance statistics 
    -DUSE_LLVM=OFF 
    -DCMAKE_SYSTEM_NAME=iOS
    -DCMAKE_OSX_ARCHITECTURES=arm64
    -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0  # for compatibility with older iOS versions 
    -DUSE_THREADS=OFF
    -DCMAKE_BUILD_WITH_INSTALL_RPATH=TRUE # |
    -DCMAKE_INSTALL_NAME_DIR="@rpath"     # | to make it portable without cmake install step  
    

    You can change -DCMAKE_OSX_DEPLOYMENT_TARGET to your target iOS version.

  2. make -j<num_threads> tvm_runtime

  3. install_name_tool -id @rpath/libtvm_runtime.dylib libtvm_runtime.dylib

You can use these libraries in your iOS project.

One small comment in addition to Egor's previous message: one more option should be added to the cmake command line: -DCMAKE_CXX_FLAGS="-fembed-bitcode-marker"

@masahi @echuraev @elvin-n Wow, guys! Thank you so much for such awesome support! I'll let you know in this thread if I have any issues.

Hello again, guys. Sorry I didn't update you earlier, I got sick :frowning: Now I'm back, and I have built the iOS .dylib model successfully, but I have some problems with the runtime. You can find my config.cmake below:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#--------------------------------------------------------------------
#  Template custom cmake configuration for compiling
#
#  This file is used to override the build options in build.
#  If you want to change the configuration, please use the following
#  steps. Assume you are in the root directory. First copy this
#  file so that any local changes will be ignored by git
#
#  $ mkdir build
#  $ cp cmake/config.cmake build
#
#  Next modify the according entries, and then compile by
#
#  $ cd build
#  $ cmake ..
#
#  Then build in parallel with 8 threads
#
#  $ make -j8
#--------------------------------------------------------------------

#---------------------------------------------
# Backend runtimes.
#---------------------------------------------

set(USE_GRAPH_RUNTIME_DEBUG ON) # if you are interested in per-layer performance statistics 
set(USE_LLVM OFF)
set(CMAKE_SYSTEM_NAME iOS)
set(CMAKE_OSX_ARCHITECTURES arm64)
set(CMAKE_OSX_DEPLOYMENT_TARGET 11.0)  # compatibility with older iOS versions 
set(USE_THREADS OFF)
set(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE) # |
set(CMAKE_INSTALL_NAME_DIR "@rpath")     # | to make it portable without cmake install step

# Whether enable CUDA during compile,
#
# Possible values:
# - ON: enable CUDA with cmake's auto search
# - OFF: disable CUDA
# - /path/to/cuda: use specific path to cuda toolkit
set(USE_CUDA OFF)

# Whether enable ROCM runtime
#
# Possible values:
# - ON: enable ROCM with cmake's auto search
# - OFF: disable ROCM
# - /path/to/rocm: use specific path to rocm
set(USE_ROCM OFF)

# Whether enable SDAccel runtime
set(USE_SDACCEL OFF)

# Whether enable Intel FPGA SDK for OpenCL (AOCL) runtime
set(USE_AOCL OFF)

# Whether enable OpenCL runtime
#
# Possible values:
# - ON: enable OpenCL with cmake's auto search
# - OFF: disable OpenCL
# - /path/to/opencl-sdk: use specific path to opencl-sdk
set(USE_OPENCL OFF)

# Whether enable Metal runtime
set(USE_METAL ON)

# Whether enable Vulkan runtime
#
# Possible values:
# - ON: enable Vulkan with cmake's auto search
# - OFF: disable vulkan
# - /path/to/vulkan-sdk: use specific path to vulkan-sdk
set(USE_VULKAN OFF)

# Whether enable OpenGL runtime
set(USE_OPENGL OFF)

# Whether enable MicroTVM runtime
set(USE_MICRO OFF)

# Whether enable RPC runtime
set(USE_RPC ON)

# Whether to build the C++ RPC server binary
set(USE_CPP_RPC OFF)

# Whether embed stackvm into the runtime
set(USE_STACKVM_RUNTIME OFF)

# Whether enable tiny embedded graph executor.
set(USE_GRAPH_EXECUTOR ON)

# Whether enable tiny graph executor with CUDA Graph
set(USE_GRAPH_EXECUTOR_CUDA_GRAPH OFF)

# Whether to enable the profiler for the graph executor and vm
set(USE_PROFILER ON)

# Whether enable uTVM standalone runtime
set(USE_MICRO_STANDALONE_RUNTIME OFF)

# Whether build with LLVM support
# Requires LLVM version >= 4.0
#
# Possible values:
# - ON: enable llvm with cmake's find search
# - OFF: disable llvm, note this will disable CPU codegen
#        which is needed for most cases
# - /path/to/llvm-config: enable specific LLVM when multiple llvm-dev is available.
set(USE_LLVM OFF)

#---------------------------------------------
# Contrib libraries
#---------------------------------------------
# Whether to build with BYODT software emulated posit custom datatype
#
# Possible values:
# - ON: enable BYODT posit, requires setting UNIVERSAL_PATH
# - OFF: disable BYODT posit
#
# set(UNIVERSAL_PATH /path/to/stillwater-universal) for ON
set(USE_BYODT_POSIT OFF)

# Whether use BLAS, choices: openblas, atlas, apple
set(USE_BLAS none)

# Whether to use MKL
# Possible values:
# - ON: Enable MKL
# - /path/to/mkl: mkl root path
# - OFF: Disable MKL
# set(USE_MKL /opt/intel/mkl) for UNIX
# set(USE_MKL ../IntelSWTools/compilers_and_libraries_2018/windows/mkl) for WIN32
# set(USE_MKL <path to venv or site-packages directory>) if using `pip install mkl`
set(USE_MKL OFF)

# Whether use MKLDNN library, choices: ON, OFF, path to mkldnn library
set(USE_MKLDNN OFF)

# Whether use OpenMP thread pool, choices: gnu, intel
# Note: "gnu" uses gomp library, "intel" uses iomp5 library
set(USE_OPENMP none)

# Whether use contrib.random in runtime
set(USE_RANDOM ON)

# Whether use NNPack
set(USE_NNPACK OFF)

# Possible values:
# - ON: enable tflite with cmake's find search
# - OFF: disable tflite
# - /path/to/libtensorflow-lite.a: use specific path to tensorflow lite library
set(USE_TFLITE OFF)

# /path/to/tensorflow: tensorflow root path when use tflite library
set(USE_TENSORFLOW_PATH none)

# Required for full builds with TFLite. Not needed for runtime with TFLite.
# /path/to/flatbuffers: flatbuffers root path when using tflite library
set(USE_FLATBUFFERS_PATH none)

# Possible values:
# - OFF: disable tflite support for edgetpu
# - /path/to/edgetpu: use specific path to edgetpu library
set(USE_EDGETPU OFF)

# Possible values:
# - ON: enable cuDNN with cmake's auto search in CUDA directory
# - OFF: disable cuDNN
# - /path/to/cudnn: use specific path to cuDNN path
set(USE_CUDNN OFF)

# Whether use cuBLAS
set(USE_CUBLAS OFF)

# Whether use MIOpen
set(USE_MIOPEN OFF)

# Whether use MPS
set(USE_MPS OFF)

# Whether use rocBlas
set(USE_ROCBLAS OFF)

# Whether use contrib sort
set(USE_SORT ON)

# Whether use MKL-DNN (DNNL) codegen
set(USE_DNNL_CODEGEN OFF)

# Whether to use Arm Compute Library (ACL) codegen
# We provide 2 separate flags since we cannot build the ACL runtime on x86.
# This is useful for cases where you want to cross-compile a relay graph
# on x86 then run on AArch.
#
# An example of how to use this can be found here: docs/deploy/arm_compute_lib.rst.
#
# USE_ARM_COMPUTE_LIB - Support for compiling a relay graph offloading supported
#                       operators to Arm Compute Library. OFF/ON
# USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR - Run Arm Compute Library annotated functions via the ACL
#                                     runtime. OFF/ON/"path/to/ACL"
set(USE_ARM_COMPUTE_LIB OFF)
set(USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR OFF)

# Whether to build with Arm Ethos-N support
# Possible values:
# - OFF: disable Arm Ethos-N support
# - path/to/arm-ethos-N-stack: use a specific version of the
#   Ethos-N driver stack
set(USE_ETHOSN OFF)
# If USE_ETHOSN is enabled, use ETHOSN_HW (ON) if Ethos-N hardware is available on this machine
# otherwise use ETHOSN_HW (OFF) to use the software test infrastructure
set(USE_ETHOSN_HW OFF)

# Whether to build with TensorRT codegen or runtime
# Examples are available here: docs/deploy/tensorrt.rst.
#
# USE_TENSORRT_CODEGEN - Support for compiling a relay graph where supported operators are
#                        offloaded to TensorRT. OFF/ON
# USE_TENSORRT_RUNTIME - Support for running TensorRT compiled modules, requires presence of
#                        TensorRT library. OFF/ON/"path/to/TensorRT"
set(USE_TENSORRT_CODEGEN OFF)
set(USE_TENSORRT_RUNTIME OFF)

# Whether use VITIS-AI codegen
set(USE_VITIS_AI OFF)

# Build Verilator codegen and runtime
set(USE_VERILATOR OFF)

# Build ANTLR parser for Relay text format
# Possible values:
# - ON: enable ANTLR by searching default locations (cmake find_program for antlr4 and /usr/local for jar)
# - OFF: disable ANTLR
# - /path/to/antlr-*-complete.jar: path to specific ANTLR jar file
set(USE_ANTLR OFF)

# Whether use Relay debug mode
set(USE_RELAY_DEBUG OFF)

# Whether to build fast VTA simulator driver
set(USE_VTA_FSIM OFF)

# Whether to build cycle-accurate VTA simulator driver
set(USE_VTA_TSIM OFF)

# Whether to build VTA FPGA driver (device side only)
set(USE_VTA_FPGA OFF)

# Whether use Thrust
set(USE_THRUST OFF)

# Whether to build the TensorFlow TVMDSOOp module
set(USE_TF_TVMDSOOP OFF)

# Whether to use STL's std::unordered_map or TVM's POD compatible Map
set(USE_FALLBACK_STL_MAP OFF)

# Whether to use hexagon device
set(USE_HEXAGON_DEVICE OFF)
set(USE_HEXAGON_SDK /path/to/sdk)

# Whether to use ONNX codegen
set(USE_TARGET_ONNX OFF)

# Whether enable BNNS runtime
set(USE_BNNS OFF)

# Whether to use libbacktrace
# Libbacktrace provides line and column information on stack traces from errors.
# It is only supported on linux and macOS.
# Possible values:
# - AUTO: auto set according to system information and feasibility
# - ON: enable libbacktrace
# - OFF: disable libbacktrace
set(USE_LIBBACKTRACE AUTO)

The problem is that when I build the runtime library with it and integrate it into my iOS project, I get an error:

Building for iOS, but the linked library 'libtvm_runtime.dylib' was built for macOS.

Could you please advise me on how to fix this issue? Thank you!

Hello! I hope you got better and everything is fine now. Did you try to build tvm_runtime from the terminal without this config.cmake file? There are some other options (e.g. USE_RPC, USE_PROFILER) enabled in your config.cmake, and I'm not sure whether they affect the build process. Could you please try to build the runtime from the terminal? Example commands:

$ cd <tvm_dir>
$ mkdir ios-build && cd ios-build
$ cmake -DUSE_GRAPH_RUNTIME_DEBUG=ON -DUSE_LLVM=OFF -DCMAKE_SYSTEM_NAME=iOS -DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_OSX_DEPLOYMENT_TARGET=14.0 -DUSE_THREADS=OFF -DCMAKE_BUILD_WITH_INSTALL_RPATH=TRUE -DCMAKE_INSTALL_NAME_DIR="@rpath" -DUSE_METAL=ON -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="-fembed-bitcode-marker" ..
$ make -j8 tvm_runtime && dsymutil libtvm_runtime.dylib 

This works for me. Please let me know whether it works for you.

Hello @echuraev! Thank you for your kind words. Your advice worked, and I have managed to build tvm_runtime.dylib for iOS using the following command:

cmake -DUSE_GRAPH_RUNTIME_DEBUG=ON -DUSE_LLVM=OFF -DCMAKE_SYSTEM_NAME=iOS -DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_OSX_DEPLOYMENT_TARGET=11.0 -DUSE_THREADS=OFF -DCMAKE_BUILD_WITH_INSTALL_RPATH=TRUE -DCMAKE_INSTALL_NAME_DIR="@rpath" -DUSE_METAL=ON -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="-fembed-bitcode-marker"

I'll also post any new observations in this thread; I hope they will help someone.

Since I now have the model library and the runtime library in my hands, I'm working on writing the inference code. Unfortunately, I'm having problems from the very beginning:

m_mod = tvm::runtime::Module::LoadFromFile("/Applications/Algoface-Landmarks-Tracker.app/model.dylib");

The problem here is that I cannot figure out the path to the .dylib file, because I can't find it in my application package. Could you please give me some advice on how to do that?

Also, if I'm not mistaken, the App Store forbids the use of .dylib files inside an iOS application. What modifications should I make so that I can ship TVM in my application without worrying that the App Store will reject it?

You need to run install_name_tool on your model library, like this:

install_name_tool -id @rpath/model.dylib model.dylib

and then just refer to the library by name, without any additional path, in the iOS app:

m_mod = tvm::runtime::Module::LoadFromFile("model.dylib");

Hello @elvin-n! Thank you very much for the suggestion. My problem turned out to be different, but I didn't know that install_name_tool had such an impact. Noted.

Hello, @L1onKing! Did you solve your problem? I think you could use pathForResource: Apple Developer Documentation

Hello @echuraev! I have solved the issue, thank you very much! Now I'm working on tuning the iOS model, and I'm getting the following error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'default_module_loader.<locals>.default_module_loader_mgr'

And here’s my script for tuning the model:

import os

import numpy as np

import tvm
from tvm import relay, autotvm
import tvm.relay.testing

from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
from tvm.contrib.utils import tempdir
import tvm.contrib.graph_executor as runtime
from tvm.contrib import xcode

from fan_model import FAN
from utils_inference_2 import *

import shutil

def get_model():
    model_folder_path = "path-to-model
    model_name = "model_name"

    # detector = MTCNN()

    model = FAN(2)
    model_path = "pytorch-model-path"
    checkpoint = torch.load(model_path, map_location='cpu')['state_dict']
    model = torch.nn.DataParallel(model)
    model.load_state_dict(checkpoint)

    model = model.eval()

    input_shape = [1, 3, 256, 256]
    input_data = torch.randn(input_shape)
    scripted_model = torch.jit.trace(model, input_data).eval()

    input_name = "input"
    shape_list = [(input_name, input_shape)]
    mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)

    return mod, params, (1, 3, 256, 256)

def fcompile(*args):
    xcode.create_dylib(*args, arch=arch, sdk=sdk)
    path = args[0]
    xcode.codesign(path)
    xcode.popen_test_rpc(proxy_host, proxy_port, device_key, destination=destination, libs=[path])

fcompile.output_format = "dylib"

proxy_host = "192.168.1.4"
proxy_port = 9090
device_key = "iphone"
destination = "platform=iOS,id=phone-id"

# Change target configuration; these are the settings for an iPhone 6s
# arch = "x86_64"
# sdk = "iphonesimulator"
arch = "arm64"
sdk = "iphoneos"
target = "metal"
target_host = "llvm -mtriple=%s-apple-darwin" % arch

model_name = "model_name"
lib_path = "path-to-tvm-folder"

log_file = "model.log"
tuning_option = {
    "log_filename": log_file,
    "tuner": "xgb",
    "n_trial": 1000,
    "early_stopping": 450,
    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(
            n_parallel=1,
            build_func=fcompile,
            timeout=60
        ),
        runner=autotvm.RPCRunner(
            device_key,
            host='127.0.0.1', # I'm not sure; this might need to be the actual IP address of the proxy/host machine
            port=9190,
            number=20, repeat=3, timeout=60, min_repeat_ms=150)
    ),
}

# You can skip the implementation of this function for this tutorial.
def tune_tasks(
    tasks,
    measure_option,
    tuner="xgb",
    n_trial=1000,
    early_stopping=None,
    log_filename="tuning.log",
    use_transfer_learning=True,
):
    # create tmp log file
    tmp_log_file = log_filename + ".tmp"
    if os.path.exists(tmp_log_file):
        os.remove(tmp_log_file)

    for i, tsk in enumerate(reversed(tasks)):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))

        # create tuner
        if tuner == "xgb" or tuner == "xgb-rank":
            tuner_obj = XGBTuner(tsk, loss_type="rank")
        elif tuner == "ga":
            tuner_obj = GATuner(tsk, pop_size=50)
        elif tuner == "random":
            tuner_obj = RandomTuner(tsk)
        elif tuner == "gridsearch":
            tuner_obj = GridSearchTuner(tsk)
        else:
            raise ValueError("Invalid tuner: " + tuner)

        if use_transfer_learning:
            if os.path.isfile(tmp_log_file):
                tuner_obj.load_history(autotvm.record.load_from_file(tmp_log_file))

        # do tuning
        tsk_trial = min(n_trial, len(tsk.config_space))
        tuner_obj.tune(
            n_trial=tsk_trial,
            early_stopping=early_stopping,
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(tsk_trial, prefix=prefix),
                autotvm.callback.log_to_file(tmp_log_file),
            ],
        )

    # pick best records to a cache file
    autotvm.record.pick_best(tmp_log_file, log_filename)
    os.remove(tmp_log_file)

def tune_and_evaluate(tuning_opt):
    # extract workloads from relay program
    print("Extract tasks...")
    mod, params, input_shape = get_model()
    tasks = autotvm.task.extract_from_program(
        mod["main"],
        target=target,
        params=params,
        ops=(relay.op.get("nn.conv2d"),),
    )

    # run tuning tasks
    print("Tuning...")
    tune_tasks(tasks, **tuning_opt)

    # compile kernels with history best records
    with autotvm.apply_history_best(log_file):
        print("Compile...")
        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build_module.build(mod, target=target, params=params)
        # export library
        tmp = tempdir()

        if os.path.exists(lib_path) and os.path.isdir(lib_path):
            shutil.rmtree(lib_path)

        os.mkdir(lib_path)

        path_dylib = lib_path + "/" + model_name + ".dylib"
        lib.export_library(path_dylib, xcode.create_dylib, arch=arch, sdk=sdk)

        # upload module to device
        print("Upload...")
        remote = autotvm.measure.request_remote(device_key, "127.0.0.1", 9190, timeout=10000)
        remote.upload(path_dylib)  # upload the library by its local path
        rlib = remote.load_module(model_name + ".dylib")  # load by base name on the device

        # FINISH IT LATER!
        # # upload parameters to device
        # dev = remote.device(str(target), 0)
        # module = runtime.GraphModule(rlib["default"](dev))
        # data_tvm = tvm.nd.array((np.random.uniform(size=input_shape)).astype(dtype))
        # module.set_input("data", data_tvm)
        #
        # # evaluate
        # print("Evaluate inference time cost...")
        # ftimer = module.module.time_evaluator("run", dev, number=1, repeat=30)
        # prof_res = np.array(ftimer().results) * 1000  # convert to millisecond
        # print(
        #     "Mean inference time (std dev): %.2f ms (%.2f ms)"
        #     % (np.mean(prof_res), np.std(prof_res))
        # )

if __name__ == "__main__":
    tune_and_evaluate(tuning_option)

Please let me know if I should create a separate thread to discuss my new issue. Thank you very much!

Hi @L1onKing!

As I can see, you have already dealt with that problem. The issue you describe in Tune the model in iOS is the next level of progress with iOS tuning.

But anyway, the issue with transferring Python objects between processes on macOS is still present in TVM. That's a well-known issue with Python on macOS. I've just checked it with the latest TVM main branch, and it really cannot pass the default_module_loader_mgr object into a subprocess.

Could you please share your patch with the fix/workaround?

Hi @apeskov!

Yes, sure! I’ll clean up my code so I can extract the patch out of it, but the change was rather simple:

# In tvm/autotvm/measure/measure_methods.py: default_module_loader_mgr is
# moved out of default_module_loader to module level so it can be pickled.
@contextlib.contextmanager
def default_module_loader_mgr(remote_kwargs, build_result):
    remote = request_remote(**remote_kwargs)
    # if pre_load_function is not None:
    #     pre_load_function(remote, build_result)

    print("Remote upload = ", build_result.filename)
    remote.upload(build_result.filename)
    try:
        yield remote, remote.load_module(os.path.split(build_result.filename)[1])

    finally:
        # clean up remote files
        remote.remove(build_result.filename)
        remote.remove(os.path.splitext(build_result.filename)[0] + ".so")
        remote.remove("")
        print("Clean up = ", build_result.filename)

def default_module_loader(pre_load_function=None):
    """Returns a default function that can be passed as module_loader to run_through_rpc.

    Parameters
    ----------
    pre_load_function : Optional[Function[tvm.rpc.Session, tvm.runtime.Module]]
        Invoked after a session is established and before the default code-loading RPC calls are
        issued. Allows performing pre-upload actions, e.g. resetting the remote runtime environment.

    Returns
    -------
    ModuleLoader :
        A function that can be passed as module_loader to run_through_rpc.
    """

    return default_module_loader_mgr

I'm not good with Python (yet), but what I understood is that Python has problems pickling nested functions, since it cannot refer to them by an importable name. So I just moved default_module_loader_mgr out of the scope of the default_module_loader function, and that did the trick.