[AutoTVM] Could not train for the remote device

I am using the latest master code plus PR https://github.com/dmlc/tvm/pull/1487, and I am following the tutorial https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html to tune for a remote device. Everything goes smoothly until tuning starts.

However, when I run tune_and_evaluate() to do the tuning, every trial reports 0.00 GFLOPS:
[Task 1/19] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (201/1000) | 499.30 s Done

I have tested the RPC connection and status, and both are OK.

We can get debug information by printing all the tuning results.

Put these lines at the beginning of your script:

import logging
logging.basicConfig(level=logging.DEBUG)

Then remove the progress_bar callback in this block:

        # do tuning
        tuner_obj.tune(n_trial=min(n_trial, len(tsk.config_space)),
                       early_stopping=early_stopping,
                       measure_option=measure_option,
                       callbacks=[
 ## REMOVE THIS LINE ##    autotvm.callback.progress_bar(n_trial, prefix=prefix), 
                           autotvm.callback.log_to_file(tmp_log_file)])
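
If the DEBUG output is too noisy, a small custom callback can collect only the failed measurements. This is a minimal sketch, assuming autotvm invokes each callback as callback(tuner, inputs, results) and that each result carries the fields costs, error_no, all_cost and timestamp. The name collect_errors and the stand-in MeasureResult tuple are hypothetical, defined here only so the sketch runs without TVM installed.

```python
from collections import namedtuple

# Stand-in with the same field names as autotvm's MeasureResult (assumption),
# so this sketch can be run without TVM.
MeasureResult = namedtuple(
    "MeasureResult", ["costs", "error_no", "all_cost", "timestamp"])


def collect_errors(sink):
    """Build a callback that appends every failed measurement to `sink`."""
    def _callback(tuner, inputs, results):
        for inp, res in zip(inputs, results):
            # A non-zero error_no marks a failed measurement; in that case
            # `costs` holds the exception instead of timings.
            if res.error_no != 0:
                sink.append((inp, res.costs))
    return _callback


failures = []
callback = collect_errors(failures)

# Simulated batch: one good result and one failure (error code 4 is arbitrary).
ok = MeasureResult(costs=(0.001,), error_no=0, all_cost=1.0, timestamp=0.0)
bad = MeasureResult(costs=(RuntimeError("failed to load module"),),
                    error_no=4, all_cost=1.0, timestamp=0.0)
callback(None, ["cfg_a", "cfg_b"], [ok, bad])

for inp, costs in failures:
    print(inp, costs)
```

In a real script the callback would go into the callbacks list in place of progress_bar.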

Can you show me the error?

result: MeasureResult(costs=(TVMError('Except caught from RPC call: TVMCall CFunc Error:Traceback (most recent call last):\n
File "/data/wuzhao/tvm/tvm/_ffi/_ctypes/function.py", line 54, in cfun rv = local_pyfunc(*pyargs) File "/data/wuzhao/tvm/tvm/rpc/server.py", line 48, in load_module\n m = _load_module(path)\n File "/data/wuzhao/tvm/tvm/module.py", line 222, in load\n return _LoadFromFile(path, fmt)\n
File "/data/wuzhao/tvm/tvm/_ffi/function.py", line 280, in my_api_func\n return flocal(*args)\n File "/data/wuzhao/tvm/tvm/_ffi/_ctypes/function.py", line 184, in call\n ctypes.byref(ret_val), ctypes.byref(ret_tcode)))\n File "/data/wuzhao/tvm/tvm/_ffi/base.py", line 66, in check_call\n

raise TVMError(py_str(_LIB.TVMGetLastError())) tvm.ffi.base.TVMError: [11:05:47] /home/wuzhao/third-party/github-tvm/tvm/src/runtime/dso_module.cc:93: Check failed: lib_handle != nullptr Failed to load dynamic shared library /tmp/tmpu6lj43y3/tmp_func_9c228e3f2ff51f70.so /tmp/tmpu6lj43y3/tmp_func_9c228e3f2ff51f70.so: failed to map segment from shared object\n\nStack trace returned 9 entries

I guess the reason is here. I previously hit a similar problem, and I fixed it by changing the upload call from remote.upload(tmp.relpath(filename)) to remote.upload(tmp.relpath(filename), target='ONE_PATH').

During tuning, can I set the upload directory in the same way, like remote.upload(target)?

@merrymercy I solved this problem leveraging your DEBUG utility.

I made measure_methods.py read an environment variable so that I can set the remote module directory. Do you mind if we add this environment variable? Here is my diff:

--- a/python/tvm/autotvm/measure/measure_methods.py
+++ b/python/tvm/autotvm/measure/measure_methods.py
@@ -317,8 +317,11 @@ def _measure_common(input_pack, build_func, build_kwargs, number, repeat,

         # upload built module
         if remote:
-            remote.upload(tmp_dir.relpath(filename))
-            func = remote.load_module(filename)
+            remote_upload_target_dir = os.environ.get('TVM_RPC_REMOTE_DIR')
+            remote_target = os.path.join(remote_upload_target_dir, filename) if remote_upload_target_dir else None
+            remote_module = os.path.join(remote_upload_target_dir, filename) if remote_upload_target_dir else filename
+            remote.upload(tmp_dir.relpath(filename), target=remote_target)
+            func = remote.load_module(remote_module)
             ctx = remote.context(str(inp.target), 0)
             time_f = func.time_evaluator(
                 func.entry_name, ctx, number=number, repeat=repeat)

Nice!
You can send the patch.
It would be better written as:

if 'TVM_RPC_REMOTE_DIR' in os.environ:
    # your code here
    remote_upload_target_dir = os.environ.get('TVM_RPC_REMOTE_DIR')
    remote_target = os.path.join(remote_upload_target_dir, filename)
    remote_module = os.path.join(remote_upload_target_dir, filename)
    remote.upload(tmp_dir.relpath(filename), target=remote_target)
    func = remote.load_module(remote_module)
else:
    # original code
    remote.upload(tmp_dir.relpath(filename))
    func = remote.load_module(filename)
....
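
Both versions share the same path logic, so it can be pulled into a small helper and checked without a device. This is a minimal sketch, not the code that went into TVM: the name resolve_remote_paths is hypothetical, and it assumes the remote device uses POSIX-style paths, so posixpath is used instead of os.path.

```python
import os
import posixpath


def resolve_remote_paths(filename, env=None):
    """Return (upload_target, module_path) for a built module.

    If TVM_RPC_REMOTE_DIR is set, both point into that directory.
    Otherwise upload_target is None (server default location) and the
    module is loaded by its bare file name, as in the original code.
    """
    env = os.environ if env is None else env
    remote_dir = env.get("TVM_RPC_REMOTE_DIR")
    if remote_dir:
        path = posixpath.join(remote_dir, filename)
        return path, path
    return None, filename


# Without the variable: fall back to the original behaviour.
print(resolve_remote_paths("tmp_func.so", env={}))

# With the variable: upload into, and load from, the chosen directory.
print(resolve_remote_paths(
    "tmp_func.so", env={"TVM_RPC_REMOTE_DIR": "/data/local/tmp"}))
```

The two returned values would then be passed to remote.upload(..., target=upload_target) and remote.load_module(module_path) respectively.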

Thanks for your suggestion! @merrymercy