With the latest master code and PR https://github.com/dmlc/tvm/pull/1487, I followed the tutorial https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html to tune a model for a remote device. Everything went smoothly until the tuning step.
However, when I run tune_and_evaluate() to do the tuning, the progress output stays at 0.00 GFLOPS:
[Task 1/19] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (201/1000) | 499.30 s Done
I have tested the RPC connection and status, and both are OK.
We can get debug information by printing all the tuning results.
Put these lines at the beginning of your script:
import logging
logging.basicConfig(level=logging.DEBUG)
Then remove the progress_bar callback in this block:
# do tuning
tuner_obj.tune(n_trial=min(n_trial, len(tsk.config_space)),
               early_stopping=early_stopping,
               measure_option=measure_option,
               callbacks=[
                   ## REMOVE THIS LINE ## autotvm.callback.progress_bar(n_trial, prefix=prefix),
                   autotvm.callback.log_to_file(tmp_log_file)])
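If the full DEBUG output is too noisy, a small custom callback can surface only the failed measurements. The following is a minimal sketch, not part of autotvm: it assumes tuner callbacks are called as callback(tuner, inputs, results) and that each result exposes costs and error_no fields. The MeasureResult namedtuple below is a stand-in so the snippet runs without TVM installed, and log_failed_measurements is a hypothetical name.

```python
import logging
from collections import namedtuple

# Stand-in for autotvm's MeasureResult (assumed field layout), so that this
# sketch runs without TVM installed.
MeasureResult = namedtuple(
    "MeasureResult", ["costs", "error_no", "all_cost", "timestamp"])

logger = logging.getLogger("tuning_debug")


def log_failed_measurements(tuner, inputs, results):
    """Log every measurement whose error_no is non-zero.

    Returns the number of failed measurements in this batch, which is
    convenient for testing; a tuner would simply ignore the return value.
    """
    failed = 0
    for inp, res in zip(inputs, results):
        if res.error_no != 0:
            failed += 1
            # On failure, costs holds the exception(s) instead of timings.
            logger.warning("measurement failed (error_no=%d) for %s: %s",
                           res.error_no, inp, res.costs)
    return failed
```

Under those assumptions, the function could be added to the callbacks list next to autotvm.callback.log_to_file(tmp_log_file).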
Can you post the error you get?
result: MeasureResult(costs=(TVMError('Except caught from RPC call:
TVMCall CFunc Error:Traceback (most recent call last):\n
File "/data/wuzhao/tvm/tvm/_ffi/_ctypes/function.py", line 54, in cfun rv = local_pyfunc(*pyargs)
File "/data/wuzhao/tvm/tvm/rpc/server.py", line 48, in load_module\n m = _load_module(path)\n
File "/data/wuzhao/tvm/tvm/module.py", line 222, in load\n return _LoadFromFile(path, fmt)\n
File "/data/wuzhao/tvm/tvm/_ffi/function.py", line 280, in my_api_func\n return flocal(*args)\n
File "/data/wuzhao/tvm/tvm/_ffi/_ctypes/function.py", line 184, in call\n ctypes.byref(ret_val), ctypes.byref(ret_tcode)))\n
File "/data/wuzhao/tvm/tvm/_ffi/base.py", line 66, in check_call\n
raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm.ffi.base.TVMError: [11:05:47]
/home/wuzhao/third-party/github-tvm/tvm/src/runtime/dso_module.cc:93: Check failed: lib_handle != nullptr
Failed to load dynamic shared library /tmp/tmpu6lj43y3/tmp_func_9c228e3f2ff51f70.so /tmp/tmpu6lj43y3/tmp_func_9c228e3f2ff51f70.so: failed to map segment from shared object\n\nStack trace returned 9 entries
…
I guess this is the cause. I hit a similar problem before and worked around it by changing the upload call from
remote.upload(tmp.relpath(filename))
to
remote.upload(tmp.relpath(filename), target='ONE_PATH')
During tuning, can I set the upload directory in the same way, like remote.upload(target)?
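One common cause of "failed to map segment from shared object" on Linux is that the directory holding the .so is on a filesystem mounted with noexec. Whether that applies to this device is not confirmed in the thread. The sketch below shows one way to check it by parsing the contents of /proc/mounts. The helper names find_mount_options and is_noexec are hypothetical and not part of TVM.

```python
import os


def find_mount_options(path, mounts_text):
    """Return the mount options of the deepest mount point containing `path`.

    `mounts_text` is expected to have the /proc/mounts layout:
    device, mount point, fs type, comma-separated options, two numbers.
    """
    path = os.path.normpath(path)
    best_point, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later entry for the same mount point shadow an earlier one.
        if inside and len(mount_point) >= len(best_point):
            best_point, best_opts = mount_point, options
    return best_opts


def is_noexec(path, mounts_text):
    """True if `path` lives on a filesystem mounted with the noexec option."""
    return "noexec" in find_mount_options(path, mounts_text)
```

On the device running the RPC server, this could be called as is_noexec("/tmp", open("/proc/mounts").read()). If it returns True, uploading to a directory on an executable mount is one possible way around the error.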
@merrymercy I solved this problem with the help of your debugging tip.
I made measure_methods.py read an environment variable so that I can set the remote module directory. Do you mind if we add this environment variable? Here is my diff:
--- a/python/tvm/autotvm/measure/measure_methods.py
+++ b/python/tvm/autotvm/measure/measure_methods.py
@@ -317,8 +317,11 @@ def _measure_common(input_pack, build_func, build_kwargs, number, repeat,
 # upload built module
 if remote:
-    remote.upload(tmp_dir.relpath(filename))
-    func = remote.load_module(filename)
+    remote_upload_target_dir = os.environ.get('TVM_RPC_REMOTE_DIR')
+    remote_target = os.path.join(remote_upload_target_dir, filename) if remote_upload_target_dir else None
+    remote_module = os.path.join(remote_upload_target_dir, filename) if remote_upload_target_dir else filename
+    remote.upload(tmp_dir.relpath(filename), target=remote_target)
+    func = remote.load_module(remote_module)
     ctx = remote.context(str(inp.target), 0)
     time_f = func.time_evaluator(
         func.entry_name, ctx, number=number, repeat=repeat)
Nice!
You can send the patch.
It would be better written like this:
if 'TVM_RPC_REMOTE_DIR' in os.environ:
    # your code here
    remote_upload_target_dir = os.environ.get('TVM_RPC_REMOTE_DIR')
    remote_target = os.path.join(remote_upload_target_dir, filename)
    remote_module = os.path.join(remote_upload_target_dir, filename)
    remote.upload(tmp_dir.relpath(filename), target=remote_target)
    func = remote.load_module(remote_module)
else:
    # original code
    remote.upload(tmp_dir.relpath(filename))
    func = remote.load_module(filename)
....
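The path selection can also be pulled into a small helper, which makes it easy to test without a device. This is only a sketch: resolve_remote_paths is a hypothetical name, and it uses posixpath.join instead of os.path.join on the assumption that the remote device uses POSIX-style paths even when the host does not.

```python
import os
import posixpath


def resolve_remote_paths(filename, environ=None):
    """Return (upload_target, module_path) for a built module.

    If TVM_RPC_REMOTE_DIR is set and non-empty, both values point into that
    directory. Otherwise upload_target is None (keep the default upload
    location) and module_path is the bare filename, as in the original code.
    """
    environ = os.environ if environ is None else environ
    remote_dir = environ.get("TVM_RPC_REMOTE_DIR")
    if remote_dir:
        remote_path = posixpath.join(remote_dir, filename)
        return remote_path, remote_path
    return None, filename
```

With this helper, the measuring code would reduce to calling remote.upload(tmp_dir.relpath(filename), target=upload_target) followed by remote.load_module(module_path), using the two returned values.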
Thanks for your suggestion! @merrymercy