--------------------------------------- Update on the issue ---------------------------------------
I had an older build of TVM on my system, and the paths were set to that build. However, when the build was run through the async server, it would clone the latest version, and the paths for that terminal would then point to the latest build. In the latest build, _serve_loop in server.py is a local (nested) function and therefore cannot be pickled; this is not the case in the older build. I request the developers to look into it.
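For illustration, here is a minimal sketch (not TVM code; serving_like and the nested _serve_loop are placeholders) of why a locally defined function cannot be used as a multiprocessing target once the spawn start method is in effect: spawn has to pickle the Process object, and pickle cannot serialize local objects.

```python
import multiprocessing as mp


def serving_like():
    # Placeholder for a function that, like the newer _serving() in
    # tvm/rpc/server.py, defines its serve loop as a nested function.
    def _serve_loop():
        print("serving")

    proc = mp.Process(target=_serve_loop)
    proc.start()  # under "spawn", the Process (and its target) must be picklable
    proc.join()


if __name__ == "__main__":
    # The traceback further down goes through popen_spawn_posix, i.e. spawn.
    mp.set_start_method("spawn")
    serving_like()  # AttributeError: Can't pickle local object 'serving_like.<locals>._serve_loop'
```

Running this reproduces the same class of error as the AttributeError shown in the traceback at the end of the post.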
Some context: I am working on a personal project that involves an async server (built with the asyncio library) that listens for client requests. A client can request autotuning from the server, and the server executes the corresponding commands. Currently I am testing everything on localhost, with the target device (the client) connected via a LAN cable.
After successfully building TVM on the client device, setting up the TVM Python paths, and completing the other necessary steps, the commands are executed from the asyncio server script in the given order:
- Async Server Side:
python3 -m tvm.exec.rpc_tracker --host=192.168.55.2 --port=9101
- Client Side:
python3 -m tvm.exec.rpc_server --tracker=192.168.55.2:9101 --key=abc --no-fork
- Async Server Side:
python3 RPC_autotune.py --key=abc --port=9101
I would like to point out that these commands, in the given order, work perfectly as intended when run manually from the command line. They also work as intended when I launch them manually using subprocess.Popen.
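For reference, here is a simplified, hypothetical sketch of the two launch mechanisms I use inside the async server (the command string, function names, and error handling are illustrative only; the real script differs):

```python
import asyncio
import subprocess

# Illustrative command; the real script launches the tracker/server/tuning
# commands shown above.
CMD = "python3 RPC_autotune.py --key=abc --port=9101"


async def launch_with_create_subprocess_shell():
    # Variant 1: asyncio-native subprocess
    proc = await asyncio.create_subprocess_shell(
        CMD,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout, stderr


def _blocking_popen():
    # Variant 2: plain subprocess.Popen, run on a worker thread
    proc = subprocess.Popen(CMD, shell=True)
    return proc.wait()


async def launch_with_to_thread():
    # asyncio.to_thread requires Python 3.9+
    return await asyncio.to_thread(_blocking_popen)


if __name__ == "__main__":
    asyncio.run(launch_with_create_subprocess_shell())
```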
However, when the commands are executed from within the async server script using asyncio.to_thread with subprocess.Popen, or using asyncio.create_subprocess_shell, the server does connect to the tracker initially, as follows:
abc@xyz:~$ python3 -m tvm.exec.query_rpc_tracker --host 192.168.55.2 --port 9101
Tracker address 192.168.55.2:9101
Server List
-------------------------------------
server-address key
-------------------------------------
192.168.55.1:9090 abc
-------------------------------------
Queue Status
-------------------------------------------
key total free pending
-------------------------------------------
abc 1 1 0
-------------------------------------------
But when autotuning is about to start, the client device disappears from the Server List and the request goes into a pending state, as follows:
abc@xyz:~$ python3 -m tvm.exec.query_rpc_tracker --host 192.168.55.2 --port 9101
Tracker address 192.168.55.2:9101
Server List
-------------------------------------
server-address key
-------------------------------------
-------------------------------------
Queue Status
-------------------------------------------
key total free pending
-------------------------------------------
abc 0 0 2
-------------------------------------------
Upon further logging on the client device, the following error was reported:
nvidia@ubuntu:~$ cat log_file.log
2023-05-23 16:12:14.665 INFO bind to 0.0.0.0:9090
2023-05-23 16:12:42.234 INFO connected from ('192.168.55.2', 38360)
2023-05-23 16:12:42.251 INFO start serving at /tmp/tmpy23kstl4
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/nvidia/tvm_python/tvm/rpc/server.py", line 272, in _listen_loop
_serving(conn, addr, opts, load_library)
File "/home/nvidia/tvm_python/tvm/rpc/server.py", line 143, in _serving
server_proc.start()
File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_serving.<locals>._serve_loop'
This issue occurs only when the commands are executed from within the async server, not when they are run from the command line. It would be helpful if someone could suggest a workaround or a way to fix this error. Thanks.