RPC auto-scheduling keeps timing out

BitCircuit · December 7, 2023, 12:42pm

I am trying to tune a model with the auto-scheduler over RPC, targeting a Jetson NX board from an x86 PC with an RTX 3080.

This is what the board keeps reporting:

2023-12-06 17:58:57.199 INFO connected from ('[x86 PC IP]', 35042)
2023-12-06 17:58:57.203 INFO start serving at /tmp/tmpb2upll9w
2023-12-06 17:59:07.223 INFO timeout in RPC session, kill..
2023-12-06 17:59:07.280 INFO finish serving ('[x86 PC IP]', 35042)
2023-12-06 17:59:07.375 INFO connected from ('[x86 PC IP]', 40966)
2023-12-06 17:59:07.380 INFO start serving at /tmp/tmplh201tek
2023-12-06 17:59:08.006 INFO finish serving ('[x86 PC IP]', 40966)
2023-12-06 17:59:08.102 INFO connected from ('[x86 PC IP]', 40978)
2023-12-06 17:59:08.108 INFO start serving at /tmp/tmp0h0uxuzy
2023-12-06 17:59:18.139 INFO timeout in RPC session, kill..
2023-12-06 17:59:18.212 INFO finish serving ('[x86 PC IP]', 40978)
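Pairing up the timestamps shows how long each session lived before it ended. Below is a minimal sketch that measures the time from each "start serving" line to the first "timeout in RPC session" or "finish serving" line after it. It assumes the log format shown above, and the function name session_durations is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as "2023-12-06 17:58:57.203 INFO start serving at /tmp/...".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<level>[A-Z]+) (?P<msg>.*)$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def session_durations(log_text):
    """Return (seconds, killed) for every session found in an RPC server log.

    `seconds` runs from "start serving" to the first "timeout in RPC session"
    or "finish serving" line after it; `killed` is True for timeouts.
    """
    sessions = []
    start = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrelated lines
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        msg = match.group("msg")
        if msg.startswith("start serving"):
            start = ts
        elif start is not None and (
            msg.startswith("timeout in RPC session") or msg.startswith("finish serving")
        ):
            killed = msg.startswith("timeout")
            sessions.append((round((ts - start).total_seconds(), 3), killed))
            start = None  # ignore the "finish serving" that follows a timeout
    return sessions
```

For the lines above, the two killed sessions lasted about 10.02 s and 10.03 s, and the session between them finished after about 0.63 s. That roughly 10 s lifetime is worth comparing with the timeout value passed to the runner in the script.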

Even with the timeout set to 60 seconds, the RPC session still times out.

This is how I set up the RPC system:

1. Run a tracker on the x86 PC: python3 -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190
2. Run a server on the board: python3 -m tvm.exec.rpc_server --tracker=[x86 PC IP]:9190 --key=jetson
3. Check the tracker from the x86 PC with python3 -m tvm.exec.query_rpc_tracker --host=0.0.0.0 --port=9190; the Jetson board shows up.
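The tracker query above can also be done from Python, which additionally tries to open one session with an explicit session_timeout and checks that the board's GPU is visible. This is a minimal sketch assuming the tracker address, port and key used above; the function name check_jetson_session is hypothetical, and TVM is imported inside the function.

```python
def check_jetson_session(tracker_host="127.0.0.1", tracker_port=9190,
                         key="jetson", session_timeout=60):
    """Hypothetical check: print the tracker's summary, then open one session.

    TVM is imported inside the function so this file loads without TVM.
    """
    from tvm import rpc

    tracker = rpc.connect_tracker(tracker_host, tracker_port)
    # Similar to what `python3 -m tvm.exec.query_rpc_tracker` prints.
    print(tracker.text_summary())
    # session_timeout (seconds) bounds how long the server keeps this session.
    remote = tracker.request(key, priority=0, session_timeout=session_timeout)
    dev = remote.cuda(0)
    return dev.exist  # True if the board's CUDA device is visible over RPC
```

Calling check_jetson_session() on the x86 PC should return True if a session can be opened and the CUDA device is reachable through it.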
Then I run my auto-scheduling script, which looks like this:

import tvm
from tvm import auto_scheduler, relay

mod, params = relay.frontend.from_onnx(...)  # arguments omitted
target = tvm.target.cuda(arch="sm_72")  # sm_72 for the NX
tasks, task_weights = auto_scheduler.extract_tasks(
    mod["main"], target=target, params=params)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=1000,
    # Measure on the board through the tracker; the port is an int, not a string.
    runner=auto_scheduler.RPCRunner(
        key='jetson', host='127.0.0.1', port=9190, number=10, timeout=10),
    # logPath: path of the tuning log, defined elsewhere in the script.
    measure_callbacks=[auto_scheduler.RecordToFile(logPath)],
)
tuner.tune(tune_option)

TVM version: 0.15.dev0

I can tune the model with CUDA locally on the board, but it is too slow. Over RPC, it keeps timing out. Any suggestions?

mshr-h · December 13, 2023, 8:18am

     runner=auto_scheduler.RPCRunner(
          key='jetson', host='127.0.0.1', port='9190', number=10, timeout=10),

Can you try setting the host to “0.0.0.0” instead of “127.0.0.1”?

pfk123 · December 15, 2023, 10:37am

Hello @BitCircuit, I’m not an expert, but here are a few things you can check:

1. On the Jetson device, check whether you have permission to write files in the directory where you run rpc_server. rpc_server should have an argument to specify the working directory.
2. In TVM/python/tvm/rpc/server.py, your error is followed by this message: f'RPCSessionTimeoutError: Your {opts["timeout"]}s session has expired, try to increase the "session_timeout" value.'
3. TVM offers many ways to get more logs or increase verbosity; you could try one of them.
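On the PC side, one option is Python's standard logging module, since TVM's Python RPC and auto-scheduler code log through named loggers as far as I can tell. Below is a minimal sketch. The logger names are my assumption about what TVM uses and should be checked against your checkout, and the helper name enable_debug_logging is hypothetical. auto_scheduler.TuningOptions also takes a verbose argument that controls how much the tuner itself prints.

```python
import logging

# Assumed logger names: "RPCServer"/"RPCTracker" for tvm/rpc/server.py and
# tvm/rpc/tracker.py, "auto_scheduler" for the auto-scheduler modules.
DEFAULT_LOGGERS = ("RPCServer", "RPCTracker", "auto_scheduler")


def enable_debug_logging(names=DEFAULT_LOGGERS):
    """Emit DEBUG-level records from the named loggers to stderr."""
    # Adds a stderr handler on the root logger (no-op if one already exists);
    # records from child loggers propagate to it regardless of the root level.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    loggers = []
    for name in names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        loggers.append(logger)
    return loggers
```

Calling enable_debug_logging() at the top of the tuning script, before tuner.tune(...), would make any debug records from those loggers visible.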

BitCircuit · December 15, 2023, 11:51am

Thank you for your reply. I tried that, but the RPC session still times out (I also increased the timeout limit to 120 seconds; same problem).

BitCircuit · December 15, 2023, 12:14pm

Thank you for your reply and advice. Here is what I checked:

1. I went through the Python script tvm/python/tvm/exec/rpc_server.py (apache/tvm on GitHub, main branch) and did not find any argument for specifying a working directory. I run rpc_server in my home directory, where I should have write permission.
2. I increased the timeout value to 120 seconds; same problem.
3. I changed the logging level at https://github.com/apache/tvm/blob/b3eec91ee6254b40920c40e922cb3c37ac1c06a4/python/tvm/exec/rpc_server.py#L96C44-L96C44 from INFO to DEBUG; nothing extra was printed.

pfk123 · December 15, 2023, 12:27pm

I was thinking of the C++ version of the RPC server. Anyway, since you run rpc_server in your home directory, permissions should not be the problem.
How did you set up TVM on the Jetson board? It is the whole of TVM, not only tvm_runtime.so, right? I think the Jetson is too weak a platform to compile modules quickly, so you may need to increase the timeout even more, or use cross-compilation (compile on your x86 server for the ARM Jetson target).
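A sketch of the cross-compilation route. It assumes the Jetson's CPU is 64-bit ARM (aarch64), that the LLVM linked into TVM on the x86 PC can emit aarch64 code, and, as far as I understand the default builder, that the board has a C compiler to link the uploaded objects. The helper names are hypothetical; the relevant part is the host= argument of tvm.target.Target, which makes the CPU-side part of each compiled module target the board instead of the PC.

```python
def jetson_target_strings(arch="sm_72"):
    """CUDA target for the Jetson GPU plus an aarch64 LLVM host target."""
    return f"cuda -arch={arch}", "llvm -mtriple=aarch64-linux-gnu"


def jetson_tuning_setup(log_path, arch="sm_72", tracker_host="127.0.0.1",
                        tracker_port=9190, key="jetson", trials=1000):
    """Hypothetical helper returning (target, tuning options) for RPC tuning.

    TVM is imported lazily so this file can be loaded without TVM installed.
    """
    import tvm
    from tvm import auto_scheduler

    device_target, host_target = jetson_target_strings(arch)
    # GPU kernels for the Jetson; host-side code cross-compiled for aarch64
    # instead of the x86 PC's own architecture.
    target = tvm.target.Target(device_target, host=host_target)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=trials,
        # The default local builder compiles on the PC and packs the object
        # files into a tar archive that is uploaded to the board.
        builder=auto_scheduler.LocalBuilder(),
        runner=auto_scheduler.RPCRunner(
            key=key, host=tracker_host, port=tracker_port, number=10, timeout=10),
        measure_callbacks=[auto_scheduler.RecordToFile(log_path)],
    )
    return target, tune_option
```

The returned target would replace tvm.target.cuda(arch="sm_72") in the call to auto_scheduler.extract_tasks, and the returned options would go to tuner.tune. Measurement still runs on the board through the tracker, so RPC is still needed for tuning; only compilation moves to the PC.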

BitCircuit · December 20, 2023, 3:17am

Yes, it is the whole TVM. Since I could not get RPC to the board working (I think RPC is the only way to cross-compile, right?), I run the auto-scheduler on the board itself. The time cost is pretty high: for a fairly simple model with a 1x3x1024x600 input, auto-scheduling takes roughly 4 hours for 1000 trials and 56 hours for 20000 trials.
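For reference, those on-board tuning times work out to the following average cost per trial (search, compilation and measurement together):

```latex
\frac{4\,\text{h} \times 3600\,\text{s/h}}{1000\ \text{trials}} = 14.4\ \text{s/trial},
\qquad
\frac{56\,\text{h} \times 3600\,\text{s/h}}{20000\ \text{trials}} \approx 10.1\ \text{s/trial}
```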

As for increasing the timeout, I tried 30 minutes and the RPC sessions still time out, even though when I auto-schedule on the board, the measurement stage takes only about 120 seconds. If nothing is wrong with how I set up the RPC system, I suspect there may be a bug in the RPC module.
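One way to separate an RPC problem from an auto-scheduler problem is a minimal round trip that does not involve the tuner: build a tiny CUDA kernel on the PC, upload it to the board, and run it there. The sketch below assumes an RPC session object remote obtained from the tracker (for example with tvm.rpc.connect_tracker("127.0.0.1", 9190).request("jetson", session_timeout=120)), an aarch64 board CPU, the tensor-expression API available in TVM 0.15, and NumPy. The function, kernel and file names are hypothetical.

```python
def rpc_cuda_smoke_test(remote, arch="sm_72", n=1024):
    """Build an "add one" CUDA kernel on the PC and run it on the board via `remote`."""
    import os
    import tempfile

    import numpy as np
    import tvm
    from tvm import te

    # Element-wise B[i] = A[i] + 1, split into blocks of 64 CUDA threads.
    A = te.placeholder((n,), name="A", dtype="float32")
    B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
    s = te.create_schedule(B.op)
    bx, tx = s[B].split(B.op.axis[0], factor=64)
    s[B].bind(bx, te.thread_axis("blockIdx.x"))
    s[B].bind(tx, te.thread_axis("threadIdx.x"))

    # GPU code for the Jetson, host code cross-compiled for aarch64 (assumption).
    target = tvm.target.Target(f"cuda -arch={arch}", host="llvm -mtriple=aarch64-linux-gnu")
    func = tvm.build(s, [A, B], target=target, name="add_one")

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "add_one.tar")
        func.export_library(path)  # tar of object files; linked when loaded remotely
        remote.upload(path)
    rmod = remote.load_module("add_one.tar")

    dev = remote.cuda(0)
    a = tvm.nd.array(np.arange(n, dtype="float32"), dev)
    b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
    rmod["add_one"](a, b)
    np.testing.assert_allclose(b.numpy(), a.numpy() + 1.0)
    return True
```

If this also hangs until the session timeout, the problem likely lies in the RPC or build setup itself rather than in the tuning script; if it succeeds, the tuning configuration is the next thing to compare.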