Thanks for your reply. I tried this method recently but ran into a few more problems.
- I launched the auto-tune script on one of my compute servers with 8 Titan V GPUs. I found that all the CPU cores were utilized while the GPUs were almost unused during the entire tuning process.
- I launched the auto-tune script on my master node (i.e. server A). Three other servers were registered as compute nodes (`python -m tvm.exec.rpc_server --tracker=${HOST_IP}:9190 --key=titanv`). This time, CPU utilization was 100% and GPU utilization was 0% on the master node, while CPU and GPU utilization were both 0% on all three compute nodes.
- Timeout messages were displayed in both of the experiments above:
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44906)
INFO:RPCServer:load_module /tmp/tmp80qs9j9c/tmp_func_3495b12be1f98a20.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44930)
INFO:RPCServer:load_module /tmp/tmp5sgzmkfz/tmp_func_ad070c06fc932366.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 44954)
INFO:RPCServer:load_module /tmp/tmpr_a7v68l/tmp_func_29ffa46b8cce806d.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45010)
INFO:RPCServer:load_module /tmp/tmplyvzz6fl/tmp_func_9852b87c445c798.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45116)
INFO:RPCServer:load_module /tmp/tmpir7h6rw0/tmp_func_22ec387d451f013d.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45142)
INFO:RPCServer:load_module /tmp/tmp0wfbyz3h/tmp_func_46d1b12b7dc698e1.tar
INFO:RPCServer:Timeout in RPC session, kill..
INFO:RPCServer:connection from ('172.16.18.5', 45164)
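
To put a number on how often sessions time out, a log like the one above could be tallied with a small script. This is only a sketch: it assumes the log lines have exactly the format shown above, and the function name `summarize_rpc_log` is hypothetical, not part of TVM.

```python
import re
from collections import Counter

# Patterns assume the exact RPCServer log format shown in the excerpt above.
_CONNECT = re.compile(r"INFO:RPCServer:connection from \('([^']+)', (\d+)\)")
_LOAD = re.compile(r"INFO:RPCServer:load_module (\S+)")
_TIMEOUT = re.compile(r"INFO:RPCServer:Timeout in RPC session")


def summarize_rpc_log(lines):
    """Count connections, module loads, and session timeouts in a log.

    Hypothetical helper for illustration; lines that match none of the
    three patterns are ignored.
    """
    counts = Counter()
    clients = set()
    for raw in lines:
        line = raw.strip()
        match = _CONNECT.match(line)
        if match:
            counts["connections"] += 1
            clients.add(match.group(1))  # remember the client IP address
        elif _LOAD.match(line):
            counts["loads"] += 1
        elif _TIMEOUT.match(line):
            counts["timeouts"] += 1
    return {
        "connections": counts["connections"],
        "loads": counts["loads"],
        "timeouts": counts["timeouts"],
        "clients": sorted(clients),
    }


if __name__ == "__main__":
    # First four lines of the log excerpt above.
    sample = [
        "INFO:RPCServer:Timeout in RPC session, kill..",
        "INFO:RPCServer:connection from ('172.16.18.5', 44906)",
        "INFO:RPCServer:load_module /tmp/tmp80qs9j9c/tmp_func_3495b12be1f98a20.tar",
        "INFO:RPCServer:Timeout in RPC session, kill..",
    ]
    print(summarize_rpc_log(sample))
```

If the number of timeouts is close to the number of `load_module` lines, as it appears to be in the excerpt, then nearly every measurement session was killed before it finished.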