Dear TVM community,
I am trying to follow the VTA auto-tuning tutorial, using master as of 7 Oct. (76c2392).
I am facing the following issue:
I have two PYNQs. When I try to use only one PYNQ, I get an error indicating that my device is not tracked:
    Extracted 10 conv2d tasks:
    (1, 14, 14, 256, 512, 1, 1, 0, 0, 2, 2)
    (1, 28, 28, 128, 256, 1, 1, 0, 0, 2, 2)
    (1, 56, 56, 64, 128, 1, 1, 0, 0, 2, 2)
    (1, 56, 56, 64, 64, 3, 3, 1, 1, 1, 1)
    (1, 28, 28, 128, 128, 3, 3, 1, 1, 1, 1)
    (1, 56, 56, 64, 128, 3, 3, 1, 1, 2, 2)
    (1, 14, 14, 256, 256, 3, 3, 1, 1, 1, 1)
    (1, 28, 28, 128, 256, 3, 3, 1, 1, 2, 2)
    (1, 7, 7, 512, 512, 3, 3, 1, 1, 1, 1)
    (1, 14, 14, 256, 512, 3, 3, 1, 1, 2, 2)
    Tuning...
    [Task 1/10] Current/Best: 0.00/ 0.00 GFLOPS | Progress: (0/1000) | 0.00 s
    Traceback (most recent call last):
      File "tutorials/autotvm/tune_relay_vta.py", line 424, in <module>
        tune_and_evaluate(tuning_option)
      File "tutorials/autotvm/tune_relay_vta.py", line 381, in tune_and_evaluate
        tune_tasks(tasks, **tuning_opt)
      File "tutorials/autotvm/tune_relay_vta.py", line 285, in tune_tasks
        autotvm.callback.log_to_file(tmp_log_file)])
      File "/home/did/tvm/python/tvm/autotvm/tuner/tuner.py", line 108, in tune
        measure_batch = create_measure_batch(self.task, measure_option)
      File "/home/did/tvm/python/tvm/autotvm/measure/measure.py", line 252, in create_measure_batch
        attach_objects = runner.set_task(task)
      File "/home/did/tvm/python/tvm/autotvm/measure/measure_methods.py", line 211, in set_task
        raise RuntimeError("Cannot get remote devices from the tracker. "
    RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.
However, I have verified that the device is tracked:
    python3 -m tvm.exec.query_rpc_tracker --host=0.0.0.0 --port=9190

    Tracker address 0.0.0.0:9190

    Server List
    ----------------------------
    server-address        key
    ----------------------------
    192.168.2.98:44802    server:pynq
    ----------------------------

    Queue Status
    ----------------------------
    key    total  free  pending
    ----------------------------
    pynq   1      1     0
    ----------------------------
I have also verified that I can run the basic conv2d test on both PYNQs (so the end-to-end flow works):

    python tvm/vta/tests/python/integration/test_benchmark_topi_conv2d.py
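The queue status reported by query_rpc_tracker can also be checked from a script. Below is a minimal sketch, assuming the tracker output keeps the layout I pasted earlier; `parse_queue_status` is a hypothetical helper of mine, not part of TVM, and the sample text is simply the output from my tracker.

```python
def parse_queue_status(text):
    """Return {key: (total, free, pending)} from query_rpc_tracker text output.

    Hypothetical helper: it assumes the "Queue Status" table layout shown
    in this post and is not part of TVM.
    """
    status = {}
    in_queue = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Queue Status"):
            in_queue = True
            continue
        # Ignore everything before the queue table, blank lines and rulers.
        if not in_queue or not line or set(line) == {"-"}:
            continue
        fields = line.split()
        # Data rows are "<key> <total> <free> <pending>"; the header row
        # fails the integer check and is skipped.
        if len(fields) == 4 and all(f.isdigit() for f in fields[1:]):
            status[fields[0]] = tuple(int(f) for f in fields[1:])
    return status


SAMPLE = """\
Tracker address 0.0.0.0:9190

Server List
----------------------------
server-address key
----------------------------
192.168.2.98:44802 server:pynq
----------------------------

Queue Status
----------------------------
key total free pending
----------------------------
pynq 1 1 0
----------------------------
"""

queue = parse_queue_status(SAMPLE)
total, free, pending = queue["pynq"]
print(f"pynq: total={total} free={free} pending={pending}")
```

With one PYNQ registered, this reports one free device under the key pynq, which is why the "Cannot get remote devices" error surprises me.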
When I use both PYNQs, I am able to run the tutorial* (i.e. I do not get the previous error). However, it is odd that during the first autotuning phase most of the work takes place on one PYNQ while the other is idle. After a timeout (which occurs after 1 hour), I get this log on the 'idle' PYNQ:
    xilinx@pynq:~/tvm$ sudo ./apps/vta_rpc/start_rpc_server_to_tracker.py
    INFO:RPCServer:bind to 0.0.0.0:9091
    INFO:RPCServer:connection from ('192.168.2.1', 43478)
    INFO:root:Skip reconfig_runtime due to same config.
    INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit
    INFO:RPCServer:Timeout in RPC session, kill..
    INFO:RPCServer:connection from ('192.168.2.1', 56800)
    INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit
    INFO:root:Skip reconfig_runtime due to same config.
    INFO:root:Loading VTA library: /home/xilinx/tvm/vta/python/vta/../../../build/libvta.so
    INFO:RPCServer:load_module /tmp/tmpwl6adhvf/tmp_func_1dbb6a8d7de86cfb.tar
    INFO:RPCServer:Finish serving ('192.168.2.1', 56800)
    INFO:RPCServer:connection from ('192.168.2.1', 56824)
    INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit
    ... and a long log of same lines continues ....
To summarize my questions:
- When I use multiple PYNQs, do I need to set any environment variables (or other configuration) on the PYNQs or on the host, or is it enough to register the PYNQs with the host/tracker using the command below? (I have changed the IP in this file to that of the host.)

    xilinx@pynq:~/tvm$ sudo ./apps/vta_rpc/start_rpc_server_to_tracker.py

- Why does the tutorial succeed* when I use two PYNQs (even with this initial 1-hour phase in which one PYNQ is idle), while it fails when I use only one PYNQ (tested with both)?
- When I use multiple PYNQs, do I need to change any configuration on the PYNQs apart from their IP (e.g. the hostname), or is flashing them with the 2.4 image enough?
Kind regards,
Dionysios
*When I use two PYNQs I have an issue that the evaluation of the tuned network fails, which is described here.