Hi all!
I am trying to run VTA on the Xilinx ZCU111 board. So far I have been able to generate the bitstream and compile the VTA runtime on the board, but I need some ideas on how to debug a problem I ran into. My steps are detailed below:
- PYNQ image: 2.6
- I added a zcu111 configuration in /3rdparty/vta-hw/config/pkg_config.py:
...
elif self.TARGET == "zcu111":
    self.fpga_device = "xczu28dr-ffvg1517-2-e"
    self.fpga_family = "zynq-ultrascale+"
    self.fpga_board = "xilinx.com:zcu111:part0"
    self.fpga_board_rev = "1.4"
    self.fpga_freq = 300
    self.fpga_per = 2
    self.fpga_log_axi_bus_width = 7
    self.axi_prot_bits = '010'
    # IP register address map
    self.ip_reg_map_range = "0x1000"
    self.fetch_base_addr = "0xA0000000"
    self.load_base_addr = "0xA0001000"
    self.compute_base_addr = "0xA0002000"
    self.store_base_addr = "0xA0003000"
...
- I then followed Bitstream Generation with Xilinx Toolchains and was able to generate the bitstream without errors (I had to make some additional small changes to be able to generate it using Vivado 2020.2).
I have inspected the generated Vivado project, and verified the following:
- Timing was achieved correctly for a frequency of 300 MHz.
- In the address ranges of the generated block design, I can see that the addresses set in pkg_config.py are correctly applied in the AXI mapping between the Zynq and the VTA modules. Note that these addresses are exactly the same as those of the ultra96 target.
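The config side of this address check can also be scripted. The following is only a minimal sketch in plain Python, not part of VTA: the helper name check_address_map is made up for illustration, the values are copied from the zcu111 entry above, and the requirement that each base address is aligned to the register range is an assumption of the sketch. It checks that the four register windows do not overlap; comparing against the Vivado address editor still has to be done by hand.

```python
# Values copied from the zcu111 entry in pkg_config.py shown above.
IP_REG_MAP_RANGE = "0x1000"
BASE_ADDRS = {
    "fetch": "0xA0000000",
    "load": "0xA0001000",
    "compute": "0xA0002000",
    "store": "0xA0003000",
}


def check_address_map(base_addrs, reg_range):
    """Return a list of problems found; an empty list means none were found."""
    size = int(reg_range, 16)
    problems = []
    # Sort the windows by base address so that neighbours can be compared.
    windows = sorted((int(addr, 16), name) for name, addr in base_addrs.items())
    for base, name in windows:
        # Assumption of this sketch: each window starts on a multiple of its size.
        if base % size != 0:
            problems.append(f"{name}: base {base:#x} is not aligned to {size:#x}")
    for (base0, name0), (base1, name1) in zip(windows, windows[1:]):
        if base0 + size > base1:
            problems.append(f"{name0} and {name1}: register windows overlap")
    return problems


if __name__ == "__main__":
    for problem in check_address_map(BASE_ADDRS, IP_REG_MAP_RANGE):
        print(problem)
```

With the values above the script prints nothing, since the four windows are aligned and do not overlap.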
Then I built the VTA runtime on the ZCU111 board, making sure that the vta_config.json file on the board is the same as the one used on the host to generate the bitstream, but with the target changed to “ultra96”. I also made sure that the USE_VTA_FPGA option in config.cmake is enabled. I followed this section. The build was successful, and I was able to start the RPC server on the board.
From the host computer, I tried to run the matrix_multiply.py tutorial.
- I changed the VTA_RPC_HOST line to add the specific IP of my board.
- I added a case to the if statement that programs the FPGA, so that it also runs when env.TARGET == “ultra96”.
- In vta.program_fpga, I added the path to my generated bitstream file.
When I run the script on the host, the schedule compiles correctly, but the script freezes at this line:
# Invoke the module to perform the computation
f(A_nd, B_nd, C_nd)
I can see the following output in the terminal where I started the rpc server in the FPGA board:
2020-10-19 19:48:33.142 INFO bind to 0.0.0.0:9091
2020-10-19 19:48:35.836 INFO connection from ('192.168.0.1', 45652)
INFO:root:Skip reconfig_runtime due to same config.
INFO:root:Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
INFO:root:Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
INFO:root:Program FPGA with vta.bit
INFO:root:Loading VTA library: /home/xilinx/tvm/vta/python/vta/../../../build/libvta.so
2020-10-19 19:48:42.242 INFO load_module /tmp/tmpyy0qx27i/gemm.o
I also tried running vta/tests/python/integration/test_benchmark_topi_conv2d.py and hit a similar problem: all convolution measurements on the CPU worked fine, but when the VTA measurements started, the script froze on the first one.
My interpretation is that the schedule is cross-compiled correctly and that the generated module is loaded correctly on the FPGA board. Are there more steps or flags I can use to debug this issue?
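While debugging, it may help to keep the host script from blocking forever on the hanging call. The following is a minimal sketch using only the Python standard library; call_with_timeout is a hypothetical helper, not a TVM or VTA API. It runs the call in a daemon thread and reports whether it returned within the timeout.

```python
import threading


def call_with_timeout(fn, args=(), timeout_s=10.0):
    """Run fn(*args) in a daemon thread.

    Returns (True, result) if the call finished in time and (False, None)
    otherwise. An exception raised by fn is re-raised in the caller.
    """
    box = {}

    def worker():
        try:
            box["result"] = fn(*args)
        except BaseException as exc:  # hand the error back to the caller
            box["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # Still blocked; a daemon thread does not keep the process alive at exit.
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box["result"]
```

In the tutorial script this could be used as `finished, _ = call_with_timeout(f, (A_nd, B_nd, C_nd), timeout_s=30)`, after which the script can print diagnostics if `finished` is False. This only unblocks the host; it does not cancel whatever is still running on the board.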
Small extra test: I found this post and this post stating that this could be a coherence problem, so I generated the bitstream with coherence disabled (there is a coherence flag in pkg_config.py). With the new bitstream, I ran matrix_multiply.py and test_benchmark_topi_conv2d.py again, but hit the same problem.