RPC server hangs when trying to program de10

mrlebedev · February 10, 2021, 3:36pm

Hello,
I am trying to use VTA on a DE10-Standard board. Everything was fine until I tried to program the board with my bitstream. The VTA runtime built successfully. I start the RPC server as in the tutorial, establish a connection between my host machine and the board, and reconfigure the runtime. Then the server starts the program_fpga method and hangs completely when returning from the load_vta_dll method in vta/python/vta/exec/rpc_server.py. After that, all I can do is reset the board.
I thought that something was wrong with the libvta.so library and that it couldn’t be loaded. There was an undefined symbol from libtvm_runtime.so, so I linked that library into libvta.so, but it didn’t help. I have also tried different Python versions; that didn’t help either.
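
One way to separate the library-loading step from the RPC server is to try loading libvta.so directly with ctypes on the board. This is only a minimal sketch, not the code VTA itself runs: the helper name try_load is hypothetical, and passing RTLD_GLOBAL is an assumption about how the library is meant to be loaded.

```python
import ctypes


def try_load(path):
    """Try to load a shared library and return (ok, message)."""
    try:
        # RTLD_GLOBAL makes the library's symbols visible to libraries loaded later.
        ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        # The loader's message usually names the missing file or symbol.
        return False, str(err)
    return True, "loaded"


# Example call on the board, using the path printed in the RPC server log:
# print(try_load("/home/root/tvm/vta/python/vta/../../../build/libvta.so"))
```

If this call returns cleanly in a plain Python session, the hang is more likely to come from what happens around the load than from the library file itself.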

My script:

import os
import vta
from tvm import rpc

env = vta.get_env()
device_host = os.environ.get("VTA_RPC_HOST", "<board-ip>")
device_port = os.environ.get("VTA_RPC_PORT", "9091")
remote = rpc.connect(device_host, int(device_port))
vta.reconfig_runtime(remote)
vta.program_fpga(remote, bitstream="<path-to-bitstream>/1x16_i8w8a32_15_15_18_17.rbf")

Board RPC server output:

root@DE10-Standard: tvm/apps/vta_rpc/start_rpc_server.sh 
INFO:RPCServer:bind to 0.0.0.0:9091
INFO:RPCServer:connection from ('<host-ip>', 53684)
INFO:root:Loading VTA library: /home/root/tvm/vta/python/vta/../../../build/libvta.so

I don’t think that this happens because I use the DE10-Standard board instead of the DE10-Nano. The runtime doesn’t even start to execute platform-dependent code.

Has anyone encountered such a problem? Could it be caused by the use of Python’s multiprocessing module? Could I bypass it by programming the board with the Quartus toolchain and skipping program_fpga (although I suspect the server would still hang when libvta.so is loaded from some other place)?
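
To keep the host script from blocking forever while investigating a hang like this, the blocking call can be run in a daemon thread with a timeout. This is a generic sketch rather than anything VTA provides; finishes_within is a hypothetical helper name, and the 60-second limit in the usage comment is an arbitrary assumption.

```python
import threading


def finishes_within(func, timeout_s):
    """Run func() in a daemon thread; return True if it ends within timeout_s seconds.

    A call that raises also counts as "ended"; this only detects hangs.
    """
    worker = threading.Thread(target=func, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return not worker.is_alive()


# Example use with the script above:
# ok = finishes_within(lambda: vta.program_fpga(remote, bitstream=path), 60)
# if not ok:
#     print("program_fpga did not return; check the board console")
```

The thread cannot be cancelled, so this does not recover the connection. It only lets the host script report the hang and exit.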

mrlebedev · March 9, 2021, 2:04pm

The problem was the Linux image I used on my DE10-Standard board. It supported a graphical desktop, and some of that functionality was implemented in the FPGA, so Linux simply crashed when I tried to reconfigure the FPGA.
However, the console Angstrom Linux image by Terasic for the DE10-Standard turned out to be too old to run VTA, so I built a custom Linux based on the altera-socfpga kernel v5.9.0 and an Ubuntu 18.04 rootfs.
Now the FPGA configures successfully, but I still experience issues with the CMA driver:

There is no link to the driver in the VTA installation guide. I accidentally found Rihards Novickis’ CMA driver (which seems to be the right one) in liangfu’s GitHub repository. Before that I tried the ikwzm/udmabuf CMA driver, without success.
I have configured CMA support in the kernel and tried reserving memory both through the device tree (as “shared-dma-pool” and “linux,cma-default”) and through the kernel’s built-in default allocator. Both variants successfully reserve the memory region and the CMA driver registers a /dev/cma device, but I can’t access the memory: sending a request via ioctl fails with “Unable to handle kernel NULL pointer dereference at virtual address 00000168”.

My guess is that the driver is not configured properly, but I am not a Linux driver expert. Or maybe I need to use an older Linux kernel?

mrlebedev · March 11, 2021, 9:00am

Using Linux kernel 4.14 helped. VTA seems to work: the ‘Get started’ tutorial succeeded, but I had to reduce the vector size from 64 to 8, because 128 MB of CMA memory was not enough for this test.
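
To see how much CMA memory the running kernel reserved and how much is still free, the CmaTotal and CmaFree fields of /proc/meminfo can be read on the board (they are present on kernels built with CMA support). A small sketch; the function name cma_info_kb is hypothetical.

```python
def cma_info_kb(meminfo_text):
    """Return the CmaTotal/CmaFree values (in kB) found in /proc/meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("CmaTotal", "CmaFree"):
            # Lines look like "CmaTotal:  131072 kB"; keep the number only.
            info[key] = int(rest.split()[0])
    return info


# Example call on the board:
# with open("/proc/meminfo") as f:
#     print(cma_info_kb(f.read()))
```

An empty result would suggest that the kernel was built without CMA or that no region was reserved.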

torontotong · June 23, 2021, 5:54pm

Hi mrlebedev,

Thank you for sharing your experience with this issue. I am facing the same problem in my development and have a question about your latest post. You mentioned that Linux kernel 4.14 worked for you. Did you just replace the kernel image with 4.14 and everything worked, i.e. the “handle kernel NULL pointer dereference at virtual address 00000168” issue went away by itself? Did you change the CMA driver as well? Did you change the device tree configuration? I tried kernel 4.14, but the CMA driver still complains about “handle kernel NULL pointer…”

If you could explain in more detail how you got VTA working, that would be highly appreciated!

mrlebedev · June 24, 2021, 10:37am

Hi torontotong,

I built kernel 4.14 from the altera-opensource/linux-socfpga repository on GitHub, branch rel_socfpga-4.14.130-ltsi_21.02.02_pr, using gcc for x86_64_arm-linux-gnueabihf and the following commands:

make ARCH=arm socfpga_defconfig
make ARCH=arm menuconfig

In menuconfig I’ve enabled:
Kernel Features -> Contiguous Memory Allocator,
Device Drivers -> Generic Driver Options -> DMA Contiguous Memory Allocator (there you can also set the desired amount of CMA memory)
Enable the block layer -> Support for large (2TB+) block devices and files
And disabled:
General setup -> Automatically append version information to the version string
Then:

make ARCH=arm LOCALVERSION= zImage -j4 modules
...
cp arch/arm/boot/zImage <path-to-your-sd-card-boot-partition>
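
Before building, the resulting .config can be checked against the menuconfig choices above. The sketch below assumes those menu entries correspond to CONFIG_CMA, CONFIG_DMA_CMA, CONFIG_LBDAF and CONFIG_LOCALVERSION_AUTO in a 4.14 tree; that mapping and the function names are my assumptions, not something stated in this thread.

```python
# Assumed mapping from the menuconfig entries to .config symbols (4.14 kernel).
REQUIRED_ON = ("CONFIG_CMA", "CONFIG_DMA_CMA", "CONFIG_LBDAF")
REQUIRED_OFF = ("CONFIG_LOCALVERSION_AUTO",)


def parse_config(text):
    """Map option name -> value for every 'CONFIG_X=value' line."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as '# CONFIG_X is not set' and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def check_config(text):
    """Return a list of problems; an empty list means the options match."""
    options = parse_config(text)
    problems = [f"{name} is not set to y" for name in REQUIRED_ON
                if options.get(name) != "y"]
    problems += [f"{name} should be disabled" for name in REQUIRED_OFF
                 if name in options]
    return problems


# Example call from the kernel source directory:
# with open(".config") as f:
#     print(check_config(f.read()) or "config looks as described")
```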

I have used the original device tree from my board. Adding “shared-dma-pool” and “linux,cma-default” didn’t work.
Then I built and used the CMA driver from the FPSoC_Linux_drivers repository by Rihards Novickis on GitLab (master branch).
Maybe I have forgotten some details, but in general these steps made VTA work. I hope this helps!

torontotong · June 25, 2021, 4:23am

Hi Mikhail,

Thank you very much for your reply; it is very useful. From your description I could verify that my approach is similar to yours. I am using the same CMA driver source code from the GitLab repository. I noticed that your system is 32-bit, while my board is a Stratix 10, which is 64-bit, so I modified the CMA driver to support a 64-bit architecture. However, I have an interesting issue: the VTA runtime’s VTAMemAlloc() can allocate 4 KB and 64 KB buffers, but fails to allocate a 32 MB buffer when the host-side app executes the f() function. I am working on this now. My guess is that something in the system limits the maximum buffer size, which is why I asked whether you changed the device tree.

Anyway, thanks for your help!

xwrock · December 2, 2022, 4:07am

The CMA driver repo is inaccessible. How can I get this?