[VTA] Ultra96 hangs when running test_benchmark_gemm.py

Hi @thierry

Thanks for the Ultra96 patch. I just tried the new VTA on an Ultra96 V1 board with the PYNQ 2.4 image for Ultra96, but the new VTA doesn’t work and hangs. My steps are listed below. Is any of them wrong?

Regards
Hua

Steps:

  1. Download the image from the following link and flash it onto an SD card:
    http://avnet.me/ultra96-pynq-image-v2.4

  2. Connect the Ultra96 to the internet through a USB-Ethernet cable.

  3. SSH into the Ultra96 and git clone the latest TVM.

  4. Change the target in vta/config/vta_config.json to “ultra96”.

  5. Build the VTA runtime with “make vta runtime -j2”.

  6. Launch the VTA RPC server:
    sudo ./apps/vta_rpc/start_rpc_server.sh

  7. On the host machine, git clone the latest TVM.

  8. Set the VTA host to the correct IP address and port.

  9. Run ./test_benchmark_gemm.py.

  10. The Ultra96 prints the following log and then hangs:
    INFO:root:Loading VTA library: /home/xilinx/tvm/vta/python/vta/…/…/…/build/libvta.so
    INFO:RPCServer:load_module /tmp/tmpjt4r7a6b/gemm.o

  11. On the Ultra96, if I change the target to “sim”, the same test script works fine.
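
Switching the target back and forth between “ultra96” and “sim” (steps 4 and 11) can be scripted instead of edited by hand. The following is a minimal sketch, assuming the config is a flat JSON object and that the target is stored under a key named "TARGET"; both the key name and the helper `set_vta_target` are assumptions for illustration, not part of VTA.

```python
import json
from pathlib import Path


def set_vta_target(config_path, target, key="TARGET"):
    """Rewrite the target entry of a JSON config file and return the old value.

    The key name "TARGET" is an assumption; check your vta_config.json.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    previous = config.get(key)
    config[key] = target
    # Write the whole file back so the other settings are preserved.
    path.write_text(json.dumps(config, indent=2) + "\n")
    return previous
```

For example, `set_vta_target("vta/config/vta_config.json", "sim")` would switch to the simulator and return the previous target so it can be restored afterwards. The runtime still has to be rebuilt after the change.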


Right now there is no cache coherency support on the Ultra96 PYNQ image; as a result, getting correct execution will require heterogeneous runtime support (which I’m working on next). I will keep you in the loop…

Hi @thierry,

Thanks for the kind help.

I did some troubleshooting on this cache coherency issue on the Ultra96. One approach is to set “kBufferCoherent” to “False” to force software cache coherency. With that change, the GEMM computation on VTA works properly on the Ultra96. ResNet no longer hangs, but the classification result is incorrect.
Based on these results, the cache coherency part seems to be OK. Do you think the ResNet classification issue could be caused by TLPP or some other logic problem in VTA?

Regards
Hua

I tried VTA on a ZCU104 board with the PYNQ 2.4 image and built a new bitstream. VTA did not hang when running test_benchmark_gemm.py, but for resnet18_v1 the classification result is also incorrect.

I turned off multi-core and disabled the data cache, and the result became correct. So I think the VTA logic is right. @hjiang

Can you share what exactly you did? I mean from an engineering perspective.

Disable cache

@ffffc thanks for the update. You are right: the logic is correct and cache coherency is the problem. Besides disabling the data cache, manually doing a software data flush/invalidate for all VTA pages also fixes this issue in my test case, but that makes performance really bad compared to the scenario with no software sync.
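
To illustrate why a software flush/invalidate changes the result, here is a toy model of a non-coherent system: the CPU sees memory through a write-back cache, while the accelerator reads and writes DRAM directly. This is only a sketch of the general mechanism. The class `ToyNonCoherentMemory` and the "doubling" accelerator are hypothetical and do not reflect the VTA runtime or the Zynq hardware.

```python
class ToyNonCoherentMemory:
    """CPU accesses go through a write-back cache; the device sees DRAM only."""

    def __init__(self, size):
        self.dram = [0] * size
        self.cache = {}     # address -> value held in the CPU cache
        self.dirty = set()  # cached addresses not yet written back to DRAM

    def cpu_write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)

    def cpu_read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.dram[addr]
        return self.cache[addr]

    def device_read(self, addr):
        return self.dram[addr]

    def device_write(self, addr, value):
        self.dram[addr] = value

    def flush(self, start, length):
        """Write dirty cached values back to DRAM so the device can see them."""
        for addr in range(start, start + length):
            if addr in self.dirty:
                self.dram[addr] = self.cache[addr]
                self.dirty.discard(addr)

    def invalidate(self, start, length):
        """Drop cached copies so the next CPU read fetches from DRAM."""
        for addr in range(start, start + length):
            self.cache.pop(addr, None)
            self.dirty.discard(addr)


def run_accelerator(mem, src, dst, n):
    """Hypothetical device kernel: doubles n inputs, touching DRAM only."""
    for i in range(n):
        mem.device_write(dst + i, 2 * mem.device_read(src + i))


def offload(mem, data, sync):
    n = len(data)
    src, dst = 0, n
    for i in range(n):
        mem.cpu_read(dst + i)      # CPU cache now holds stale output lines
    for i, value in enumerate(data):
        mem.cpu_write(src + i, value)
    if sync:
        mem.flush(src, n)          # make inputs visible to the device
    run_accelerator(mem, src, dst, n)
    if sync:
        mem.invalidate(dst, n)     # force the CPU to re-read device output
    return [mem.cpu_read(dst + i) for i in range(n)]
```

In this model, `offload(ToyNonCoherentMemory(8), [1, 2, 3], sync=True)` returns the doubled values, while `sync=False` returns stale zeros because the inputs never reach DRAM and the outputs are served from the cache. The extra loops in `flush` and `invalidate` also hint at the cost mentioned above: syncing every page on every transfer adds work proportional to the buffer size.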

May I ask which Vivado version you are using? I am working on a ZCU102 and get stuck at the GEMM example.

I tried versions from 2018.3 to 2019.2, but only 2018.3 could complete bitstream generation.

I found that test_benchmark_gemm.py completes successfully on Ultra96-v2 + PYNQ v2.4/2.5 after modifying the cache coherency setting register. (I described my environment in another thread.)

But resnet18_v1 also outputs a wrong result, and I think this is due to VTALoadBuffer2D() behaving incorrectly on some Xilinx devices.

In my environment, neither the PYNQ version (Ultra96-v2, 2.4/2.5) nor the bitstream (default / built from source with Vivado 2018.3) affected the incorrect behavior of VTALoadBuffer2D().
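
One way to narrow down a suspected 2D load problem is to compare what the device returns against a host-side reference. The sketch below models a generic strided 2D load with optional padding and reports the first mismatching element. The function names and parameters (`reference_load_2d`, `first_mismatch`, `x_stride`, the padding arguments) are assumptions chosen for illustration; they are not VTALoadBuffer2D()'s actual signature or semantics.

```python
def reference_load_2d(dram, elem_offset, x_size, y_size, x_stride,
                      x_pad_before=0, y_pad_before=0,
                      x_pad_after=0, y_pad_after=0, pad_value=0):
    """Return the rows a strided, padded 2D load is expected to produce.

    Row y reads x_size elements starting at elem_offset + y * x_stride.
    """
    width = x_pad_before + x_size + x_pad_after
    rows = [[pad_value] * width for _ in range(y_pad_before)]
    for y in range(y_size):
        start = elem_offset + y * x_stride
        body = list(dram[start:start + x_size])
        rows.append([pad_value] * x_pad_before + body
                    + [pad_value] * x_pad_after)
    rows.extend([pad_value] * width for _ in range(y_pad_after))
    return rows


def first_mismatch(expected, actual):
    """Return (row, col) of the first differing element, or None.

    Assumes both inputs have the same shape.
    """
    for r, (erow, arow) in enumerate(zip(expected, actual)):
        for c, (e, a) in enumerate(zip(erow, arow)):
            if e != a:
                return (r, c)
    return None
```

Feeding the same source buffer and parameters to the reference and to the device, then calling `first_mismatch` on the two results, shows whether errors start at a row boundary (pointing at stride handling) or at scattered positions (which would look more like stale cache lines).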

I’m facing the same problem on the Ultra96. Were you able to find a solution for this?

Hi Jenst,

Thanks for following this topic. I am working on a solution and plan to post a PR soon to fix this issue. I will let you know once the patch is ready.

Regards

Hua

Has this problem been solved? I hit the same problem on a PYNQ-ZU. My device is an xczu5eg and the Ultra96 is an xczu3eg; they are very similar.