Thanks for the Ultra96 patch. I just tried the new VTA on an Ultra96 V1 board with the PYNQ v2.4 image for Ultra96, but the new VTA doesn't work and gets stuck. The following are the steps I used on Ultra96; is any of them wrong?
Right now there is no cache coherence support on the Ultra96 PYNQ image; as a result, getting correct execution will require heterogeneous runtime support (which I'm working on next). I will keep you in the loop…
I did some troubleshooting on this cache coherency issue on Ultra96. One approach is to set “kBufferCoherent” to “False” to force software cache coherency. With that change, GEMM computation on VTA works properly on Ultra96, and resnet no longer gets stuck, but the classification result is incorrect.
Based on that test result, the cache coherency part seems to be OK. Do you think the resnet classification issue could be caused by TLPP or some other logic problem in VTA?
I tried VTA on a ZCU104 board with the PYNQ 2.4 image and built a new bitstream. VTA didn't get stuck when running test_benchmark_gemm.py, but for resnet18_v1 the classification result is also incorrect.
@ffffc thanks for the update. You are right: the logic is correct, and cache coherency is the problem. Besides disabling the data cache, manually doing a software data flush/invalidate for all VTA pages also fixes this issue in my test case, but that makes performance really bad compared to the scenario with no software sync.
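To illustrate why the manual flush/invalidate matters, here is a minimal toy model in Python. It is only a sketch, not VTA or PYNQ code: the class `ToyCoherenceModel` and the function `run` are hypothetical names, and it assumes a CPU with a write-back cache plus a device that only sees DRAM.

```python
class ToyCoherenceModel:
    """Hypothetical toy: CPU with a write-back cache, device sees only DRAM."""

    def __init__(self, size):
        self.dram = [0] * size
        self.cache = {}  # addr -> (value, dirty)

    def cpu_write(self, addr, value):
        # Write-back cache: the store stays in the cache, DRAM is not updated.
        self.cache[addr] = (value, True)

    def cpu_read(self, addr):
        # A miss fills the line from DRAM; a hit returns the cached value.
        if addr not in self.cache:
            self.cache[addr] = (self.dram[addr], False)
        return self.cache[addr][0]

    def flush(self, start, length):
        # Write dirty lines back to DRAM so the device can see them.
        for addr in range(start, start + length):
            if addr in self.cache and self.cache[addr][1]:
                value = self.cache[addr][0]
                self.dram[addr] = value
                self.cache[addr] = (value, False)

    def invalidate(self, start, length):
        # Drop cached lines so the next CPU read fetches from DRAM.
        for addr in range(start, start + length):
            self.cache.pop(addr, None)

    def device_double(self, src, dst, length):
        # Stand-in for the accelerator: reads and writes DRAM directly.
        for i in range(length):
            self.dram[dst + i] = 2 * self.dram[src + i]


def run(sync):
    m = ToyCoherenceModel(8)
    for i in range(4):
        m.cpu_read(4 + i)        # output buffer lines are now cached (stale later)
    for i in range(4):
        m.cpu_write(i, i + 1)    # input data sits dirty in the cache
    if sync:
        m.flush(0, 4)            # make the input visible to the device
    m.device_double(0, 4, 4)
    if sync:
        m.invalidate(4, 4)       # force the CPU to re-read the device output
    return [m.cpu_read(4 + i) for i in range(4)]


print(run(sync=True))   # [2, 4, 6, 8]
print(run(sync=False))  # [0, 0, 0, 0]
```

Without the sync steps the device never sees the input and the CPU keeps reading stale output lines, which is the kind of silent wrong result described above. Doing this for every buffer on every run is also why the performance cost is high.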
I tried versions from 2018.3 to 2019.2, but no version other than 2018.3 could complete bitstream generation.
I found that test_benchmark_gemm.py completes successfully on Ultra96-v2 + PYNQ v2.4/2.5 with a modification to the cache coherency setting register. (I described my environment in another thread.)
But resnet18_v1 also outputs a wrong result, and I think this is due to VTALoadBuffer2D() behaving incorrectly on some Xilinx devices.
In my environment, neither the PYNQ version (Ultra96v2 2.4/2.5) nor the bitstream (default / built from source with Vivado 2018.3) made a difference to VTALoadBuffer2D()'s incorrect behavior.
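One way to narrow down this kind of problem is to compare what the device loaded against a software reference. The sketch below is not the VTA runtime: `reference_load_2d` and `first_mismatch` are hypothetical helpers. It assumes that a 2D load copies `y_size` rows of `x_size` elements from a flat source with row stride `x_stride`, and zero-pads around the tile. It also treats elements as scalars rather than tensors.

```python
def reference_load_2d(src, offset, x_size, y_size, x_stride,
                      x_pad_before=0, y_pad_before=0,
                      x_pad_after=0, y_pad_after=0, pad_value=0):
    """Hypothetical software reference for a strided 2D load with padding."""
    row_width = x_pad_before + x_size + x_pad_after
    out = [[pad_value] * row_width for _ in range(y_pad_before)]
    for y in range(y_size):
        start = offset + y * x_stride          # each row starts one stride later
        row = list(src[start:start + x_size])
        out.append([pad_value] * x_pad_before + row + [pad_value] * x_pad_after)
    out.extend([pad_value] * row_width for _ in range(y_pad_after))
    return out


def first_mismatch(expected, actual):
    """Return (row, col) of the first differing element, or None if equal."""
    for y, (erow, arow) in enumerate(zip(expected, actual)):
        for x, (e, a) in enumerate(zip(erow, arow)):
            if e != a:
                return (y, x)
    return None


src = list(range(12))
tile = reference_load_2d(src, offset=1, x_size=2, y_size=2, x_stride=4,
                         x_pad_before=1, y_pad_after=1)
print(tile)  # [[0, 1, 2], [0, 5, 6], [0, 0, 0]]
```

Dumping the buffer contents read back from the board and passing them to `first_mismatch` together with the reference tile would show whether errors cluster at row boundaries or in the padding, which could help separate a load bug from a remaining coherency problem.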