[VTA] acc_mem: write_tensor appears to have a delay on Ultra96

Hi @thierry,

I am currently working on a patch to make Ultra96 work with software cache coherency. With this patch the performance reaches 89 ms on yolov3-tiny, but there is a logic issue that seems to be caused by 'acc_mem' in the 'compute' module. Writes to acc_mem through 'write_tensor' appear to take a couple of clock cycles to become visible. If we add some logic after the 'write_tensor' call to wait a couple of cycles, the 'read_tensor' in the next round gets the correct content; with the default logic, 'read_tensor' gets the old value. This issue does not happen on the Pynq board.

Do you have any idea how that happens? Any clue would be very helpful.

Regards, Hua

This seems to be caused by the inter-iteration DEPENDENCE setting shown below.

#pragma HLS DEPENDENCE variable = acc_mem inter false

After removing this line, the issue goes away on Ultra96. The setting tells HLS to ignore the inter-iteration dependency on acc_mem, which allows read_tensor and write_tensor to run in parallel and causes the logic issue. However, a few things are still not clear to me:

#1. The code has a comment saying this pragma is 'necessary for II = 1'. If we need the dependency on acc_mem, is there another way to reach II=1, or is that impossible?

#2. The Pynq board runs fine with the same setting, and I do not know why.
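
To make the suspected hazard concrete, here is a toy C++ model of a memory whose writes only become visible a few ticks later. It is not the VTA code: DelayedMem, accumulate, and the latency values are made-up names and numbers for illustration, and the two-cycle write latency is an assumption rather than something measured on Ultra96. It only shows how a read issued too soon after a write returns the old value, and how waiting an extra cycle hides the problem.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model (hypothetical, not VTA's acc_mem): a write becomes visible to
// reads only after `latency` calls to tick().
struct DelayedMem {
  struct Pending {
    std::size_t addr;
    std::int32_t value;
    int ticks_left;
  };

  std::array<std::int32_t, 16> data{};  // backing storage, zero-initialised
  std::deque<Pending> in_flight;        // writes that have not landed yet
  int latency;

  explicit DelayedMem(int latency_cycles) : latency(latency_cycles) {}

  // Queue a write; it does not change `data` until enough ticks have passed.
  void write(std::size_t addr, std::int32_t value) {
    assert(addr < data.size());
    in_flight.push_back({addr, value, latency});
  }

  // Read whatever has already landed; in-flight writes are invisible.
  std::int32_t read(std::size_t addr) const {
    assert(addr < data.size());
    return data[addr];
  }

  // Advance one clock cycle and commit the writes whose delay has expired.
  void tick() {
    for (auto &p : in_flight) --p.ticks_left;
    while (!in_flight.empty() && in_flight.front().ticks_left <= 0) {
      data[in_flight.front().addr] = in_flight.front().value;
      in_flight.pop_front();
    }
  }
};

// Read-modify-write on address 0, one iteration per cycle (like II = 1),
// optionally stalling `wait_cycles` extra ticks after each write.
std::int32_t accumulate(int rounds, int latency, int wait_cycles) {
  DelayedMem mem(latency);
  for (int i = 0; i < rounds; ++i) {
    std::int32_t v = mem.read(0);  // may observe a stale value
    mem.write(0, v + 1);
    mem.tick();
    for (int w = 0; w < wait_cycles; ++w) mem.tick();
  }
  for (int d = 0; d < latency; ++d) mem.tick();  // drain remaining writes
  return mem.read(0);
}
```

In this model, accumulate(4, 2, 0) returns 2 instead of 4 because every other read sees the old value, while accumulate(4, 2, 1) returns 4 once one wait cycle is added after each write. With a one-cycle write latency, accumulate(4, 1, 0) is already correct, which is one way a different board could behave differently under the same pragma, although I have not confirmed that this is what happens on Pynq.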