Hi @thierry,
I am currently working on a patch to make the Ultra96 work with software cache coherency. With this patch, yolov3-tiny runs in 89 ms, but there is a logic issue that seems to be caused by 'acc_mem' in the 'compute' module. Writes to acc_mem made by the 'write_tensor' function seem to take a couple of clock cycles to become visible. If we add some logic to wait a couple of clocks after calling 'write_tensor', the 'read_tensor' in the next round gets the correct content, but with the default logic 'read_tensor' gets the old value. This issue does not happen on the Pynq board.
Do you have any idea how this could happen? Any clue would be very helpful.
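
To make the symptom concrete, here is a toy Python model of what I think I am seeing. It is only a sketch, not the VTA HLS code: the class `DelayedWriteMem`, its methods, and the latency of 2 cycles are all made up for illustration, and I have not measured the real delay.

```python
from collections import deque


class DelayedWriteMem:
    """Toy memory (hypothetical) whose writes land only after `latency` ticks."""

    def __init__(self, size, latency):
        self.data = [0] * size
        self.latency = latency
        # Each pending entry is [cycles_left, addr, value], oldest first.
        self.pending = deque()

    def write(self, addr, value):
        """Queue a write; it is not visible to read() until it has aged out."""
        if self.latency == 0:
            self.data[addr] = value
        else:
            self.pending.append([self.latency, addr, value])

    def tick(self, cycles=1):
        """Advance the clock and commit every write whose delay has elapsed."""
        for _ in range(cycles):
            for entry in self.pending:
                entry[0] -= 1
            while self.pending and self.pending[0][0] <= 0:
                _, addr, value = self.pending.popleft()
                self.data[addr] = value

    def read(self, addr):
        """Return whatever has been committed so far (may be stale)."""
        return self.data[addr]


mem = DelayedWriteMem(size=4, latency=2)  # latency is an assumed value
mem.write(0, 7)
stale = mem.read(0)   # read right after the write: still the old value
mem.tick(2)           # the "wait a couple of clocks" workaround
fresh = mem.read(0)   # now the new value is visible
```

In this model the immediate read corresponds to the default logic on the Ultra96, and the read after `tick(2)` corresponds to the version with the extra wait.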
Regards, Hua