VTA does not produce correct results on Zynq UltraScale+ based devices. We once thought this was a coherency problem, but recently my partner @hht and I found what may be a bug in the HLS C VTA hardware.
In the compute module, only the FINISH instruction sets the VTA_COMPUTE_DONE register, and only a GEMM/ALU instruction resets it; LOAD/STORE instructions do not.
The last instruction in the instruction queue is FINISH. The processor polls the VTA_COMPUTE_DONE register to confirm that the VTA computation is complete. But VTA_COMPUTE_DONE remains set until VTA runs again and executes a GEMM/ALU instruction. If the processor reads the register on the next run before VTA has executed a GEMM/ALU instruction, it sees the stale value from the previous run and wrongly concludes that the new computation has already completed.
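The behaviour described above can be sketched as a small C model. This is only an illustration of the stale-flag problem, not the VTA HLS source: the `insn_kind` enum, `sticky_done_reg`, `sticky_execute` and `sticky_read` are hypothetical names introduced here.

```c
#include <stdint.h>

/* Hypothetical instruction kinds (illustrative, not VTA's opcode enum). */
typedef enum { INSN_LOAD, INSN_STORE, INSN_GEMM, INSN_ALU, INSN_FINISH } insn_kind;

/* Behavioural model of the original "sticky" done register. */
typedef struct {
    uint32_t done;
} sticky_done_reg;

/* Apply the effect one instruction has on the done register. */
static void sticky_execute(sticky_done_reg *r, insn_kind k) {
    if (k == INSN_FINISH) {
        r->done = 1;                       /* only FINISH sets the flag */
    } else if (k == INSN_GEMM || k == INSN_ALU) {
        r->done = 0;                       /* only GEMM/ALU reset it */
    }
    /* LOAD/STORE leave the flag unchanged. */
}

/* Host-side read: has no side effect in the original design. */
static uint32_t sticky_read(const sticky_done_reg *r) {
    return r->done;
}
```

In this model, if a second run begins with a LOAD, `sticky_read` still returns 1 from the previous FINISH until the first GEMM/ALU of the new run executes, which is the window in which the processor can be misled.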
I modified the source code of the compute module IP so that the VTA_COMPUTE_DONE register is cleared on read.
    // Two-stage shift register used to detect a rising edge on done_o
    reg [1:0] rise_done_buf;
    always @(posedge ACLK) begin
        if (ARESET)
            rise_done_buf <= 2'b0;
        else if (ACLK_EN) begin
            rise_done_buf[0] <= done_o;
            rise_done_buf[1] <= rise_done_buf[0];
        end
    end
    // High for one cycle when done_o goes from 0 to 1
    wire rise_done = rise_done_buf[0] & (~rise_done_buf[1]);

    // Latched done value returned to the host
    reg [31:0] int_done_tmp;
    always @(posedge ACLK) begin
        if (ARESET)
            int_done_tmp <= 32'b0;
        else if (ACLK_EN) begin
            if (rise_done)
                int_done_tmp <= 32'b1;
            else if (ar_hs && raddr == ADDR_DONE_O_DATA_0)
                int_done_tmp <= 32'b0; // clear on read
        end
    end
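To make the intent of the change easier to follow, here is a rough cycle-level C model of the clear-on-read logic. It is a sketch under assumptions: `cor_done_reg` and `cor_clock` are hypothetical names, the extra cycle of latency from the two-stage edge detector is ignored, and the host is assumed to see the value latched before the clear takes effect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the modified done register. */
typedef struct {
    bool prev_done;    /* done_o as sampled on the previous clock */
    uint32_t latched;  /* value returned to the host (plays the role of int_done_tmp) */
} cor_done_reg;

/*
 * Advance the model by one clock.
 *   done_o    : raw done output of the compute module in this cycle
 *   host_read : true if the host reads the done address in this cycle
 * Returns the value the host would observe if it reads in this cycle.
 */
static uint32_t cor_clock(cor_done_reg *r, bool done_o, bool host_read) {
    uint32_t seen = r->latched;            /* read data is the pre-update value */
    bool rise = done_o && !r->prev_done;   /* rising edge on done_o */

    if (rise) {
        r->latched = 1;                    /* a new FINISH has priority over the clear */
    } else if (host_read) {
        r->latched = 0;                    /* clear on read */
    }
    r->prev_done = done_o;
    return seen;
}
```

With this scheme each completion is reported to the host once: the first read after a rising edge returns 1, and later reads return 0 until done_o rises again, even if done_o itself stays high.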
After this modification, I get correct results on the ZCU104 platform.