[VTA][ALU][Bug] When use_imm is True, which index should be used as the input: src_idx or dst_idx?

According to the comment in include/vta/hw_spec.h and the hardware implementations (simulation, Xilinx, etc.), when use_imm is True the input index is the same as the output index, i.e., both are dst_idx.

// include/vta/hw_spec.h

 *           // Perform ALU operation
 *           if (use_imm) {
 *             acc_mem[dst_idx] = alu_op(alu_opcode, acc_mem[dst_idx], imm);
 *           } else {
 *             acc_mem[dst_idx] = alu_op(alu_opcode, acc_mem[dst_idx], acc_mem[src_idx]);
 *           }

However, I think this is incorrect in some cases, e.g., when the input variable is referenced more than once. In that case the input value must be preserved for later use, so it cannot be overwritten in place.

For example, if I have the following fused operation to run on VTA:

1  b = right_shift(a, 5)  // use_imm = True; a@acc_mem[idx_0], b@acc_mem[idx_0]
2  c = mul(b, 5)  // use_imm = True; c@acc_mem[idx_1]
3  d = max(b, c)  // use_imm = False; d@acc_mem[idx_0]

In line 2, even though use_imm = True, the indices of the input (i.e., ‘b’) and the output (i.e., ‘c’) have to be different, because the max op in line 3 references both b and c. In other words, b is referenced twice (lines 2 and 3), so line 2 cannot update b's data in place. So in the above example, for use_imm = True, we have to use src_idx as the input.
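To make the difference concrete, here is a minimal Python sketch of the two behaviours on the three-op example. It is only a toy model, not VTA code: `run_alu`, `run_example`, the `imm_reads_src` flag, and the starting values (a = 320, stale data 7 at idx_1) are all assumptions I made up for illustration.

```python
# Toy model of a single ALU step; names and values are illustrative only.

def shr(x, y):
    return x >> y

def mul(x, y):
    return x * y

def run_alu(acc_mem, op, dst_idx, src_idx, use_imm, imm, imm_reads_src):
    """Apply one ALU op to acc_mem in place.

    imm_reads_src=False mirrors the hw_spec.h comment (input is acc_mem[dst_idx]).
    imm_reads_src=True is the alternative raised in this post (input is acc_mem[src_idx]).
    """
    if use_imm:
        lhs = acc_mem[src_idx] if imm_reads_src else acc_mem[dst_idx]
        acc_mem[dst_idx] = op(lhs, imm)
    else:
        acc_mem[dst_idx] = op(acc_mem[dst_idx], acc_mem[src_idx])

def run_example(imm_reads_src):
    idx_0, idx_1 = 0, 1
    acc_mem = [320, 7]  # a = 320 at idx_0; idx_1 holds unrelated stale data
    run_alu(acc_mem, shr, idx_0, idx_0, True, 5, imm_reads_src)   # 1: b = a >> 5
    run_alu(acc_mem, mul, idx_1, idx_0, True, 5, imm_reads_src)   # 2: c = b * 5
    run_alu(acc_mem, max, idx_0, idx_1, False, 0, imm_reads_src)  # 3: d = max(b, c)
    return acc_mem[idx_0]

print(run_example(imm_reads_src=True))   # 50: c is computed from b, as intended
print(run_example(imm_reads_src=False))  # 35: c is computed from the stale value at idx_1
```

With the documented semantics, line 2 multiplies whatever already sits at idx_1 instead of b, so d comes out wrong even though b itself is preserved.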

In the existing codebase and the ResNet models, we do not have op sequences like the one above. There, whenever use_imm = True we also have dst_idx = src_idx, so the bug is not triggered. But for other customized models, this may not be the case.
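One rough way to check whether a given model would hit this is to scan its ALU ops for immediate ops whose source and destination indices differ. The sketch below assumes the ops are available as simple records; `AluOp` and `ops_affected` are hypothetical names for illustration, not part of VTA or TVM.

```python
from typing import List, NamedTuple

class AluOp(NamedTuple):
    """Hypothetical record of one ALU op; not a VTA data structure."""
    name: str
    dst_idx: int
    src_idx: int
    use_imm: bool

def ops_affected(ops: List[AluOp]) -> List[AluOp]:
    """Return immediate ops whose input lives at a different index than the output."""
    return [op for op in ops if op.use_imm and op.src_idx != op.dst_idx]

# The three-op example from this post, with idx_0 = 0 and idx_1 = 1.
example = [
    AluOp("right_shift", dst_idx=0, src_idx=0, use_imm=True),
    AluOp("mul", dst_idx=1, src_idx=0, use_imm=True),
    AluOp("max", dst_idx=0, src_idx=1, use_imm=False),
]
print([op.name for op in ops_affected(example)])  # ['mul']
```

An empty result would mean the sequence behaves the same under either interpretation of use_imm.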

What do you think? @thierry @liangfu and others?