According to the comment in `include/vta/hw_spec.h` and the hardware implementations (simulator, Xilinx, etc.), if `use_imm` is `True`, the input index is the same as the output index, i.e., both are `dst_idx`.
```c
// include/vta/hw_spec.h
 * // Perform ALU operation
 * if (use_imm) {
 *   acc_mem[dst_idx] = alu_op(alu_opcode, acc_mem[dst_idx], imm);
 * } else {
 *   acc_mem[dst_idx] = alu_op(alu_opcode, acc_mem[dst_idx], acc_mem[src_idx]);
 * }
```
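To make the documented semantics concrete, here is a minimal C sketch that models the accumulator as a plain `int32_t` array. The opcode names (`OP_MUL`, `OP_MAX`), the helper names and the tiny `ACC_DEPTH` are hypothetical and not taken from `hw_spec.h`. The point it illustrates is only that, when `use_imm` is set, `src_idx` is never read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode names for this sketch; not the enum in hw_spec.h. */
typedef enum { OP_MUL, OP_MAX } alu_op_t;

#define ACC_DEPTH 8 /* hypothetical, tiny accumulator for illustration */

static int32_t alu_apply(alu_op_t op, int32_t x, int32_t y) {
  return op == OP_MUL ? x * y : (x > y ? x : y);
}

/* Mirrors the comment: with use_imm, the first operand comes from dst_idx
 * and src_idx is ignored; otherwise acc_mem[src_idx] is the second operand. */
static void alu_step_documented(int32_t acc_mem[ACC_DEPTH], alu_op_t op,
                                int use_imm, int dst_idx, int src_idx,
                                int32_t imm) {
  assert(dst_idx >= 0 && dst_idx < ACC_DEPTH);
  assert(src_idx >= 0 && src_idx < ACC_DEPTH);
  if (use_imm) {
    acc_mem[dst_idx] = alu_apply(op, acc_mem[dst_idx], imm);
  } else {
    acc_mem[dst_idx] = alu_apply(op, acc_mem[dst_idx], acc_mem[src_idx]);
  }
}
```

For instance, with `acc_mem[0] = 7` and `acc_mem[3] = 100`, an immediate multiply by 2 with `dst_idx = 0` and `src_idx = 3` leaves `acc_mem[0] = 14`: the value at index 3 plays no part.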
However, I think this does not hold in some cases, e.g., when the input variable is referenced more than once. In that case the input variable must be preserved because it is used again later.
For example, suppose I have the following fused operations to run on VTA:
```
1 b = right_shift(a, 5) // use_imm = True;  a@acc_mem[idx_0], b@acc_mem[idx_0]
2 c = mul(b, 5)         // use_imm = True;  c@acc_mem[idx_1]
3 d = max(b, c)         // use_imm = False; d@acc_mem[idx_0]
```
In line 2, even though `use_imm` = `True`, the indexes of the input (i.e., `b`) and the output (i.e., `c`) have to be different, because the `max` op in line 3 references both `b` and `c`. In other words, `b` is referenced twice (lines 2 and 3), so line 2 cannot update `b`'s data in place.
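The failure can be traced with a small C sketch of the documented semantics. It assumes, hypothetically, `idx_0 = 0`, `idx_1 = 1`, `a = 640` and some leftover value already sitting in `acc_mem[idx_1]`; the opcode and function names are illustrative only. With `a = 640` the intended results are `b = 640 >> 5 = 20`, `c = 100` and `d = 100`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode names covering only the three ops in the example. */
typedef enum { OP_SHR, OP_MUL, OP_MAX } alu_op_t;

static int32_t alu_apply(alu_op_t op, int32_t x, int32_t y) {
  switch (op) {
    case OP_SHR:
      /* keep C shifts well defined: non-negative value, shift < 32 */
      assert(x >= 0 && y >= 0 && y < 32);
      return x >> y;
    case OP_MUL: return x * y;
    case OP_MAX: return x > y ? x : y;
  }
  return 0;
}

/* Documented semantics: with use_imm, the first operand is acc_mem[dst_idx]. */
static void alu_step_documented(int32_t *acc_mem, alu_op_t op, int use_imm,
                                int dst_idx, int src_idx, int32_t imm) {
  int32_t rhs = use_imm ? imm : acc_mem[src_idx];
  acc_mem[dst_idx] = alu_apply(op, acc_mem[dst_idx], rhs);
}

enum { IDX_0 = 0, IDX_1 = 1 };

/* Runs lines 1-3 of the example and reports c (at IDX_1) and d (at IDX_0).
 * `leftover` is whatever acc_mem[IDX_1] held before line 2. */
static void run_example_documented(int32_t a, int32_t leftover,
                                   int32_t *c, int32_t *d) {
  int32_t acc_mem[2] = {a, leftover};
  alu_step_documented(acc_mem, OP_SHR, 1, IDX_0, IDX_0, 5); /* 1: b in place */
  alu_step_documented(acc_mem, OP_MUL, 1, IDX_1, IDX_0, 5); /* 2: src_idx ignored */
  alu_step_documented(acc_mem, OP_MAX, 0, IDX_0, IDX_1, 0); /* 3: d = max(b, c) */
  *c = acc_mem[IDX_1];
  *d = acc_mem[IDX_0];
}
```

With `leftover = 3`, line 2 stores `3 * 5 = 15` instead of `20 * 5 = 100`, so `c = 15` and `d = max(20, 15) = 20` instead of 100: the result depends on stale data at `idx_1`, not on `b`.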
So in the above example, for `use_imm` = `True`, we have to use `src_idx` as the input.
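Below is a minimal C sketch of the semantics suggested here, in which an immediate ALU op reads its input from `src_idx`. It models the proposal, not what `hw_spec.h` currently documents; the opcode and function names and the test values (`a = 640`, a leftover value of 3 at `idx_1`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_SHR, OP_MUL, OP_MAX } alu_op_t; /* hypothetical names */

static int32_t alu_apply(alu_op_t op, int32_t x, int32_t y) {
  switch (op) {
    case OP_SHR:
      /* keep C shifts well defined: non-negative value, shift < 32 */
      assert(x >= 0 && y >= 0 && y < 32);
      return x >> y;
    case OP_MUL: return x * y;
    case OP_MAX: return x > y ? x : y;
  }
  return 0;
}

/* Proposed semantics: with use_imm, the input is acc_mem[src_idx], so the
 * output can go to a different dst_idx without overwriting the input. */
static void alu_step_proposed(int32_t *acc_mem, alu_op_t op, int use_imm,
                              int dst_idx, int src_idx, int32_t imm) {
  if (use_imm) {
    acc_mem[dst_idx] = alu_apply(op, acc_mem[src_idx], imm);
  } else {
    acc_mem[dst_idx] = alu_apply(op, acc_mem[dst_idx], acc_mem[src_idx]);
  }
}

enum { IDX_0 = 0, IDX_1 = 1 };

/* Same three-op example as before, using the proposed semantics. */
static void run_example_proposed(int32_t a, int32_t leftover,
                                 int32_t *c, int32_t *d) {
  int32_t acc_mem[2] = {a, leftover};
  alu_step_proposed(acc_mem, OP_SHR, 1, IDX_0, IDX_0, 5); /* 1: b = a >> 5 */
  alu_step_proposed(acc_mem, OP_MUL, 1, IDX_1, IDX_0, 5); /* 2: c = b * 5 */
  alu_step_proposed(acc_mem, OP_MAX, 0, IDX_0, IDX_1, 0); /* 3: d = max(b, c) */
  *c = acc_mem[IDX_1];
  *d = acc_mem[IDX_0];
}
```

Here line 2 reads `b = 20` from `idx_0`, giving `c = 100` and `d = max(20, 100) = 100` regardless of the leftover value at `idx_1`.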
The existing codebase and the ResNet models do not produce op sequences like the one above, so whenever `use_imm` = `True` we have `dst_idx` = `src_idx` and the bug is never triggered. For other custom models, however, this may not be the case.
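Why existing models are unaffected can be checked mechanically: when `dst_idx == src_idx`, the documented and proposed immediate semantics read the same accumulator entry, so they must agree. The C sketch below (hypothetical names, small brute-force ranges) compares both on in-place cases and also exhibits an out-of-place case where they diverge.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_SHR, OP_MUL, OP_MAX } alu_op_t; /* hypothetical names */

static int32_t alu_apply(alu_op_t op, int32_t x, int32_t y) {
  switch (op) {
    case OP_SHR:
      /* keep C shifts well defined: non-negative value, shift < 32 */
      assert(x >= 0 && y >= 0 && y < 32);
      return x >> y;
    case OP_MUL: return x * y;
    case OP_MAX: return x > y ? x : y;
  }
  return 0;
}

/* Result of one immediate ALU op (use_imm = True) under each reading. */
static int32_t imm_documented(const int32_t *acc, alu_op_t op, int dst,
                              int src, int32_t imm) {
  (void)src; /* ignored by the documented semantics */
  return alu_apply(op, acc[dst], imm);
}

static int32_t imm_proposed(const int32_t *acc, alu_op_t op, int dst,
                            int src, int32_t imm) {
  (void)dst; /* dst only selects where the result is written */
  return alu_apply(op, acc[src], imm);
}

/* Returns 1 if both semantics agree on every in-place case tried. */
static int agree_when_in_place(void) {
  const int32_t acc[4] = {0, 7, 64, 1000};
  for (int op = OP_SHR; op <= OP_MAX; ++op)
    for (int idx = 0; idx < 4; ++idx)
      for (int32_t imm = 0; imm < 6; ++imm)
        if (imm_documented(acc, (alu_op_t)op, idx, idx, imm) !=
            imm_proposed(acc, (alu_op_t)op, idx, idx, imm))
          return 0;
  return 1;
}

/* Returns 1 if an out-of-place case (dst != src) gives different results:
 * b = 20 at index 0, leftover 3 at index 1, as in line 2 of the example. */
static int differ_when_out_of_place(void) {
  const int32_t acc[2] = {20, 3};
  return imm_documented(acc, OP_MUL, 1, 0, 5) !=
         imm_proposed(acc, OP_MUL, 1, 0, 5);
}
```

This only restates the argument in code: the two readings coincide exactly on the in-place pattern that current models generate, and they diverge as soon as an immediate op writes to a different index.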