I understand that the dependency queues are used to pass tokens that indicate the dependencies between modules. Given the instruction sequence below, does INSTRUCTION 1 keep pushing to the g2l dep queue throughout its iteration (16 clocks?), and does INSTRUCTION 2 then execute once the GEMM iteration is over and the g2l dep queue becomes empty?
INSTRUCTION 0: LOAD UOP
dep - pop prev: 0, pop next: 0, push prev: 0, push next: 0
DRAM: 0x05a40000, SRAM:0x0000
y: size=1, pad=[0, 0]
x: size=1, stride=1, pad=[0, 0]
l2g_queue = 0, g2l_queue = 0
s2g_queue = 0, g2s_queue = 0
INSTRUCTION 1: GEMM
dep - pop prev: 0, pop next: 0, push prev: 1, push next: 0
reset_out: 1
range (0, 1)
outer loop - iter: 16, wgt: 0, inp: 0, acc: 1
inner loop - iter: 1, wgt: 0, inp: 0, acc: 0
l2g_queue = 0, g2l_queue = 1
s2g_queue = 0, g2s_queue = 0
INSTRUCTION 2: LOAD INP
dep - pop prev: 0, pop next: 1, push prev: 0, push next: 0
DRAM: 0x01684800, SRAM:0x0000
y: size=1, pad=[0, 0]
x: size=1, stride=1, pad=[0, 0]
l2g_queue = 0, g2l_queue = 0
s2g_queue = 0, g2s_queue = 0
INSTRUCTION 3: LOAD WGT
dep - pop prev: 0, pop next: 0, push prev: 0, push next: 1
DRAM: 0x00168500, SRAM:0x0000
y: size=16, pad=[0, 0]
x: size=1, stride=16, pad=[0, 0]
l2g_queue = 1, g2l_queue = 0
s2g_queue = 0, g2s_queue = 0
INSTRUCTION 4: LOAD UOP
dep - pop prev: 1, pop next: 0, push prev: 0, push next: 0
DRAM: 0x05a40001, SRAM:0x0001
y: size=1, pad=[0, 0]
x: size=1, stride=1, pad=[0, 0]
l2g_queue = 0, g2l_queue = 0
s2g_queue = 0, g2s_queue = 0
INSTRUCTION 5: GEMM
dep - pop prev: 0, pop next: 0, push prev: 1, push next: 0
reset_out: 0
range (1, 2)
outer loop - iter: 16, wgt: 1, inp: 0, acc: 1
inner loop - iter: 1, wgt: 0, inp: 0, acc: 0
l2g_queue = 0, g2l_queue = 1
s2g_queue = 0, g2s_queue = 0
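
To make the question concrete, here is a minimal Python sketch that replays the dep flags from the dump above and tracks the four queue counters. It is only a bookkeeping model, not the hardware: the names `Insn`, `module_of` and `replay` are hypothetical, it assumes LOAD INP / LOAD WGT go to the load module and everything else (LOAD UOP, GEMM) to the compute module, and it assumes each set flag moves exactly one token per instruction, in program order. It says nothing about when during the GEMM the push actually happens.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    name: str
    pop_prev: int
    pop_next: int
    push_prev: int
    push_next: int


# Dependency flags copied from the dump above (INSTRUCTION 0..5).
PROGRAM = [
    Insn("LOAD UOP", 0, 0, 0, 0),
    Insn("GEMM", 0, 0, 1, 0),
    Insn("LOAD INP", 0, 1, 0, 0),
    Insn("LOAD WGT", 0, 0, 0, 1),
    Insn("LOAD UOP", 1, 0, 0, 0),
    Insn("GEMM", 0, 0, 1, 0),
]


def module_of(insn):
    """Assumed routing: INP/WGT loads -> load, STORE -> store, rest -> compute."""
    if insn.name in ("LOAD INP", "LOAD WGT"):
        return "load"
    if insn.name == "STORE":
        return "store"
    return "compute"


def replay(program):
    """Apply one token per set flag, in program order, and record the counters."""
    q = {"l2g": 0, "g2l": 0, "s2g": 0, "g2s": 0}
    history = []
    for insn in program:
        mod = module_of(insn)
        if mod == "load":
            # Load's only neighbour is compute ("next").
            q["g2l"] -= insn.pop_next
            q["l2g"] += insn.push_next
        elif mod == "store":
            # Store's only neighbour is compute ("prev").
            q["g2s"] -= insn.pop_prev
            q["s2g"] += insn.push_prev
        else:
            # Compute sits between load ("prev") and store ("next").
            q["l2g"] -= insn.pop_prev
            q["g2l"] += insn.push_prev
            q["s2g"] -= insn.pop_next
            q["g2s"] += insn.push_next
        history.append(dict(q))
    return history


for i, (insn, q) in enumerate(zip(PROGRAM, replay(PROGRAM))):
    print(f"INSTRUCTION {i}: {insn.name:8s} "
          f"l2g={q['l2g']} g2l={q['g2l']} s2g={q['s2g']} g2s={q['g2s']}")
```

Under these assumptions the replay gives the same l2g_queue / g2l_queue values as the dump prints after each instruction. In particular, g2l_queue goes from 0 to 1 after INSTRUCTION 1 even though its outer loop has 16 iterations, which fits one token per instruction rather than one per iteration.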