Hi, I’m about to profile a deep neural network on the VTA architecture. What I’m curious about is how the input images that go into the network are delivered to the target board. For example, let’s say there are two convolution operations. VTA will break each one down into smaller pieces and compute them piece by piece: CONV1 = conv1_1 + conv1_2 + conv1_3 + … and CONV2 = conv2_1 + conv2_2 + conv2_3 + …
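To make the notation above concrete, here is a minimal pure-Python sketch of what I mean by the split. It assumes the "+" means accumulating partial results over tiles of input channels; the actual tiling on VTA is decided by the schedule and may well differ. The functions `conv1d` and `conv1d_tiled` are hypothetical names for illustration only, not part of TVM or VTA.

```python
def conv1d(x, w):
    """Valid 1D cross-correlation summed over all input channels.

    x: list of channels, each a list of numbers.
    w: list of per-channel kernels, same channel count as x.
    """
    k = len(w[0])
    out_len = len(x[0]) - k + 1
    out = [0] * out_len
    for c in range(len(x)):
        for i in range(out_len):
            for j in range(k):
                out[i] += x[c][i + j] * w[c][j]
    return out


def conv1d_tiled(x, w, tile):
    """Same result, built as a sum of partial convolutions over channel tiles."""
    k = len(w[0])
    out = [0] * (len(x[0]) - k + 1)
    for start in range(0, len(x), tile):
        # One "piece" (conv1_1, conv1_2, ...) covers a slice of the channels.
        partial = conv1d(x[start:start + tile], w[start:start + tile])
        out = [a + b for a, b in zip(out, partial)]
    return out


# Small integer example: 4 input channels, kernel width 2, tiles of 2 channels.
x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [0, 1, 0, 1]]
w = [[1, 0], [0, 1], [1, 1], [2, -1]]
full = conv1d(x, w)
tiled = conv1d_tiled(x, w, tile=2)
print(full == tiled)
```

Each call to `conv1d` inside the loop corresponds to one piece, and each piece needs its own slice of the input and weights, which is where I expect the per-piece data movement to come from.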
Then, for each piece of the split convolution, I assume the input images on the host PC are transferred to the DRAM on the VTA side. From there the data seems to be loaded into SRAM, computed on, stored back to DRAM, and finally sent back to the host PC. I want to measure this transfer overhead, and I’d like to know which function performs this process.
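For reference, this is roughly the kind of measurement I have in mind, as a host-side sketch only. The three functions `copy_host_to_dram`, `run_tile_on_accelerator`, and `copy_dram_to_host` are hypothetical placeholders (they just shuffle bytes in memory here) standing in for whatever the real transfer and compute calls turn out to be. The `StageTimer` helper is also my own, not a TVM or VTA API. Note that wall-clock timing from the host cannot separate the DRAM to SRAM loads from the compute itself, so those are lumped into a single "accelerator" stage.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return {stage: (calls, seconds, share of total time)}."""
        total = sum(self.totals.values())
        return {
            name: (self.counts[name], secs, secs / total if total else 0.0)
            for name, secs in self.totals.items()
        }


# Hypothetical stand-ins for the real transfer and compute calls.
def copy_host_to_dram(tile):
    return bytearray(tile)


def run_tile_on_accelerator(dram_buf):
    return bytes(b ^ 0xFF for b in dram_buf)


def copy_dram_to_host(result):
    return bytes(result)


timer = StageTimer()
tiles = [bytes(4096) for _ in range(3)]  # three dummy input pieces
outputs = []
for tile in tiles:
    with timer.stage("host_to_dram"):
        buf = copy_host_to_dram(tile)
    with timer.stage("accelerator"):
        res = run_tile_on_accelerator(buf)
    with timer.stage("dram_to_host"):
        outputs.append(copy_dram_to_host(res))

report = timer.report()
for name, (calls, secs, share) in report.items():
    print(f"{name}: {calls} calls, {secs * 1e3:.3f} ms, {share:.1%}")
```

If I knew which functions actually perform the copies, I could wrap them the same way and compare the share of time spent in transfers against the share spent in compute.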