I am investigating the implementation of VTA because I want to use neural network structures other than resnet and to make VTA available for Relay.
There are some things I do not understand about the pack processing mechanism, so let me ask here.
I would greatly appreciate a correction if I have made a mistake.
Packing starts when the node specified by start_name is reached, or when a max_pool2d operator appears. If start_name is not specified, what is the intention of starting packing at max_pool2d rather than at some other operator?
If start_pack is already True, a max_pool2d will trigger an assert error.
In other words, unless max_pool2d comes first, must the network be structured so that global_avg_pool2d always appears before any later max_pool2d?
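The start condition described above can be sketched roughly like this (the function name and arguments are illustrative, not the exact graph_pack code):

```python
# Hypothetical sketch of the pack-start decision: packing begins at the node
# named by start_name, or, when no start_name is given, at the first
# max_pool2d operator.
def should_start_pack(op_name, node_name, start_name, started):
    if started:
        return False  # already packing; a later max_pool2d would assert
    if start_name is not None:
        return node_name == start_name
    return op_name == "max_pool2d"
```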
I tried running the yolo v2 graph after removing `assert not start_pack`, but when a transpose operator comes after _pack_batch_channel has been called, it cannot be processed because the shape of the data no longer matches the axes of the transpose.
Is it necessary to implement a rewrite for the transpose operator in pack?
As for the processing for conv2d: since the counter starts at 0 and the only place where the counter is incremented is inside this if statement, can the condition at L282 - L291 ever pass? What is this counter intended for?
I think most of the VTA examples should be applied very carefully to nets other than resnets.
According to what I know, resnet architectures start with one conv2d-act-max_pool chain, then a chain of strided and unstrided conv2ds with residual connections, and lastly a global_avg_pool before the fully connected layer.
The intention is to offload everything that comes before the first max_pool to the ARM processor (they mention something about the input not having enough channel dimensions for it to be offloaded to the FPGA part).
Again, resnet architectures by design have a max_pool at the beginning and a global_avg_pool at the end, and only in that order.
Maybe you are packing an already packed tensor? This might be a problem of using yolo instead of resnet with code that expects a resnet architecture.
Good question. It seems that that part of the code is unreachable for an initial counter of 0.
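The unreachability can be reproduced with a minimal pattern (this is an illustration of the structure being discussed, not the actual graph_pack code): if the only increment of the counter sits inside a branch that itself requires a non-zero counter, the branch can never fire when the counter starts at 0.

```python
# Minimal reproduction: the guarded body is the only place `counter` grows,
# so starting from counter == 0 it never executes.
def walk(ops, counter=0):
    fired = []
    for op in ops:
        if op == "conv2d" and counter != 0:  # guard resembling L282-L291
            counter += 1                      # only increment is in here
            fired.append(op)
    return counter, fired

print(walk(["conv2d", "conv2d", "conv2d"]))  # (0, [])
```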
In the network shown below, transpose is used after max_pool2d.
Since _pack_batch_channel is called at max_pool2d, the transpose input will be 6-dimensional data.
I think it is necessary either to call _unpack_batch_channel after max_pool2d or to adjust the transpose axes according to the 6-dimensional input. Is this correct?
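To make the shape mismatch concrete, here is a sketch of the packed layout in NumPy (factor values and the exact reshape/transpose order are assumptions modeled on the NCHWnc-style packing described above):

```python
import numpy as np

# Sketch of _pack_batch_channel: NCHW data is reshaped and transposed into a
# 6-D packed layout (N/b, C/c, H, W, b, c).
def pack_batch_channel(data, bfactor=1, cfactor=16):
    n, c, h, w = data.shape
    assert n % bfactor == 0 and c % cfactor == 0
    data = data.reshape(n // bfactor, bfactor, c // cfactor, cfactor, h, w)
    return data.transpose(0, 2, 4, 5, 1, 3)

x = np.zeros((1, 32, 14, 14))
packed = pack_batch_channel(x)
print(packed.shape)  # (1, 2, 14, 14, 1, 16)
```

A transpose written against the original 4-D NCHW axes (e.g. `axes=(0, 3, 1, 2)`) no longer lines up with this 6-D shape, which is why the data must either be unpacked first or the transpose axes rewritten for six dimensions.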
In TinyYolo v2, the last conv2d outputs 125 channels, but _pack_weight assumes that the channel size is divisible by cfactor, so it seems this case needs to be handled.
One way to cope is to increase the number of output channels so that it becomes divisible.
I also got an error at the `assert dshape % cfactor == 0` line of _pack_weight with a MobileNetV2-based NN made with PyTorch. When playing with start_name and stop_name and their indexes, it appears that some layers have tensor sizes of [a, 1, b, c], causing the assert to fail. Those layers seem to be the middle convolution layers of the bottleneck blocks…
Would implementing a case for `dshape % cfactor != 0` really be the only way to fix this issue?
Has anyone else tried MobileNetV2 with a VTA graph_pack by any chance?
The problem raised by @KZamudio is solved. The 1 in the tensor shape [a, 1, b, c] corresponds to the depthwise convolution (the convolution is computed on only one channel at a time). The solution was to transform the depthwise convolution into a grouped convolution with groups of size 16, since VTA only supports convolutions whose number of channels is a multiple of 16.
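The fix described above can be sketched in terms of shapes (the function name and numbers are illustrative): a depthwise conv has groups equal to the channel count and weight shape [C, 1, kh, kw]; re-expressing it as a grouped conv with 16 channels per group gives each group the channel width VTA expects.

```python
# Sketch: convert depthwise-conv parameters into grouped-conv parameters
# with 16 input channels per group. Within each group, the equivalent
# grouped weight is zero except on its "diagonal" channel entries.
def depthwise_to_grouped(channels, kh=3, kw=3, lanes=16):
    assert channels % lanes == 0, "channels must be a multiple of 16"
    groups = channels // lanes
    weight_shape = (channels, lanes, kh, kw)  # [out, in_per_group, kh, kw]
    return groups, weight_shape

print(depthwise_to_grouped(96))  # (6, (96, 16, 3, 3))
```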