To run the first conv2d layer on VTA, there are two possible solutions/steps. The first is to pad the first conv2d's input from 3 channels up to a channel count that matches the VTA hardware, for example 16; after that we can run the first quantized conv2d layer on VTA. The padding does increase the number of compute OPs and impacts performance, but it provides a baseline for the next level of performance optimization.
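To make the channel-padding idea concrete, here is a minimal numpy sketch (names like `pad_channels` and `BLOCK_IN` are illustrative assumptions, not VTA API); zero-padded channels contribute zeros to every dot product, so the conv result is unchanged:

```python
import numpy as np

# Assumed VTA input-channel block size for illustration.
BLOCK_IN = 16

def pad_channels(data, block=BLOCK_IN):
    """Zero-pad NCHW data on the channel axis up to a multiple of `block`."""
    n, c, h, w = data.shape
    pad_c = (-c) % block
    return np.pad(data, ((0, 0), (0, pad_c), (0, 0), (0, 0)))

data = np.random.rand(1, 3, 224, 224).astype("float32")
padded = pad_channels(data)
print(padded.shape)  # (1, 16, 224, 224)
```

The compute overhead of this baseline is roughly 16/3 ≈ 5.3x more MACs in the first layer, which is why a smarter scheme is worth exploring.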
The second solution is a special optimization for non-1x1 kernels, for example 3x3: instead of doing the traditional im2col blocking, we can use each 3x3x3 (27-element) patch as the input data and do the related padding. This reduces the compute increase and can improve performance.
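A rough numpy sketch of the second idea (illustrative only, not the actual VTA lowering): extract each 3x3x3 patch im2col-style as a 27-element row, then pad 27 up to 32, the next multiple of 16. The overhead is then only 32/27 ≈ 1.19x extra compute, versus roughly 16/3 ≈ 5.33x for plain channel padding:

```python
import numpy as np

def im2col_patches(data, k=3):
    """Extract kxk patches (stride 1, no spatial padding) from CHW data as rows."""
    c, h, w = data.shape
    out_h, out_w = h - k + 1, w - k + 1
    rows = np.empty((out_h * out_w, c * k * k), dtype=data.dtype)
    for i in range(out_h):
        for j in range(out_w):
            rows[i * out_w + j] = data[:, i:i + k, j:j + k].ravel()
    return rows

data = np.random.rand(3, 8, 8).astype("float32")
patches = im2col_patches(data)           # each row has 3*3*3 = 27 elements
pad = (-patches.shape[1]) % 16           # pad 27 -> 32
packed = np.pad(patches, ((0, 0), (0, pad)))
print(patches.shape, packed.shape)       # (36, 27) (36, 32)
```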
For the first solution, the proposal is to pad the input data layer from 3 to 16*n channels to match the VTA
hardware resource. The padding part would look like this PR: https://github.com/apache/incubator-tvm/pull/4887. `_const_shape_match` implements similar logic, but it only does that for factor_out. If you are interested, you can try a patch based on the said logic.
Please kindly let me know if you have any better ideas or any questions about the possible solutions.