Hi TVM community,
I am facing the following problem: I have pruned a 2D model and now I want to apply TVM quantization. Since int8 quantization takes advantage of the dp4a primitive, the workload's channel count should be divisible by `ic_block_factor`, which is 4. However, because my network is pruned, the channels are no longer divisible by 4, which results in an error.
I would like to zero-pad the weight's input channels/kernel to the next multiple of 4. However, I am not really familiar with how TVM implements its kernels.
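To make my idea concrete, this is roughly what I have in mind, sketched in plain NumPy on the weight array before importing the model (the function name and the NCHW/OIHW layout are my own assumptions, not TVM API):

```python
import numpy as np

def pad_to_multiple(weight, multiple=4, axis=1):
    """Zero-pad `weight` along `axis` so its size becomes a multiple of `multiple`.

    For an OIHW conv weight, axis=1 is the input-channel dimension that
    dp4a-based int8 schedules expect to be divisible by ic_block_factor (4).
    """
    size = weight.shape[axis]
    pad = (-size) % multiple  # extra channels needed to reach the next multiple
    if pad == 0:
        return weight
    pad_width = [(0, 0)] * weight.ndim
    pad_width[axis] = (0, pad)  # append zero channels at the end
    return np.pad(weight, pad_width, mode="constant")

# Example: a pruned conv weight with 6 input channels gets padded to 8.
w = np.random.randn(16, 6, 3, 3).astype("float32")
w_padded = pad_to_multiple(w)
print(w_padded.shape)  # (16, 8, 3, 3)
```

Of course the corresponding input activations would also have to be padded with zeros in the channel dimension so the shapes stay consistent; I am unsure where in the TVM pipeline this padding is best inserted.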
Can you advise me on an approach to solve this problem?