Quantization and pruned model

Hi TVM community,

I am facing the following problem: I have pruned a 2D model and now I want to use TVM quantization. Since int8 quantization takes advantage of the dp4a primitive, the number of channels in the workload must be divisible by ic_block_factor, which is 4. However, my network is pruned and its channels are no longer divisible by 4, which results in an error.

I would like to pad the input/weight channels of the kernel. However, I am not really familiar with how TVM implements its kernels. Something like the sketch below is what I have in mind.
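As a minimal numpy sketch (just illustrating the idea, not TVM's actual API; it assumes an OIHW weight layout):

```python
import numpy as np

def pad_in_channels(weight: np.ndarray, multiple: int = 4) -> np.ndarray:
    """Zero-pad the input-channel axis of an OIHW conv2d weight.

    E.g. a pruned (64, 30, 3, 3) weight becomes (64, 32, 3, 3), making the
    input channels divisible by ic_block_factor = 4. The zero weights turn
    the padded channels into mathematical no-ops, but the activation feeding
    this layer must be padded to the same channel count.
    """
    pad = (-weight.shape[1]) % multiple
    if pad == 0:
        return weight
    return np.pad(weight, ((0, 0), (0, pad), (0, 0), (0, 0)))
```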

Can you advise an approach to solve my problem?

You can add your own legalize strategy; refer to this PR:
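For reference, the hook used there is the conv2d legalize generic function, which rewrites the Relay graph before lowering. Here is a minimal sketch, loosely modeled on TVM's CUDA int8 legalization in python/tvm/topi/cuda/conv2d_alter_op.py; the registration key, layout checks, and attrs handling are assumptions that vary across TVM versions:

```python
from tvm import relay
from tvm.topi.nn import conv2d_legalize

# override=True replaces any built-in CUDA rule; drop it if none is registered.
@conv2d_legalize.register("cuda", override=True)
def _conv2d_legalize(attrs, inputs, arg_types):
    data_type = arg_types[0]
    # Only rewrite int8 NCHW workloads; returning None keeps the default path.
    if data_type.dtype not in ("int8", "uint8") or attrs["data_layout"] != "NCHW":
        return None
    in_channels = data_type.shape[1].value
    pad = (-in_channels) % 4  # channels needed to reach a multiple of 4
    if pad == 0:
        return None
    data, kernel = inputs
    # Zero-pad the input-channel axis of the activation (NCHW) and the
    # weight (OIHW); the zero weights make the extra channels no-ops.
    data = relay.nn.pad(data, pad_width=((0, 0), (0, pad), (0, 0), (0, 0)))
    kernel = relay.nn.pad(kernel, pad_width=((0, 0), (0, pad), (0, 0), (0, 0)))
    # Note: the dp4a schedule also constrains output channels; the real pass
    # pads those as well and strided_slices the result back afterwards.
    new_attrs = {k: attrs[k] for k in attrs.keys()}
    return relay.nn.conv2d(data, kernel, **new_attrs)
```

Registered before relay.build, this pads the pruned channel counts automatically at compile time.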

But I’m not sure whether you will really get a speedup after padding. So you can also consider directly modifying the CUDA strategy of conv2d_int8 so that it dispatches to a TOPI implementation that does not require dp4a; see the sketch below.
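A hedged sketch of that second option, assuming the names in tvm/relay/op/strategy/{generic,cuda}.py; the registration API and the stock strategy contents differ across TVM releases (some versions need an explicit override flag when re-registering), and a complete version must still handle groups, other layouts, winograd, and so on:

```python
from tvm import topi
from tvm.relay.op.op import OpStrategy
from tvm.relay.op.strategy.cuda import conv2d_strategy_cuda
from tvm.relay.op.strategy.generic import (
    conv2d_strategy,
    wrap_compute_conv2d,
    wrap_topi_schedule,
)

@conv2d_strategy.register(["cuda", "gpu"])
def conv2d_strategy_cuda_no_dp4a(attrs, inputs, out_type, target):
    data, _ = inputs
    int8_pruned = (
        data.dtype in ("int8", "uint8")
        and attrs.data_layout == "NCHW"
        and int(data.shape[1]) % 4 != 0
    )
    if int8_pruned:
        # Channels not divisible by 4: skip the dp4a conv2d_NCHWc_int8
        # schedule and dispatch to the direct NCHW implementation instead.
        strategy = OpStrategy()
        strategy.add_implementation(
            wrap_compute_conv2d(topi.cuda.conv2d_nchw),
            wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw),
            name="conv2d_nchw.cuda",
        )
        return strategy
    # Everything else keeps TVM's stock CUDA strategy.
    return conv2d_strategy_cuda(attrs, inputs, out_type, target)
```

The direct schedule avoids the ic_block_factor constraint entirely, but it also gives up dp4a, so compare it against padding to see which is actually faster for your pruned network.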