Quantization and pruned model

You can add your own legalize strategy; refer to this PR:

But I'm not sure whether you will actually get a speedup after padding. So you could also consider directly modifying the CUDA strategy for your own conv2d_int8 so that it dispatches to a TOPI implementation that does not require dp4a.
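
To get a rough feel for whether padding is worth it, here is a minimal sketch that estimates the extra multiply-accumulate work padding adds. It assumes the legalize step rounds both the input and output channel counts up to a multiple of 4 (dp4a consumes four int8 values at a time); the helper names `round_up` and `padding_overhead` are made up for illustration and are not part of TVM.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple (e.g. 13 -> 16 for multiple=4)."""
    return ((value + multiple - 1) // multiple) * multiple


def padding_overhead(in_channels, out_channels, multiple=4):
    """Return (padded_in, padded_out, extra_work_fraction).

    Assumption: conv2d work scales with in_channels * out_channels,
    so the relative extra work is padded_product / original_product - 1.
    """
    padded_in = round_up(in_channels, multiple)
    padded_out = round_up(out_channels, multiple)
    original = in_channels * out_channels
    padded = padded_in * padded_out
    return padded_in, padded_out, padded / original - 1.0


if __name__ == "__main__":
    # A pruned layer with awkward channel counts (hypothetical numbers).
    p_in, p_out, extra = padding_overhead(13, 30)
    print(f"padded to {p_in}x{p_out}, about {extra:.0%} extra work")
```

If the extra work is large for most of your pruned layers, the dp4a path may not pay off and the non-dp4a route is probably the better bet. This only counts arithmetic, so you would still need to benchmark the real kernels.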