I am trying to quantize and tune some TF models on x86. However, the performance results are extremely poor compared with the non-quantized version. The numbers are as follows:
First model
TVM FP32: 35.05ms
TVM int8 quantization: 80ms
TVM int8 quantization + AutoTVM: 46.87ms
Second model
TVM FP32: 72.85ms
TVM int8 quantization: 159.33ms
TVM int8 quantization + AutoTVM: 112.39ms
What is the reason for such a bad performance? What can be done to try to improve performance?
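For reference, this is roughly the flow I am using; a minimal sketch, assuming `mod` and `params` come from the TensorFlow frontend, with a placeholder target string for my machine:

```python
import tvm
from tvm import relay

# mod, params obtained earlier via relay.frontend.from_tensorflow(...)
with relay.quantize.qconfig(calibrate_mode="global_scale",
                            global_scale=8.0):
    qmod = relay.quantize.quantize(mod, params)

# Placeholder target; as far as I know the int8 schedules on x86 only
# pay off with wide SIMD, e.g. -mcpu=skylake-avx512 or cascadelake.
target = "llvm -mcpu=core-avx2"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(qmod, target=target, params=params)
```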
I would suggest comparing the performance of conv2d layer by layer to see if we can improve the current int8 conv2d implementation. We can also check whether the fusion result (after the FuseOps pass) is optimal.
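For the fusion check, something like this (a rough sketch; `qmod` stands for your quantized module) will print the IR after fusion so you can inspect which ops ended up in the same fused function:

```python
import tvm
from tvm import relay

# Run the passes up to and including FuseOps, then print the module.
seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FoldConstant(),
    relay.transform.FuseOps(fuse_opt_level=2),
])
with tvm.transform.PassContext(opt_level=3):
    fused_mod = seq(qmod)
print(fused_mod)  # fused groups appear as functions marked Primitive=1
```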
Thanks for the suggestions. I will compare the performance of every conv2d with the TVM profiler. Regarding the fusion result, how can we verify that it is optimal?
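In case it helps others, this is roughly how I plan to get the per-layer numbers; a sketch assuming `lib` is the module returned by relay.build (the input name and shape are placeholders, and the module is called debug_runtime in older TVM releases):

```python
import numpy as np
import tvm
from tvm.contrib.debugger import debug_executor

# lib is the module returned by relay.build(...)
dev = tvm.cpu(0)
m = debug_executor.create(lib.get_graph_json(), lib.get_lib(), dev,
                          dump_root="/tmp/tvmdbg")
data = np.random.uniform(size=(1, 224, 224, 3)).astype("float32")  # placeholder shape
m.set_input("input", data)  # "input" is a placeholder input name
m.run()  # prints a per-node time breakdown and dumps traces under dump_root
```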
After checking with the TVM profiler and comparing the runs with and without quantization, it is clear that the fused convolutions are slower in the quantized version. In fact, they are twice as slow as in FP32.
I also see that, because of the data layout used, the added transpose operators are not quantized, which means that before and after every convolution there is a conversion from INT to FP. This of course adds a lot of overhead.
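This is the quick check I used to count those operators; a sketch, assuming `qmod` is the quantized module before build:

```python
from tvm import relay

# Walk the graph and count the layout/cast ops sitting between the
# quantized convolutions; each one implies an int <-> float round trip.
counts = {}

def visit(expr):
    if isinstance(expr, relay.expr.Call) and hasattr(expr.op, "name"):
        if expr.op.name in ("layout_transform", "transpose", "cast"):
            counts[expr.op.name] = counts.get(expr.op.name, 0) + 1

relay.analysis.post_order_visit(qmod["main"], visit)
print(counts)
```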
Do you have any suggestions or thoughts about this?