Slower Execution Times After 8-bit Quantization?

sakura · August 14, 2023, 11:12am

I have encountered the same problem. When running inference on the CPU, enabling AVX instructions can accelerate the quantized model, so that it runs faster than the floating-point model. However, combining TVM quantization with AVX instructions can cause other problems.
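
For anyone wanting to try the AVX route, here is a rough sketch of how the LLVM target string could be chosen from the CPU's feature flags. This is only an illustration: `read_cpu_flags` and `pick_llvm_target` are hypothetical helper names, not part of TVM, and reading `/proc/cpuinfo` assumes a Linux host. The `-mcpu` values are ones commonly passed to TVM's LLVM backend.

```python
from pathlib import Path


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags as a set (Linux only).

    Hypothetical helper: returns an empty set when the file is missing
    or has no "flags" line (e.g. on non-x86 or non-Linux hosts).
    """
    path = Path(cpuinfo_path)
    if not path.exists():
        return set()
    for line in path.read_text().splitlines():
        if line.startswith("flags"):
            # Line looks like "flags : fpu vme ... avx2 ..."
            return set(line.split(":", 1)[1].split())
    return set()


def pick_llvm_target(flags):
    """Map CPU feature flags to an LLVM target string (hypothetical helper)."""
    if "avx512f" in flags:
        return "llvm -mcpu=skylake-avx512"
    if "avx2" in flags:
        return "llvm -mcpu=core-avx2"
    # Fall back to the generic LLVM target without AVX-specific codegen.
    return "llvm"


if __name__ == "__main__":
    print(pick_llvm_target(read_cpu_flags()))
```

The resulting string would then be passed as the `target` when building the quantized module (for example to `relay.build`). Whether this actually beats the floating-point model still has to be measured on the specific model and machine.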