Hi @zpu, here is a related discussion: "Quantized models are slower than float models on GPUs" (Questions, Apache TVM Discuss)