Int8 quantization on ViT, especially Layernorm

Longday0923 · January 11, 2022, 9:26am

Hi everyone! I’m new to TVM and quantization.

Currently I’m struggling with a problem with quantization of vision transformers, especially the Layernorm module. When I ran experiments with FakeQuant (academic simulation), I found that the input feature of Layernorm (which is also the output feature of the skip connection) is very sensitive to 8-bit quantization (using post-training quantization with a small calibration set). Accuracy degradation shows up on all sorts of ViTs (and DeiTs).
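To make the sensitivity concrete, here is a minimal pure-Python sketch of symmetric 8-bit fake quantization. It assumes, purely for illustration, that the Layernorm input contains one channel with a much larger magnitude than the rest; that is a commonly cited cause of this kind of degradation, not something measured in this thread. The helper names (`fake_quant`, `minmax_scale`, `mse`) and the synthetic activation values are hypothetical and are not part of TVM or any FakeQuant library.

```python
import random

def fake_quant(values, scale, bits=8):
    """Symmetric fake quantization: round to the integer grid, clamp, dequantize."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def minmax_scale(values, bits=8):
    """Scale derived from the largest absolute value (plain min-max calibration)."""
    qmax = 2 ** (bits - 1) - 1
    return max(abs(v) for v in values) / qmax

def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
# Hypothetical activation: 63 ordinary channels plus one large-magnitude channel.
normal = [random.gauss(0.0, 1.0) for _ in range(63)]
outlier = [60.0]
x = normal + outlier

# Per-tensor quantization: a single scale for all channels, dominated by the outlier.
per_tensor_scale = minmax_scale(x)
err_per_tensor = mse(normal, fake_quant(normal, per_tensor_scale))

# Scale computed without the outlier channel, as a finer-grained scheme would allow.
normal_scale = minmax_scale(normal)
err_separate = mse(normal, fake_quant(normal, normal_scale))

print(f"per-tensor scale: {per_tensor_scale:.4f}, MSE on ordinary channels: {err_per_tensor:.6f}")
print(f"separate scale:   {normal_scale:.4f}, MSE on ordinary channels: {err_separate:.6f}")
```

With a single per-tensor scale, the quantization step is set by the largest channel, so the ordinary channels are left with only a handful of integer levels and their error grows accordingly. If real activations look like this, a finer-grained scale for the Layernorm input (or keeping that tensor in higher precision) would be one thing to try.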

I noticed that somebody has already tested TVM with ViTs, but neither accuracy nor the Layernorm module is mentioned, which confused me more.

I’m wondering how TVM supports Int8 inference on ViT. Specifically, is there any special configuration for quantizing Layernorm or the skip connection?

Thanks!

youxiudeshouyeren · October 19, 2022, 12:11pm

Hi, I am running into quantization errors when importing ViT using Relay, and I am currently unable to get the model running. Could you share your code for importing ViT, e.g. on GitHub? Thanks a lot!

SamMichaelson · June 27, 2024, 3:46am

I just found this repo, which could help: GitHub - zkkli/I-ViT: [ICCV 2023] I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference