Hi everyone! I’m new to TVM and quantization.
Currently I’m struggling with quantizing vision transformers, especially the LayerNorm module. In experiments with FakeQuant (academic simulation), I found that the input feature of LayerNorm (which is also the output feature of the skip connection) is very sensitive to 8-bit quantization when using Post-Training Quantization with a small calibration set. The accuracy degradation shows up across all sorts of ViTs (and DeiTs).
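A minimal sketch of the effect, assuming per-tensor symmetric int8 fake quantization with max-abs calibration (one common PTQ setup, not necessarily the one TVM uses). The feature tensor, the single outlier channel and its 60x magnitude are made-up illustrative values, not results from my experiments. Because one scale has to cover the outlier channel, the ordinary channels get rounded very coarsely; a per-channel scale is computed only for comparison.

```python
import random

def symmetric_scale(calib_values, bits=8):
    """Per-tensor symmetric scale from max-abs calibration."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in calib_values)
    return max_abs / qmax if max_abs > 0 else 1.0

def fake_quant(values, scale, bits=8):
    """Quantize to signed integers, clamp to the int range, then dequantize."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
# Hypothetical skip-connection output: 32 tokens x 64 channels,
# with one channel scaled up to mimic an outlier channel (assumption).
num_tokens, num_channels, outlier_channel = 32, 64, 7
features = [[random.gauss(0.0, 1.0) for _ in range(num_channels)]
            for _ in range(num_tokens)]
for row in features:
    row[outlier_channel] *= 60.0

flat = [v for row in features for v in row]

# Per-tensor: a single scale, dominated by the outlier channel.
per_tensor_scale = symmetric_scale(flat)
per_tensor_err = mse(flat, fake_quant(flat, per_tensor_scale))

# Per-channel: one scale per channel, for comparison only.
sq_err_sum = 0.0
for c in range(num_channels):
    col = [row[c] for row in features]
    sq_err_sum += mse(col, fake_quant(col, symmetric_scale(col))) * len(col)
per_channel_err = sq_err_sum / len(flat)

print(f"per-tensor MSE:  {per_tensor_err:.6f}")
print(f"per-channel MSE: {per_channel_err:.6f}")
```

With this synthetic data the per-tensor error is much larger than the per-channel error, which is the kind of sensitivity described above for the LayerNorm input.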
I noticed that someone has already tested TVM with ViTs, but neither the accuracy nor the LayerNorm module was mentioned, which confused me even more.
I’m wondering how TVM supports int8 inference on ViT. Specifically, is there any special configuration for quantizing LayerNorm or the skip connection?
Thanks!