I don’t know whether you are interested in the backend C++ implementation or just in how the process works.
The basic logic is as follows:
At the beginning, nothing is quantized yet;
Go through the layers:
- Determine whether the current layer/operator supports quantization (in realize.cc).
  If it does:
    if quantization has already started: quantize the current layer (_annotation.py);
    if not: start quantization, then quantize the current layer.
  If it does not:
    if quantization has already started: de-quantize before the current layer, then stop quantization;
    if not: do nothing.
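
The walk over the layers described above can be sketched as a small state machine. This is only an illustration of the control flow, not the actual code in realize.cc or _annotation.py: the function name `plan_quantization`, the set `QUANTIZABLE_OPS`, and the action strings are all hypothetical, and the real check for whether an operator supports quantization is assumed to be more involved than a set lookup.

```python
from typing import List, Set

# Hypothetical list of operators that support quantization. In the real
# code base this decision is made elsewhere (the post points to realize.cc).
QUANTIZABLE_OPS: Set[str] = {"conv2d", "dense", "add"}


def plan_quantization(layers: List[str]) -> List[str]:
    """Return the actions taken while walking the layers in order."""
    actions: List[str] = []
    quantizing = False  # at the beginning, nothing is quantized yet

    for layer in layers:
        if layer in QUANTIZABLE_OPS:
            if not quantizing:
                # Supported layer, quantization not started: start it first.
                actions.append("start_quantize")
                quantizing = True
            actions.append(f"quantize({layer})")
        else:
            if quantizing:
                # Unsupported layer inside a quantized region:
                # de-quantize before this layer and stop quantizing.
                actions.append("dequantize")
                quantizing = False
            # Unsupported layer: it is left as it is.
            actions.append(f"keep({layer})")

    return actions


if __name__ == "__main__":
    for action in plan_quantization(["conv2d", "dense", "softmax", "conv2d"]):
        print(action)
```

In this sketch, a supported layer that follows an unsupported one starts a new quantized region, so de-quantization and re-quantization happen at every boundary between supported and unsupported operators.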