Will TVM’s current quantization support the following situation? If so, where can I find more info?
1. Input model is float32.
2. All weights must be transformed into int8 fixed-point format.
3. Fixed-point parameters can be shared within a tensor, but do not need to be the same from tensor to tensor.
4. Data-aware calibration needs to find min/max values of accumulators and pass this info from Relay down to code generation (what is the mechanism here?).
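To make the calibration point concrete, here is a standalone NumPy sketch of the kind of min/max calibration I mean — this is not TVM API, just an illustration of deriving a per-tensor fixed-point scale from observed value ranges (the function name and the symmetric-scale choice are my own assumptions):

```python
import numpy as np

def calibrate_min_max(activations):
    """Illustrative only: scan calibration batches for a tensor's min/max,
    then derive a symmetric int8 scale from the larger magnitude."""
    lo = min(float(a.min()) for a in activations)
    hi = max(float(a.max()) for a in activations)
    scale = max(abs(lo), abs(hi)) / 127.0  # shared fixed-point scale for the tensor
    return lo, hi, scale

# Two hypothetical calibration batches for one activation tensor
batches = [np.array([-3.0, 1.0]), np.array([0.5, 6.35])]
lo, hi, scale = calibrate_min_max(batches)
# scale is about 0.05 here (6.35 / 127)
```

The open question is how a statistic like `scale`, computed at the Relay level, gets attached to the graph and propagated down to code generation.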
I’m working on the tutorial and it will be available in one or two weeks.
Regarding your questions: the 1st and 2nd are supported. Fixed-point parameters are shared within a tensor (we use a shared scale parameter plus an int8 tensor representation).
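For anyone following along, the "shared parameter + int tensor representation" can be sketched in plain NumPy — this is not TVM's actual implementation, just a minimal illustration of per-tensor symmetric quantization where one scale is shared by all elements of a tensor:

```python
import numpy as np

def quantize_per_tensor(w):
    """Quantize a float32 tensor to int8 with one scale shared per tensor."""
    scale = np.max(np.abs(w)) / 127.0          # the shared fixed-point parameter
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float32 values."""
    return q.astype(np.float32) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_per_tensor(w)
w_hat = dequantize(q, scale)
```

Since the scale is computed independently per tensor, two different weight tensors can (and generally will) end up with different scales, which matches point 3 in the question.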