I have recently been working on model quantization.
I have two goals:
- Turn the int8 * int8 multiply operations in the model into saturating multiplies:
  multiply(int8_a, int8_b) -> int8_c
  ====> saturate_multiply(int8_a, int8_b) -> int8_c
- Do the same for the int8 * float32 multiply operations:
  multiply(int8_a, float32_b) -> int8_c
  ====> saturate_multiply(int8_a, float32_b) -> int8_c
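
To make the intended semantics concrete, here is a minimal pure-Python sketch of what I mean by saturate_multiply. It is only an illustration and not TVM code. The helper names (saturate_to_int8, wrapping_multiply, saturate_multiply_float) are hypothetical, the wrap-around behaviour of the plain multiply is an assumption, and the round-to-nearest step in the float case is also an assumption about the desired rounding mode.

```python
INT8_MIN, INT8_MAX = -128, 127


def saturate_to_int8(value):
    """Clamp an integer to the int8 range [-128, 127]."""
    return max(INT8_MIN, min(INT8_MAX, value))


def wrapping_multiply(a, b):
    """int8 * int8 -> int8 that wraps on overflow (assumed current behaviour)."""
    product = (a * b) & 0xFF  # keep only the low 8 bits
    return product - 256 if product > INT8_MAX else product


def saturate_multiply(a, b):
    """int8 * int8 -> int8: multiply in a wider type, then clamp."""
    return saturate_to_int8(a * b)


def saturate_multiply_float(a, b):
    """int8 * float -> int8: multiply, round to nearest, then clamp.

    Assumption: round-to-nearest is the wanted rounding mode. Python floats
    are 64-bit, so this only approximates float32 arithmetic.
    """
    return saturate_to_int8(round(a * b))


if __name__ == "__main__":
    print(wrapping_multiply(100, 2))          # wraps around
    print(saturate_multiply(100, 2))          # clamps to the int8 maximum
    print(saturate_multiply_float(100, 1.5))  # clamps to the int8 maximum
```

The key point is that the product is computed in a wider type and clamped to [-128, 127] before it is narrowed back to int8, instead of being truncated to the low 8 bits.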
Which step of the relay --> net.so process is the most appropriate place to rewrite multiply?
Thanks.