Agreed, but it would be nice to agree on a standard 'TVM' way to represent quantization at the various levels. That way others (maybe even me) could start applying that standard to the frontends. It would also make it easier to get accelerators working with TVM's auto-quantization.
We need to be a bit careful doing this, because it's the frontend behaviour that differs rather than the backend's. Any such difference in canonicalization should therefore correspond to an attribute on qnn.requantize that determines the desired numerical behaviour.
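To make the idea concrete, here is a minimal plain-Python sketch (not TVM's actual qnn.requantize API; the function and parameter names are illustrative) of how a single rounding attribute can capture a frontend-specific numerical behaviour. The two conventions shown, round-half-up versus round-half-to-even, are exactly the kind of detail that differs between frontends and would otherwise leak into canonicalization:

```python
import math


def requantize(x, in_scale, in_zp, out_scale, out_zp, rounding="half_up"):
    """Requantize one int value between two affine spaces.

    `rounding` stands in for the proposed attribute: it selects the
    numerical behaviour instead of baking it into the lowering.
    """
    real = (x - in_zp) * in_scale          # dequantize to real value
    q = real / out_scale + out_zp          # rescale into output space
    if rounding == "half_up":
        r = math.floor(q + 0.5)            # ties round away from zero (upward)
    elif rounding == "half_even":
        r = round(q)                       # Python's banker's rounding
    else:
        raise ValueError(rounding)
    return max(-128, min(127, int(r)))     # clamp to int8


# The same input produces different int8 results depending on the attribute:
requantize(5, 1.0, 0, 2.0, 0, rounding="half_up")    # ties at 2.5 go up -> 3
requantize(5, 1.0, 0, 2.0, 0, rounding="half_even")  # ties go to even -> 2
```

If I recall correctly, qnn.requantize already carries a `rounding` attribute along these lines, so this may only be a matter of extending the accepted values and wiring the frontends up to set it.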
I think this is one of the major complications for quantized accelerators - we require that everything is in the affine space. That is to say, we don't have an 'integer' accelerator but specifically a quantized one. So even ops like sigmoid, which may seem strange to run in the affine space, at least need to be 'fake quantized' with a dequantize/quantize pair so that we have access to those affine parameters. TFLite handles this relatively neatly by making the quantization parameters part of the tensor type, but unfortunately we can't do the same thing in TVM.
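For anyone less familiar with the fake-quantize pattern, here is a rough sketch of what I mean (plain Python, illustrative helper names, not TVM's API; the output scale 1/256 and zero point -128 are the TFLite-style convention for int8 sigmoid, used here only as an example):

```python
import math


def dequantize(q, scale, zp):
    """Map an int8 value back to the real line: real = (q - zp) * scale."""
    return (q - zp) * scale


def quantize(r, scale, zp):
    """Map a real value into int8 affine space, with clamping."""
    q = round(r / scale) + zp
    return max(-128, min(127, q))


def fake_quant_sigmoid(q_in, in_scale, in_zp, out_scale, out_zp):
    """Sigmoid 'in the affine space': dequantize -> float sigmoid -> quantize.

    The point is that both the input and output affine parameters stay
    explicit in the graph, so an accelerator can pattern-match them.
    """
    r = dequantize(q_in, in_scale, in_zp)
    return quantize(1.0 / (1.0 + math.exp(-r)), out_scale, out_zp)


# Quantized zero maps to sigmoid(0) = 0.5, i.e. int8 value 0 under the
# example output parameters (scale 1/256, zero point -128):
fake_quant_sigmoid(0, 0.1, 0, 1.0 / 256, -128)
```

An accelerator backend would then match the dequantize/sigmoid/quantize cluster and read the scales and zero points off it, rather than receiving a bare integer op with no affine information attached.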
I'm also hoping for this. If we can arrive at a durable and flexible representation for auto-quantization, I think it would even be worth seeing whether we can rewrite parts of the TFLite frontend to conform to it.