Importing quantized PyTorch model and problems with quint and qint datatypes

Hello,

I have a model that I quantized in eager mode in PyTorch. My implementation doesn't use the QuantStub and DeQuantStub layers, because I want the model to accept int8/uint8 inputs directly, without the quantization steps. The problem is that the model now expects inputs of type qint8 or quint8, which causes issues when importing into TVM. The output has the same problem: its dtype is qint8 or quint8, and TVM can't work with those types. Is there a way to deal with this?
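Roughly, the setup looks like this (a minimal sketch only; the module name, input name, and scale/zero-point values are illustrative, and I'm assuming the fbgemm backend):

```python
import torch
import torch.nn as nn

class ConvOnly(nn.Module):
    def __init__(self):
        super().__init__()
        # No QuantStub/DeQuantStub around the conv.
        self.conv = nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.conv(x)

model = ConvOnly().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)
prepared(torch.randn(1, 3, 32, 32))          # calibration pass
quantized = torch.quantization.convert(prepared)

# Without a QuantStub, the converted conv is a quantized module, so tracing
# needs an already-quantized (quint8) input ...
q_input = torch.quantize_per_tensor(torch.randn(1, 3, 32, 32),
                                    scale=0.05, zero_point=128,
                                    dtype=torch.quint8)
traced = torch.jit.trace(quantized, q_input)

# ... and this is where the import runs into trouble, because the frontend
# cannot create a Relay input var with dtype quint8:
# from tvm import relay
# mod, params = relay.frontend.from_pytorch(traced, [("input", (1, 3, 32, 32))])
```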

You may have to be more specific. Where is the error occurring?

I have been dealing with an issue that sounds similar. The short answer is that TVM doesn't support quint8 inputs (it errors out in the convert_data_type call inside _get_relay_input_vars in TVM's PyTorch frontend). You can either change your PyTorch model definition to use float input/output, or modify that function in the frontend to "unroll" quint8 inputs into a uint8 tensor with a zero point and scale attached. This introduces a difference between the PyTorch and Relay model definitions (the Relay representation of your graph would have uint8 input/output).
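Conceptually, the "unrolled" form on the Relay side would look something like this (just a sketch of the idea, not the actual frontend change; the scale and zero-point values are placeholders):

```python
from tvm import relay

# Plain uint8 input var instead of an (unsupported) quint8 one.
data = relay.var("input", shape=(1, 3, 32, 32), dtype="uint8")

# The quantization parameters that PyTorch keeps inside the quint8 tensor
# become explicit values carried alongside the raw integer data.
input_scale = relay.const(0.05, "float32")
input_zero_point = relay.const(128, "int32")

# For example, qnn.dequantize reconstructs float values from the uint8 data
# plus (scale, zero_point); quantized ops consume the same parameters.
dequantized = relay.qnn.op.dequantize(data, input_scale, input_zero_point)
```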

It is indeed the same error! It also occurs in the _get_relay_input_vars function.

Can you elaborate on what you mean by "unrolling quint8"? I am targeting an embedded device, so I would prefer to get rid of floating-point numbers entirely, if that is at all possible. I know that PyTorch has the quantized_tensor.int_repr() function, but I think that also causes issues when importing the model.
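For reference, this is roughly what int_repr() gives you (a small sketch; the scale/zero-point values are made up):

```python
import torch

x = torch.randn(1, 3, 32, 32)
x_q = torch.quantize_per_tensor(x, scale=0.05, zero_point=128,
                                dtype=torch.quint8)

x_int = x_q.int_repr()           # plain torch.uint8 tensor, no scale/zp attached
scale = x_q.q_scale()            # 0.05
zero_point = x_q.q_zero_point()  # 128

# x_int, with the scale/zero_point kept on the side, is the kind of data a
# uint8-input graph could consume instead of the quint8 tensor itself.
```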

quint8 is an opaque PyTorch type, we cannot ingest such inputs.