I am working with an int8-quantised model (a super simple 3-layer TFLite one, gist here, using tensorflow==2.15, not 2.16). If you load the model in Netron, you can see that the main (depth)conv2d weights are int8.
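In case it is useful, the same dtypes can be checked without Netron via the TF Lite interpreter's tensor details. A minimal sketch, assuming the gist model has been saved locally (the model_path is just a placeholder):

import tensorflow as tf  # 2.15, as above

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()

# Print every tensor's name, shape and dtype; the (depth)conv2d weight
# tensors should report int8 here, matching what Netron shows.
for detail in interpreter.get_tensor_details():
    print(detail["name"], detail["shape"], detail["dtype"])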
However, if I load the model in TVM (and export the weights using the debugger, so I can look at them in JSON), you can see that they are using a type which is too large.
The weights of the depthwise conv (of shape 1x3x3x3) are called p1 and are of type int16. Similarly, the conv2d weights (of shape 1x1x3x64), p8, are also int16.
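For what it's worth, a quick way to check whether the weights are already widened straight out of the frontend (rather than later during build) is to print the dtypes in the params dict that from_tflite returns (using relay_mod/params from the loading code at the end of this post). Note the keys here are the frontend's parameter names, not the p1/p8 names from the debug dump:

# relay_mod / params come from the from_tflite call shown below
for name, arr in params.items():
    # each value is a tvm.nd.NDArray, so .shape and .dtype show
    # exactly what the frontend produced for each weight
    print(name, arr.shape, arr.dtype)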
I’ve seen this for other models I’ve been working with, but this simple example is easier to discuss.
Is this intended behavior? Is there a way to disable it?
I can imagine a justification that int16 is faster than int8 on some architectures. However, in my case memory is very constrained, so doubling the size of the weights really hurts.
I’m loading my model into TVM with:
from tvm import relay

input_dtype = "int8"
relay_mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={input_name: input_shape},
    dtype_dict={input_name: input_dtype},
)
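Printing the imported module also shows the dtype annotation on every weight variable before any target-specific lowering has run, which might help pin down where the int16 first appears:

# The Relay text form annotates each parameter with its shape and dtype,
# e.g. Tensor[(1, 3, 3, 3), int8] vs Tensor[(1, 3, 3, 3), int16]
print(relay_mod["main"])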