Quantization Configuration Documentation?

Hi,

I was wondering if someone could clarify the meaning and impact of the different quantization parameters, in particular store_lowbit_output and global_scale. Also, what are the valid values for each parameter? Unfortunately, this does not seem to be documented, or at least I have not been able to find it.

    _node_defaults = {
        "nbit_input": 8,
        "nbit_weight": 8,
        "nbit_activation": 32,
        "dtype_input": "int8",
        "dtype_weight": "int8",
        "dtype_activation": "int32",
        "global_scale": 8.0,
        "skip_conv_layers": [0],
        "round_for_shift": True,
        "store_lowbit_output": True,
        "debug_enabled_ops": None,
    }
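For reference, these defaults can be overridden through the qconfig context manager before quantizing. A minimal sketch, assuming the relay.quantize API at the linked commit (here `graph` and `params` are assumed to come from an earlier frontend import):

    from tvm.relay import quantize as qtz

    # Sketch: override selected defaults through the qconfig context manager.
    # `graph` and `params` are assumed to come from an earlier frontend import.
    with qtz.qconfig(nbit_input=8,
                     nbit_weight=8,
                     global_scale=8.0,
                     skip_conv_layers=[0],       # skip quantizing the first conv2d
                     store_lowbit_output=True):
        qgraph = qtz.quantize(graph, params=params)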

Thanks

@vinx13 Could you please clarify this for me? Or is there any documentation about it?

Global scale is an alternative to the dom_scale obtained via calibration.
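To illustrate what a domain scale does, here is a conceptual sketch (plain numpy, not TVM's actual implementation): values are divided by the scale, rounded, clipped to the n-bit range, and multiplied back, which simulates the quantization error:

    import numpy as np

    def simulated_quantize(x, dom_scale, nbit=8):
        """Conceptual sketch of n-bit simulated quantization with a domain scale."""
        qmin, qmax = -2 ** (nbit - 1), 2 ** (nbit - 1) - 1
        q = np.clip(np.round(x / dom_scale), qmin, qmax)
        return q * dom_scale  # dequantized value, carrying the rounding/clipping error

With global_scale, the scales are derived from one fixed configured value; with calibration, a per-node dom_scale is derived from data.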

Thanks! I missed those comments in that file. Still, the difference between dtype_input and nbit_input is not clear to me, and the same goes for weights and activations. In other words, what is the difference between the number of bits and the data type in each case?

When I set store_lowbit_output to False, the first conv2d in my model is not quantized. Could you please tell me why that is related to store_lowbit_output?

The first conv2d layer is not quantized by default; this is controlled by the option skip_conv_layers (or skip_k_conv in older versions). It has nothing to do with store_lowbit_output.

In the current implementation they are the same, but the separate dtype option offers the possibility to use other dtypes such as uint8.
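For example, a hypothetical configuration keeping 8 bits but switching the input dtype to unsigned (whether a given backend schedule actually supports uint8 is a separate question):

    from tvm.relay import quantize as qtz

    # Same 8 bits, but an unsigned input dtype (assumption: backend support).
    with qtz.qconfig(nbit_input=8, dtype_input="uint8"):
        qgraph = qtz.quantize(graph, params=params)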

Are the nbit_* arguments the maximum number of bits used, or a fixed number of bits used for all weights, inputs, and activations?

Or does the calibration step decide the number of bits in each case? If so, how can I observe the number of bits selected after the realize step?

It is a fixed number of bits. Calibration uses it to determine the maximum representable value of the data type.
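Concretely, for a signed n-bit type the representable range is [-2^(n-1), 2^(n-1) - 1], so calibration only has to pick scales that map the observed values into that fixed range. A quick check:

    def signed_range(nbit):
        """Representable range of a signed nbit integer type."""
        return -2 ** (nbit - 1), 2 ** (nbit - 1) - 1

    print(signed_range(8))   # (-128, 127)  -> int8 inputs/weights
    print(signed_range(32))  # (-2147483648, 2147483647) -> int32 activations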

This is very strange, because only with store_lowbit_output=False do I see the first Conv2d not quantized:

WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (..., 'float32'), (..., 'float32')

With store_lowbit_output=True I see the following:

WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (..., 'int8'), (..., 'int8')

Note that in both cases I used skip_k_conv=None

Is there any way to visualize the number of bits finally selected by calibration?

#bits is a fixed number in your config.

Can you print the quantized result and confirm whether the first layer is quantized?

    net = relay.quantize(…)
    print(net)

No, it is not being quantized:

%2 = nn.conv2d(%1, meta[relay.Constant][0] /* ty=Tensor[(....), float32] ......

There might be something wrong with annotation. Can you check the result of the annotate pass?

What is the best way to do that?

Dump the result of Annotate in https://github.com/dmlc/tvm/blob/b3f3ab5593c1949947c9872c8df1479975116a95/python/tvm/relay/quantize/quantize.py#L360 and check whether conv2d_rewrite (https://github.com/dmlc/tvm/blob/b3f3ab5593c1949947c9872c8df1479975116a95/python/tvm/relay/quantize/_annotate.py#L164) works for the first layer.
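One way to inspect that without editing the library is a sketch like the following, assuming annotate() is exposed as a module-level function in quantize.py at that commit; if it is not, a temporary print(graph) inside quantize() right after the annotate step achieves the same:

    from tvm.relay import quantize as qtz

    # Sketch: dump the annotated graph and look for simulated_quantize
    # nodes around the first conv2d. `graph` is the optimized float graph.
    with qtz.qconfig(skip_conv_layers=None):
        annotated = qtz.annotate(graph)
    print(annotated)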

I see in https://github.com/dmlc/tvm/blob/b3f3ab5593c1949947c9872c8df1479975116a95/python/tvm/relay/quantize/quantize.py#L360 that skip_k_conv=[0] although I set it to None, so it is not reading this argument correctly and is using the default instead.

I also tried setting it to skip another Conv2D layer, but that does not work either.

I found the error: I was using skip_k_conv instead of skip_conv_layers. I guess the name changed at some point.

Another question: What is then the purpose of calibration?

It calculates dom_scale and other parameters for each simulated_quantize node.
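As a conceptual sketch of what that means (not TVM's actual calibration code): run representative inputs through the annotated graph, record the value ranges seen at each simulated_quantize node, and derive a per-node scale from them, for example via the maximum absolute value:

    import numpy as np

    def calibrate_scale(samples, nbit=8):
        """Pick a dom_scale so the observed values fit into a signed nbit range.

        `samples` is a list of numpy arrays observed at one simulated_quantize
        node while running a calibration dataset through the annotated graph.
        """
        max_abs = max(np.abs(s).max() for s in samples)
        qmax = 2 ** (nbit - 1) - 1
        return max_abs / qmax  # one scale per node, instead of one global_scale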