tico
July 3, 2019, 2:40pm
1
Hi,
I was wondering if someone could clarify the meaning and impact of the different quantization parameters, in particular store_lowbit_output and global_scale? Also, what are the valid values for each parameter? Unfortunately, this is not properly documented, or at least I have not been able to find any documentation.
_node_defaults = {
    "nbit_input": 8,
    "nbit_weight": 8,
    "nbit_activation": 32,
    "dtype_input": "int8",
    "dtype_weight": "int8",
    "dtype_activation": "int32",
    "global_scale": 8.0,
    "skip_conv_layers": [0],
    "round_for_shift": True,
    "store_lowbit_output": True,
    "debug_enabled_ops": None,
}
Thanks
tico
July 4, 2019, 4:22am
2
@vinx13 Could you please clarify this for me? Or is there any documentation about it?
        return super(QConfig, self).__setattr__(name, value)


def current_qconfig():
    """Get the current quantization configuration."""
    return _quantize._GetCurrentQConfig()


# TODO(tmoreau89, ZihengJiang) the skip parameters are
# hacky - we should explore a more future-proof way to
# skip operators based on pattern matching
def qconfig(**kwargs):
    """Configure the quantization behavior by setting config variables.

    Parameters
    ----------
    nbit_dict: dict of QAnnotateKind -> int
        Number of bit for every kind of annotate field.
    global_scale: float
        The global scale for calibration.
        Global scale is an alternative to dom scale obtained via calibration
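For context, these config variables are normally set through the qconfig context manager before calling quantize. A minimal sketch, assuming mod and params come from a frontend importer and that your TVM version accepts these keyword names (they mirror the _node_defaults above):

import tvm
from tvm import relay

# mod, params would come from a frontend importer, e.g. relay.frontend.from_mxnet(...)
# Keyword names below mirror _node_defaults; exact defaults may differ per version.
with relay.quantize.qconfig(nbit_input=8,
                            nbit_weight=8,
                            global_scale=8.0,
                            skip_conv_layers=[0],
                            store_lowbit_output=True,
                            round_for_shift=True):
    qmod = relay.quantize.quantize(mod, params=params)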
tico
July 4, 2019, 8:04am
4
Thanks! I missed those comments in that file. Still, it is not clear to me what the difference is between dtype_input and nbit_input, and the same for weights and activations. In other words, what is the difference between the number of bits and the data type in each case?
tico
July 4, 2019, 11:09am
5
When I set store_lowbit_output to false, the first conv2d in my model is not quantized. Could you please tell me how that relates to store_lowbit_output?
vinx13
July 4, 2019, 11:42am
6
The first conv2d layer is not quantized by default; there is an option skip_conv_layers (or skip_k_conv in older versions) that controls this. It has nothing to do with store_lowbit_output.
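So if you do want the first conv2d quantized, a sketch along these lines should work (assuming an empty skip list, or None, tells the annotator to skip nothing):

from tvm import relay

# Quantize every conv2d, including the first one.
# Assumption: an empty skip_conv_layers list disables the skipping.
with relay.quantize.qconfig(skip_conv_layers=[]):
    qmod = relay.quantize.quantize(mod, params=params)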
vinx13
July 4, 2019, 11:45am
7
In the current implementation they are the same, but the dtype option leaves open the possibility of using other dtypes such as uint8.
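That is, nbit_* fixes the bit width while dtype_* names the concrete storage type used by the realized graph. A hedged sketch of what mixing them might look like (the uint8 path is not something verified in this thread):

# nbit_input says "8 bits"; dtype_input says which concrete 8-bit type to store in.
# Today both effectively mean int8, but in principle one could try uint8 inputs:
with relay.quantize.qconfig(nbit_input=8, dtype_input="uint8",
                            nbit_weight=8, dtype_weight="int8"):
    qmod = relay.quantize.quantize(mod, params=params)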
tico
July 4, 2019, 12:02pm
8
Are the nbit_* arguments the maximum number of bits used, or a fixed number of bits used for all weights, inputs, and activations? Or does the calibration step decide the number of bits in each case? If so, how can I observe the number of bits selected after the realization step?
vinx13
July 4, 2019, 12:04pm
9
It is a fixed number of bits. Calibration uses it to decide the maximum value of the data type.
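To make the "maximum of the data type" part concrete, the rough arithmetic looks like this (an illustrative sketch, not the exact code path of the calibrate step):

nbit = 8                       # fixed by the config, not chosen per layer
sign = 1                       # signed int8
valid_bit = nbit - sign        # 7 magnitude bits
valid_range = 2 ** valid_bit   # 128 representable magnitudes
global_scale = 8.0
dom_scale = global_scale / valid_range    # 0.0625: step of the fixed-point grid
max_real = (valid_range - 1) * dom_scale  # largest real value representable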
tico
July 4, 2019, 12:07pm
10
This is very strange, because only with store_lowbit_output=False do I see the first conv2d not quantized:
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (..., 'float32'), (..., 'float32')
With store_lowbit_output=True I see the following:
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d', (..., 'int8'), (..., 'int8')
Note that in both cases I used skip_k_conv=None.
tico
July 4, 2019, 12:09pm
11
Is there any way to visualize the number of bits finally selected by calibration?
vinx13
July 4, 2019, 12:10pm
12
The number of bits is a fixed number in your config.
Can you print the quantized result and confirm whether the first layer is quantized?
net = relay.quantize(…)
print(net)
tico
July 4, 2019, 12:16pm
13
No, it is not being quantized:
%2 = nn.conv2d(%1, meta[relay.Constant][0] /* ty=Tensor[(....), float32] ......
vinx13
July 4, 2019, 12:36pm
14
There might be something wrong with the annotation. Can you check the result of the annotate pass?
tico
July 4, 2019, 12:52pm
15
What is the best way to do that?
vinx13
July 4, 2019, 12:54pm
16
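One possible way to do that (a hypothetical sketch; it assumes the quantize module exposes a standalone annotate helper, which has moved around between versions, so check your tree):

from tvm.relay import quantize as qtz

# Hypothetical: run annotation on its own and print the IR, so you can see
# which conv2d calls got wrapped in annotation / simulated_quantize ops.
with qtz.qconfig(skip_conv_layers=None):
    annotated = qtz.annotate(net)   # assumption: standalone annotate helper
    print(annotated)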
tico
July 4, 2019, 1:46pm
17
I see in https://github.com/dmlc/tvm/blob/b3f3ab5593c1949947c9872c8df1479975116a95/python/tvm/relay/quantize/quantize.py#L360 that skip_k_conv=[0] although I set it to None, so it is not reading this argument correctly and is instead using the default.
I also tried to set the skip to another conv2d layer, but that does not work either.
tico
July 4, 2019, 1:51pm
18
I found the error: I was using skip_k_conv instead of skip_conv_layers. I guess the name changed in the meantime.
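For anyone else hitting this: qconfig appears to silently drop unknown keyword arguments, so the old name is ignored and the default [0] stays in effect. The working form would be something like:

# skip_conv_layers is the current name; the old skip_k_conv keyword is ignored.
with relay.quantize.qconfig(skip_conv_layers=None):   # or [], or a list of indices
    qmod = relay.quantize.quantize(mod, params=params)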
tico
July 4, 2019, 2:14pm
19
Another question: what, then, is the purpose of calibration?
vinx13
July 4, 2019, 2:47pm
20
It calculates dom_scale and other parameters for each simulated_quantize node.
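Concretely, each simulated_quantize node applies the constants that calibration supplies roughly like this (a numpy sketch of the idea, not the operator's exact implementation):

import numpy as np

def simulated_quantize(x, dom_scale, clip_min, clip_max):
    # Snap to the fixed-point grid, clip to the representable range,
    # then return to the real domain.
    return np.clip(np.round(x / dom_scale), clip_min, clip_max) * dom_scale

# With global_scale calibration and nbit=8, the constants come out roughly as:
dom_scale = 8.0 / 128          # global_scale / 2**(nbit - 1)
clip_min, clip_max = -127, 127
x = np.array([0.03, 1.5, 20.0], dtype="float32")
print(simulated_quantize(x, dom_scale, clip_min, clip_max))  # 20.0 clips to ~7.94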