After the latest changes to quantization in the following commit, I am facing some issues.
I get the following error:
File "/home/tvm/tvm/python/tvm/relay/quantize/_partition.py", line 136, in add_partition_function
if 'cuda' in _target.current_target().keys:
AttributeError: 'NoneType' object has no attribute 'keys'
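For context, the crash happens because _target.current_target() returns None when no target scope is active, and None has no .keys attribute. A minimal defensive guard (just a sketch of the failure mode, not necessarily the intended fix) would look like:

```python
import tvm

# tvm.target.current_target() returns None outside of a target scope,
# which is exactly what triggers the AttributeError above
target = tvm.target.current_target()
if target is not None and 'cuda' in target.keys:
    pass  # cuda-specific partitioning would go here
```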
If I remove the problematic code above, the accuracy of a quantized model that was previously working drops to the point that the model's output is no longer valid.
@vinx13 can you take a look at this? I saw that the mentioned commit touches some accuracy-related aspects.
diff --git a/python/tvm/relay/quantize/_partition.py b/python/tvm/relay/quantize/_partition.py
index 1180d836..d794b4e3 100644
--- a/python/tvm/relay/quantize/_partition.py
+++ b/python/tvm/relay/quantize/_partition.py
@@ -85,8 +85,8 @@ def add_partition_generic(ref_call, new_args, ctx):
# %10 = add(%9, %meta[relay.Constant])
# %11 = add(%3, %10) <- need to insert annotations for %3, %10
# ...
- lhs = new_args[0].realize()
- rhs = new_args[1].realize()
+ #lhs = new_args[0].realize()
+ #rhs = new_args[1].realize()
return _forward_op(ref_call, [lhs, rhs])
elif not lhs_cond and rhs_cond:
# - introduced by residual connection in ResNet
@@ -102,7 +102,7 @@ def add_partition_generic(ref_call, new_args, ctx):
# %25 = add(%24, %meta[relay.Constant])
# %26 = add(%18, %25) <- need to insert annotations for %25
# ...
- rhs = new_args[1].realize()
+ #rhs = new_args[1].realize()
return _forward_op(ref_call, [lhs, rhs])
elif lhs_cond and not rhs_cond:
if _analysis.check_constant(rhs):
diff --git a/python/tvm/relay/quantize/quantize.py b/python/tvm/relay/quantize/quantize.py
index adde2058..ca267ec3 100644
--- a/python/tvm/relay/quantize/quantize.py
+++ b/python/tvm/relay/quantize/quantize.py
@@ -401,8 +401,8 @@ def prerequisite_optimize(graph, params=None):
graph = _bind_params(graph, params)
mod = _module.Module.from_expr(graph)
- with _transform.PassContext(opt_level=3):
- mod = optimize(mod)
+ #with _transform.PassContext(opt_level=3):
+ mod = optimize(mod)
return mod["main"]
This is a temporary fix for the issue.
After applying this patch, the accuracy of resnet18/resnet50 v1/v2 is back to normal.
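As an alternative to the second hunk, it might be possible to keep opt_level=3 and disable only the suspect pass inside prerequisite_optimize (a sketch I have not verified; disabled_pass is the PassContext option for skipping passes by name):

```python
# unverified alternative: keep opt_level=3 but skip FoldScaleAxis,
# which is suspected below of hurting calibration accuracy
with _transform.PassContext(opt_level=3, disabled_pass=["FoldScaleAxis"]):
    mod = optimize(mod)
```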
Summary of the issues:
Because of the realize call in the partition pass, some of the additions in the quantized model become int8 additions (instead of casting from int8 to int32 before adding). In that case, if the scale is not chosen carefully, overflow is likely to happen.
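To make the overflow point concrete, here is a small numpy illustration with made-up values:

```python
import numpy as np

# two int8 tensors whose element-wise sum exceeds the int8 range [-128, 127]
a = np.array([100], dtype=np.int8)
b = np.array([50], dtype=np.int8)

print(a + b)                                    # [-106]: the int8 add wraps around
print(a.astype(np.int32) + b.astype(np.int32))  # [150]: casting first gives the true sum
```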
prerequisite_optimize uses opt_level=3. In my past experiments I observed accuracy issues caused by FoldScaleAxis. This pass is also applied to the model used for calibration, so the outputs collected from the calibration set may contain more outliers. Simply taking the maximum of the output as the scale might not work in this case; instead, we may want to clip the outliers, e.g., by taking the 99th-percentile maximum.
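A sketch of that last suggestion (choose_scale and the 99.0 default are hypothetical names for illustration, not existing TVM API):

```python
import numpy as np

def choose_scale(calibration_outputs, percentile=99.0):
    # pick the quantization scale from a list of calibration output arrays,
    # clipping outliers by using a high percentile of |x| instead of max(|x|)
    samples = np.concatenate([np.abs(x).ravel() for x in calibration_outputs])
    return np.percentile(samples, percentile)
```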