tico
August 28, 2019, 11:07am
1
Hi,
I am trying to quantize a model that is originally in NHWC, so in order to quantize it I set the target data layout to NCHW. However, as discussed in other threads, the change in data layout means that transpose operators get inserted. The problem is that these transpose operators (and also nn.pad operators) land in between the chain of convolutions, and since transpose has no quantization rule in TVM, many casting operators from float to int appear along the chain of convolutions.
What can be done to fix this behavior? How difficult would it be to quantize the transpose operator?
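For context, here is roughly the kind of flow that triggers this (a minimal sketch; the TensorFlow frontend and `graph_def` are stand-ins for my actual setup):

```python
import tvm
from tvm import relay

# graph_def: a TensorFlow GraphDef for the NHWC model, loaded elsewhere.
# Asking the frontend for NCHW is what inserts the transpose operators.
mod, params = relay.frontend.from_tensorflow(graph_def, layout="NCHW")

# Ops without a quantization rule (transpose, nn.pad) stay in float32,
# so casts are inserted around them between the quantized convolutions.
with relay.quantize.qconfig():
    qmod = relay.quantize.quantize(mod, params)
```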
@vinx13 @ziheng Could you please give me a hint here?
Thanks
vinx13
August 29, 2019, 5:44pm
2
transpose is easy, as it only needs an identity rewrite. See the relevant pieces of the quantization pass:
In `_partition.py`, identity ops simply forward the call and stay inside the quantized partition:

```python
def identity_partition_function(ref_call, new_args, ctx):
    cond, expr = partition_expr_check(new_args[0])
    if cond:
        return QPartitionExpr(_forward_op(ref_call, [expr]))
    return None

register_partition_function("clip", identity_partition_function)
register_partition_function("nn.relu", identity_partition_function)
register_partition_function("nn.max_pool2d", identity_partition_function)
```
In `_annotate.py`, the shared identity annotate rule and its registrations:

```python
def identity_rewrite(ref_call, new_args, ctx):
    """Simply forward the original operation"""
    if quantize_context().check_to_skip(ref_call):
        return None

    x_expr, x_kind = _get_expr_kind(new_args[0])
    if x_kind is None:
        return None

    ret_expr = _forward_op(ref_call, [x_expr])
    return QAnnotateExpr(ret_expr, x_kind)

register_annotate_function("clip", identity_rewrite)
register_annotate_function("nn.relu", identity_rewrite)
register_annotate_function("strided_slice", identity_rewrite)
register_annotate_function("nn.avg_pool2d", identity_rewrite)
register_annotate_function("annotation.stop_fusion", identity_rewrite)
```
And in realize.cc, the identity realize rule forwards the op on the already-quantized data, keeping the domain scale and dtype:

```cpp
Expr IdentityRealize(const Call& ref_call,
                     const Array<Expr>& new_args,
                     const NodeRef& ctx) {
  CHECK_EQ(new_args.size(), 1);
  if (const auto* n = new_args[0].as<QRealizeIntExprNode>()) {
    Expr ret = ForwardOp(ref_call, {n->data});
    return QRealizeIntExprNode::make(ret, n->dom_scale, n->dtype);
  }
  CHECK(!new_args[0]->derived_from<TempExprNode>());
  return Expr(nullptr);
}

RELAY_REGISTER_OP("nn.relu")
.set_attr<FForwardRewrite>("FQRealizeRewrite", IdentityRealize);

RELAY_REGISTER_OP("strided_slice")
.set_attr<FForwardRewrite>("FQRealizeRewrite", IdentityRealize);

RELAY_REGISTER_OP("annotation.stop_fusion")
.set_attr<FForwardRewrite>("FQRealizeRewrite", IdentityRealize);

/* \brief for unary operators which requantize its input to dtype_nbit */
Expr CastDtypeInputRealize(const Call& ref_call,
                           const Array<Expr>& new_args,
                           const NodeRef& ctx) {
  const QConfig& cfg = QConfig::Current();
  // ... casts the input to cfg->dtype_input, keeping dom_scale
}
```
pad needs a custom rule in realize so that it pads with a quantized-type (i.e. int) value instead of the original float.
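Putting this together for transpose, a hypothetical sketch of the Python side (the import paths and the reuse of the identity handlers are my assumptions; the realize step would additionally need a `RELAY_REGISTER_OP("transpose")` entry bound to `IdentityRealize` in realize.cc):

```python
# Hypothetical: reuse the existing identity handlers for transpose.
from tvm.relay.quantize._partition import (
    register_partition_function, identity_partition_function)
from tvm.relay.quantize._annotate import (
    register_annotate_function, identity_rewrite)

register_partition_function("transpose", identity_partition_function)
register_annotate_function("transpose", identity_rewrite)
```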
tico
September 2, 2019, 7:07am
3
Hi @vinx13 , thanks for the pointers!
I have a couple of further questions:
Could you please give further hints on the custom realize rule for the pad operator?
What about quantizing dense layers? I think I saw some code related to this. Is it already supported?
Thanks
tico
September 2, 2019, 11:42am
4
BTW, can reshape also be implemented with the identity realize?
vinx13
September 2, 2019, 8:21pm
5
For pad, you need to implement a PadRealize rule (see the sketch below).
Quantizing dense is already supported.
Reshape can be implemented with the identity realize.
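The essence of such a pad rule is requantizing nn.pad's float pad_value into the input's domain scale before padding. A minimal sketch of that arithmetic (a hypothetical helper, not TVM's actual implementation):

```python
import numpy as np

def quantize_pad_value(pad_value, dom_scale, dtype="int8"):
    """Express a float pad value in quantized units (hypothetical helper).

    A PadRealize rule must pad with round(pad_value / dom_scale),
    clipped to the integer dtype's range, instead of the float value.
    """
    info = np.iinfo(dtype)
    q = int(round(pad_value / dom_scale))
    return max(info.min, min(info.max, q))

# Zero padding stays zero under any scale; nonzero values requantize:
assert quantize_pad_value(0.0, dom_scale=0.05) == 0
assert quantize_pad_value(1.0, dom_scale=0.05) == 20
```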