Support for pre-quantized model int8/uint8 conversion

Hi,

Does QNN support int8 --> uint8 or uint8 --> int8 pre-quantized model conversion? If not, is there a plan to support it?

Tagging @anijain2305 because you are fantastic! Thank you!

Hi @JoeyChou I am not sure what you mean by int8 -> uint8 conversion.

If you want your conv2d and dense inputs and weights to have specific data types, yes, that is certainly possible with the QNN Legalize pass. An example is Intel VNNI instructions, which prefer uint8 for feature maps and int8 for weights. Naturally, pre-quantized models might not follow this rule, so QNN Legalize inserts a requantize node before conv2d and dense to satisfy the datatype restrictions.
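To see why such a dtype flip is cheap, here is a minimal NumPy sketch of the arithmetic a uint8 -> int8 requantize performs when the scale is unchanged: shift the quantized values and the zero point by 128 together, and the real values each representation encodes are identical. The scale and zero-point numbers below are made up for illustration, not taken from any particular model.

```python
import numpy as np

# Hypothetical quantization parameters: a uint8 tensor with
# scale 0.1 and zero point 128 (asymmetric quantization).
scale = 0.1
zp_u8 = 128
q_u8 = np.array([0, 64, 128, 200, 255], dtype=np.uint8)

# A uint8 -> int8 flip with identical input/output scales reduces to
# subtracting 128 from both the values and the zero point.
q_i8 = (q_u8.astype(np.int16) - 128).astype(np.int8)
zp_i8 = zp_u8 - 128

# Both representations dequantize to the same real values.
real_u8 = (q_u8.astype(np.int32) - zp_u8) * scale
real_i8 = (q_i8.astype(np.int32) - zp_i8) * scale
assert np.allclose(real_u8, real_i8)
```

When the scales differ (as in the general VNNI case), the inserted requantize node additionally rescales, but the zero-point shift above is the core of a pure dtype conversion.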

Please look at an example here

Hi @anijain2305, thanks for the reply. I should have made myself clear. What I meant was: if the model's weights and biases were quantized to uint8, does TVM have a way to convert those uint8 weights and biases to int8?

I will certainly try what you suggested, thank you.

Yes, it does. The legalize pass can do this.
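For weights specifically, the conversion the legalize pass performs can be sketched offline in a few lines. This is a hypothetical helper (not a TVM API) showing the value/zero-point shift; the scale is untouched, so the dequantized weights are unchanged.

```python
import numpy as np

def uint8_weight_to_int8(w_u8, zero_point):
    """Convert uint8-quantized weights to int8 without changing
    the real values they encode: shift values and zero point by 128."""
    w_i8 = (w_u8.astype(np.int16) - 128).astype(np.int8)
    return w_i8, zero_point - 128

# Example: a tiny uint8 weight tensor with zero point 128.
w = np.array([[0, 255], [100, 128]], dtype=np.uint8)
w_i8, zp_i8 = uint8_weight_to_int8(w, 128)
# uint8 0/255/100/128 become int8 -128/127/-28/0, with zero point 0.
```

In TVM itself you would not do this by hand; the legalize pass rewrites the QNN ops and their quantization parameters for you, and this sketch only shows the underlying arithmetic.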


Yes, really appreciate your help!