Hello @leandron and @lhutton1. I am currently adding tests for offloading the Softmax operator of Paddle models to CMSIS-NN. After several days of testing, I found that not all softmax operators can be offloaded to CMSIS-NN. The reason is the following check:
From `cmsisnn.py`:

```python
def check_qnn_softmax(pattern):
    """Check if softmax is supported by CMSIS-NN."""
    dequantize_call = pattern.args[0].args[0]
    scale = pattern.args[1].data.numpy().item(0)
    zero_point = pattern.args[2].data.numpy().item(0)

    # check for dtypes of quantize and dequantize
    if (
        (scale == 1.0 / 256 and zero_point == -128)
        and pattern.attrs.out_dtype == "int8"
        and dequantize_call.args[0].checked_type.dtype == "int8"
    ):
        return True
    if (
        (scale == 1.0 / 32768 and zero_point == 0)
        and pattern.attrs.out_dtype == "int16"
        and dequantize_call.args[0].checked_type.dtype == "int16"
    ):
        return True
    return False
```
I found that for int8 quantization, the output scale and zero point of the softmax operator must be exactly 1/256 and -128 (and 1/32768 and 0 for int16); otherwise the operator is not offloaded. As far as I know, in both Torch and Paddle models the scale and zero point are user-defined (e.g. derived from calibration), so I don't understand why there is a mandatory restriction on these values.
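For concreteness, the pattern being checked is `quantize(softmax(dequantize(x)))`, so `pattern.args[1]` and `pattern.args[2]` are the output scale and zero point of the final `qnn.quantize`. Below is a minimal Relay sketch of a graph that would pass the int8 branch of this check; the input shape and the input quantization parameters (`0.02` / `5`) are made-up values for illustration:

```python
from tvm import relay

# Hypothetical int8 input feeding the dequantize -> softmax -> quantize pattern.
x = relay.var("x", shape=(1, 10), dtype="int8")

# Input quantization parameters are arbitrary here; the check does not constrain them.
deq = relay.qnn.op.dequantize(
    x,
    input_scale=relay.const(0.02, "float32"),
    input_zero_point=relay.const(5, "int32"),
)
sm = relay.nn.softmax(deq, axis=-1)

# Output quantization parameters are what check_qnn_softmax() inspects:
# anything other than scale == 1/256 and zero_point == -128 (for int8)
# makes the check return False, and the operator is not offloaded.
q = relay.qnn.op.quantize(
    sm,
    output_scale=relay.const(1.0 / 256, "float32"),
    output_zero_point=relay.const(-128, "int32"),
    out_dtype="int8",
)

func = relay.Function([x], q)
print(func)
```

With, say, `output_scale=relay.const(0.01, "float32")` the same graph fails the check and falls back to the default codegen, which is exactly the behavior I am seeing with user-defined quantization parameters.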