It makes sense now, thanks a lot @kparzysz!
`fpmq` would be my intrinsic, and it does the fixed-point multiplication:
def fixed_point_multiply(x, y, n):
    x = cast(x, int64) * y
    pos_rounding_value = 1 << (n - 1)
    x = x + pos_rounding_value
    x = x >> n
    return cast(x, int32)
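To make the rounding behaviour concrete, here is a minimal plain-Python sketch of the same steps. The names `to_int32` and `fixed_point_multiply_ref` are hypothetical helpers for illustration only, not TVM APIs, and Python's unbounded integers stand in for the `int64` intermediate.

```python
def to_int32(value: int) -> int:
    """Wrap a Python int to signed 32 bits, mimicking a narrowing cast."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fixed_point_multiply_ref(x: int, y: int, n: int) -> int:
    """Multiply x by y, then shift right by n bits with rounding (n >= 1)."""
    prod = x * y            # stands in for the int64 product
    prod += 1 << (n - 1)    # add half of 2**n so the shift rounds to nearest
    prod >>= n              # arithmetic right shift (floors for negatives)
    return to_int32(prod)   # narrow back to int32


# y = 1 << 30 represents 0.5 when n = 31 (Q31 format)
print(fixed_point_multiply_ref(100, 1 << 30, 31))  # 50
print(fixed_point_multiply_ref(3, 1 << 30, 31))    # 2  (1.5 rounds up)
print(fixed_point_multiply_ref(-3, 1 << 30, 31))   # -1 (-1.5 rounds toward +inf)
```

Note that adding the positive rounding value before an arithmetic shift rounds ties toward positive infinity, which is why `-1.5` becomes `-1` here.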
I call it from the TOPI operator, which I can overload for the arm target so that it uses arm intrinsics.
However, I am slightly worried about performance: in the default non-arm case I would do two shifts (by `n` and by `s`) instead of combining everything into a single shift by `n+s`, which is called `total_right_shift` in the original code.
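For comparison, the two variants can be sketched in plain Python as follows. This assumes that the second shift (by `s`) also rounds to nearest, which is my assumption and not something fixed above; `round_shift`, `two_shifts` and `single_shift` are hypothetical names used only for this illustration.

```python
def round_shift(value: int, bits: int) -> int:
    """Right shift by `bits` (>= 1) with round-to-nearest, ties toward +inf."""
    return (value + (1 << (bits - 1))) >> bits


def two_shifts(x: int, y: int, n: int, s: int) -> int:
    """Shift by n, then by s, rounding at each step (assumed behaviour)."""
    return round_shift(round_shift(x * y, n), s)


def single_shift(x: int, y: int, n: int, s: int) -> int:
    """One combined shift, rounding once."""
    total_right_shift = n + s
    return round_shift(x * y, total_right_shift)


half_q31 = 1 << 30  # 0.5 in Q31
print(two_shifts(8, half_q31, 31, 1), single_shift(8, half_q31, 31, 1))  # 2 2
print(two_shifts(1, half_q31, 31, 1), single_shift(1, half_q31, 31, 1))  # 1 0
```

Under that assumption the two forms are not always bit-identical: rounding twice can push a value up that a single rounding would leave alone (the exact result in the last line is 0.25). So besides the extra shift, the split version may need care if it has to match the existing results exactly.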
What do you think?