[RFC] Using arm intrinsics to implement fixed point multiplication in TVM

Let’s say that fmpq(x,y,n) is defined as a fixed point multiplication of two Qk.n numbers x and y. Assume that k+n is 32, so that both x and y can be represented as int32 values.

Then the fpm(x,m,s) that you want to implement, with m as the fixed point multiplier and s as the shift, is fmpq(2*x,m,31) * 2^s. In fact, the first operand in this multiplication is exactly sqrdmulh(x,m).

You can then introduce a new topi operator, call it “fixed_point_multiply_and_scale” (or some better name), and implement it using the fmpq intrinsic followed by the scaling by 2^s.
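
To make the idea concrete, here is a minimal scalar sketch in Python. It models sqrdmulh on a single 32-bit lane as the rounded, saturated high half of the doubled product 2*x*m, and then applies the 2^s scaling. The Python function names, the round-half-up right shift for negative s, and the lack of saturation on the left shift are assumptions made for illustration; this is not how topi or TIR actually implements anything.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def sqrdmulh(x, m):
    """Scalar model of a 32-bit SQRDMULH lane (illustrative only).

    Computes the high 32 bits of the doubled product 2*x*m, with a
    rounding constant added before the shift and saturation afterwards.
    """
    doubled = 2 * x * m + (1 << 31)   # doubling plus rounding constant
    hi = doubled >> 32                # arithmetic shift keeps the high half
    return max(INT32_MIN, min(INT32_MAX, hi))


def fixed_point_multiply_and_scale(x, m, s):
    """Hypothetical operator: sqrdmulh(x, m) scaled by 2**s."""
    r = sqrdmulh(x, m)
    if s >= 0:
        return r << s                 # assumption: no saturation on left shift
    # assumption: negative s is a rounding (half-up) right shift
    return (r + (1 << (-s - 1))) >> (-s)
```

For example, with m = 1 << 30 (that is, 0.5 in Q0.31), `fixed_point_multiply_and_scale(100, 1 << 30, 0)` gives 50, and the same call with s = 1 gives 100.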

Finally, you can “realize” that the original goal of multiplying an integer by a normalized floating point value is equivalent to the “fixed_point_multiply_and_scale”. This solves the original problem, and introduces a general TIR intrinsic for Q-number multiplication.
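
The equivalence with floating point multiplication can be sketched as follows: split the floating point value into a normalized mantissa and an exponent, quantize the mantissa to a 31-bit fixed point multiplier m, and use the exponent as the shift s. The helper names `quantize_multiplier`, `fpm` and `multiply_by_float` are hypothetical, and the rounding choices are assumptions rather than anything prescribed by TVM.

```python
import math


def quantize_multiplier(f):
    """Split f into (m, s) such that f is approximately (m / 2**31) * 2**s."""
    mant, s = math.frexp(f)           # f == mant * 2**s with 0.5 <= |mant| < 1
    m = round(mant * (1 << 31))       # mantissa as a 31-bit fixed point value
    if m == (1 << 31):                # mantissa rounded up to 1.0: renormalize
        m >>= 1
        s += 1
    return m, s


def fpm(x, m, s):
    """Scalar model: rounded, saturated high half of 2*x*m, scaled by 2**s."""
    hi = (2 * x * m + (1 << 31)) >> 32
    hi = max(-(1 << 31), min((1 << 31) - 1, hi))
    if s >= 0:
        return hi << s                # assumption: no saturation here
    return (hi + (1 << (-s - 1))) >> -s   # assumption: round half up


def multiply_by_float(x, f):
    """Integer x times float f, computed with integer arithmetic only."""
    m, s = quantize_multiplier(f)
    return fpm(x, m, s)


print(multiply_by_float(100, 0.75))   # 75
print(multiply_by_float(100, 3.0))    # 300
```

Because the mantissa is quantized and each step rounds, the result can differ from the exact real-valued product by a small amount for general inputs.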