more on x**n

David G. Hough on validgh dgh
Wed May 8 10:05:08 PDT 1991


Peter Tang commented on my recent posting about x**n:

> Some time ago, I discovered that by exploiting the table-driven approach
> and other standard tricks, one can come up with a feasible implementation
> (that is, efficient) of x**y that is definitely below 1 ulp throughout
> the entire domain (that is as long as the final result is representable).
> K.C. took that algorithm and fine-tuned it for Sun and ended up with an
> x**y function that is as fast as any (if not faster) that has been used
> before, but of course the new x**y is very accurate. (I believe the error
> can be made close to 0.55 ulp except for results that are denormalized.)
> The speed of this x**y is comparable to  
>      fastest exp   +   fastest log.
> 
> This new algorithm tightens the competition between repeated multiply for
> x**n  and   computing it by x**float(n):
> (1) While for small n repeated multiply would still be faster than 
>     x**float(n), it has two drawbacks
>     i)  As soon as the repeated multiply scheme takes more than one
>         basic operation (* or /), it is very likely that x**float(n)
>         is more accurate. Moreover, the error of repeated multiply
>         grows with n (or more precisely log_2(n)) while x**float(n)
>         exhibits uniform error behavior.
>     ii) Various tests have to be installed in order to avoid spurious
>         overflow in the repeated multiply method; whereas x**float(n)
>         is based on   exp( n log(x) ) and is consequently free from such
>         problems, since "exp" typically ends with a final scaling by an
>         integer power of 2.
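
The repeated-multiply scheme under discussion is ordinarily binary
(square-and-multiply) powering, which takes roughly 2 log2(n)
multiplications. A minimal sketch (the function name "ipow" is mine,
not from either posting):

```c
#include <math.h>

/* Binary (square-and-multiply) powering: roughly 2*log2(n) multiplies.
 * Note that the squarings of x can overflow or underflow even when the
 * final x**n is representable -- the spurious spill that has to be
 * guarded against, as point (ii) above observes. */
double ipow(double x, unsigned n)
{
    double p = 1.0;
    while (n) {
        if (n & 1)
            p *= x;     /* fold in the current bit of n */
        x *= x;         /* square for the next bit */
        n >>= 1;
    }
    return p;
}
```

Each multiplication contributes its own rounding error, which is why
the accumulated error grows roughly with log_2(n), as point (i) notes.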

Note that the algorithm for x**n that
I outlined avoids gratuitous overflow and underflow for positive n ---
if you get an overflow or underflow on the last multiplication, it was
inevitable ---
and uses "frexp" and "ldexp" to avoid gratuitous exponent spill for
negative n.
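
One way to realize such a spill-free scheme (a sketch under my own
assumptions, not Hough's actual code) is to carry the binary exponent
of every partial result separately, renormalizing with frexp and
applying the accumulated exponent once at the end with ldexp:

```c
#include <math.h>

/* Square-and-multiply with partial results kept as f * 2^e,
 * 0.5 <= |f| < 1, so no intermediate product can overflow or
 * underflow.  Only the final ldexp can spill, and then only if the
 * true result is out of range.  (long is assumed wide enough for the
 * accumulated exponents; they are clamped before the final ldexp,
 * since beyond +-5000 a double result spills regardless.) */
double ipow_scaled(double x, int n)
{
    unsigned m = n < 0 ? -(unsigned)n : (unsigned)n;
    int ex;
    long pe = 0, xe;
    double p = 1.0, f = frexp(x, &ex);   /* x = f * 2^ex */

    xe = ex;
    while (m) {
        if (m & 1) {
            p *= f;  pe += xe;
            p = frexp(p, &ex);  pe += ex;   /* renormalize product */
        }
        m >>= 1;
        if (m) {
            f *= f;  xe += xe;
            f = frexp(f, &ex);  xe += ex;   /* renormalize square */
        }
    }
    if (n < 0) { p = 1.0 / p;  pe = -pe; }  /* reciprocal for n < 0 */
    if (pe > 5000)  pe = 5000;    /* final result overflows anyway */
    if (pe < -5000) pe = -5000;   /* final result underflows anyway */
    return ldexp(p, (int)pe);     /* only this step can spill */
}
```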

> (2) For large n, as already pointed out in your message, repeated multiply
>     can be slower, and quite definitely less accurate.
> (3) One feature of repeated multiply is that should the mathematical
>     result be an IEEE number, the algorithm will produce it. But since
>     the new x**y algorithm has error < 1 ulp, x**y will also give the exact
>     result --- although the inexact flag will almost certainly be raised.
> (4) In the case of x**2 and x**(-1), the simple operations x*x and 1.0/x give
>     correctly rounded results --- and are clearly better than x**float(2) or
>     x**float(-1). 
> (5) Finally, there may be some who would prefer to have x**n and x**float(n)
>     giving exactly the same result. In that case, the two may want to share
>     the same set of methods. For example, under one criterion both would
>     use repeated multiply, and under another both would use the
>     exp( log ) approach.


