thoughts on uniform expression evaluation in IEEE arithmetic

Sun Sep 12 20:21:37 PDT 1993

     Solaris 2.1 shipped for x86-based PC's in May, and Solaris 2.something
will ship for PowerPC-based PC's next year.    The original idea was that
Solaris 2 would be the SPARC operating system, but that idea appears to have
gotten lost in the train station.  Ed Zander, the president of SunSoft, used
to claim in his speeches that x86 Solaris would be bug-for-bug compatible with
SPARC, but if you looked in the right part of the fine print you'd find that
floating point was not included in that claim.

     The reason of course is that everybody does it differently, even among
IEEE 754 implementations.   The "extended" format varies: 1/15/64 bits in x86
hardware, 1/15/1/112 in SPARC software, and 1/11/1/52/1/11/1/52 in RS/6000
real*16 Fortran which is based on exploiting the fused multiply-add in real*8.
Of course Rs/6000 doubled-double is not an IEEE 754 format, but the fused
multiply-add allows it to be faster and better quality than it might have been
otherwise.

     Beyond format, expressions are evaluated differently by different com-
pilers and different releases of the same compilers and different optimization
levels of the same release.

     Finally, elementary transcendental functions and some library functions
are evaluated by diverse means, none of which produce correctly-rounded
results, and all of which produce different results.

     In contrast, correctly-rounded base conversion of adequate efficiency is
more or less a solved problem, by several people; one of these days I might
even get around to putting my work into an accessible form.

     Many of these divergences have their basis in reasonable concerns for
efficiency.   However most users, most of the time, would be happier if their
spreadsheets, for instance, delivered identical results on all platforms, and
would not notice the minor performance loss this would entail.  For instance,
I have had tax programs differ by one cent depending on whether they were run
on a real PC with extended precision in hardware or simulated, or on a PC
simulator on a SPARCstation which used the SPARC native double precision for
computations. Even most large scientific computations have only limited sec-
tions in which performance matters, and debugging the bulk of the code might
be simplified by having a way to obtain predictable numerical results.

     So this led me to consider how to define predictable arithmetic results
on IEEE-754 based Unix systems in C and Fortran.    I would hope that if a
consensus developed about the best way to provide predicatable results, then
compilers might provide that as a default compilation and library mode, even
while they continue to provide more highly-optimized modes and libraries.  So
here's my first stab at it.   If you think this is a worthwhile endeavor,
please contribute to the discussion.    The criterion should be what's most
useful to a scientific programmer, considering existing and future codes.

     EXCEPTIONS: I don't think it's worth specifying exceptional behavior in
this context except that it conform to the IEEE nonstop default requirements
or else terminate the computation on particular exceptions.   In other words,
tampering with expressions that invoke signal handlers that return into the
middle of things is too much to deal with.

     DATA TYPES:

C float           IEEE 754 single
C double          IEEE 754 double
C long double     illegal
Fortran real*4    IEEE 754 single
Fortran real*8    IEEE 754 double
Fortran real*16   IEEE 754 quad

C long double can't be mapped to any format in a useful way within this
regime, unfortunately. For instance, the x86 SVR4 ABI defines long double as
x86 extended, while SPARC SVR4 ABI defines it as IEEE quad (1/15/1/112), and
many compilers in other environments define double == long double.  The same
may be true of Fortran real*16, but there is less chance of it being imple-
mented with 10-byte x86 extended, so it seems possible that it could be
defined as IEEE 754 quad since nobody has a hardware implementation of that
yet.   RS/6000-style doubled-double could be another type with a different
name.

     As to integer formats, of course they affect floating-point expression
evaluation too, but whether int or INTEGER is 32 or 64 is less important since
that doesn't affect results unless an integer overflow exception occurs, which
I consider preferable to an interminable debate on 32 vs 64.

     EXPRESSION EVALUATION PRECISION: All expressions evaluated as if by the
"traditional Fortran rules": in the precision of the immediate operands, if
they are the same precision, or in the precision of the operand of the greater
precision.   I think this is closest to the intent of most existing working
Fortran programs, and probably would produce results closer to those of optim-
ized programs, than the alternative, which is to evaluate entire expressions
in "widest-need" format as discussed by Corbett years ago and NCEG more
recently.

     I think the practical implication for x86 systems is that they will run
in round-to-double-precision mode.   Newer x86 chips don't pay much penalty
for that mode, I have heard.

     EVALUATION ORDER: x+y+z is a problem.   Decisions must be made, such as:
where language definitions allow operators of equal precedence to be evaluated
in more than one way, the uniform-results method is to evaluate from left to
right.

     FORTRAN COMPLEX ARITHMETIC: There are some very complicated ways to get
complex IEEE arithmetic exactly right for conformal mapping.   I would guess
that it would be more beneficial in the current context to simply specify the
algorithms from libf2c, which understood in traditional C, evaluate single
complex in double precision and double complex in double precision.

     BASE CONVERSION: Correctly rounded of course.

     ELEMENTARY TRANSCENDENTAL FUNCTIONS: I don't think there is any substi-
tute for those who are able to cooperatively develop correctly-rounded elemen-
tary transcendental functions of adequate efficiency.    The general approach
of the people who've done it so far has been to have a reasonable fast subal-
gorithm that delivers enough of an error bound in order that it may be deter-
mined to be correctly rounded in the destination precision; on rare cases when
it's too close to call, a much slower and more accurate multi-precision subal-
gorithm is invoked.  The art lies in choosing the primary algorithm to prop-
erly balance speed with the incidence of close calls.

     OTHER ISSUES?  Whether or not you agree with the foregoing choices, are
there other issues which I have overlooked at the moment?    Remember that the
goal is to produce uniform numerical results (in the absence of exceptions
that trap and return) on x86 and common RISC IEEE 754 Unix systems in C and
Fortran, without penalizing performance unduly on any one common platform.