wide evaluation / FPCE

Jim Thomas uunet!taligent.com!jim_thomas
Fri Sep 29 13:10:05 PDT 1995


This note is about how the "Floating-Point C Extensions" (FPCE) part of the
ANSI C committee's Technical Report on Numerical C Extensions addresses the
abuse-of-wide-precision issue recently under discussion.

Jonathan Thornburg:
>A number of recent numeric-interest messages have discussed how
>compilers should handle the 80x86 extended precision stack, and in
>particular if/when/how a compiler should round extended precision
>values to normal storage precision (typically double).
>
>It seems to me that underlying this discussion is a somewhat more
>general question: if/when/how a compiler should sacrifice performance
>to achieve bit-for-bit IEEE compliance?

As an aside, this isn't primarily an IEEE compliance issue -- non-IEEE systems
have wide registers too.  Sacrificing predictability for performance is a
tradeoff that some language implementors have made since long before the
IEEE standard.
  
>Vaughan Pratt <pratt@cs.Stanford.EDU> gave a nice test problem for
>this discussion a month or so back:
>
>|                 double w, x, y, z;
>|                 ...
>|                 x = y + z;
>|                 ...(no further assignment to x)
>|                 w = x - y;
>| 
>| One cannot deduce that the last line uses x rounded to double, since if
>| y + z is still in a register (and hence at extended precision) when the
>| last line is reached, gcc for the x86 is happy to treat x as synonymous
>| with y+z (very naughty), which makes w exactly 0 in this case.  Whether
>| y+z *is* still in a register can depend (at compile time) on code
>| *following* the last line, with the result that whether w is zero can
>| vary as you vary the following code (but not the values in y and z)!
>
>As Michael Meissner <meissner@cygnus.com> pointed out, on 80x86 machines
>there's a nontrivial performance penalty for rounding x to double here:
>
...
>| stack to memory and losing precision.  It is certainly doable to store
>| and restore the numbers in 80 bit format when spilling registers,
>| which at least would prevent much of the nondeterminism.  The problem
>| is that the extended store/load instructions take more cycles.
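
To make the hazard concrete, here is Pratt's fragment with values filled in
(the values are my own, not from his post;  ldexp  is from <math.h>):

	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
	    double w, x, y, z;

	    y = 1.0;
	    z = ldexp(1.0, -60);  /* 2^-60: below half an ulp of 1.0 in
	                             double, so  y + z  rounds to 1.0  */

	    x = y + z;            /* rounded to double:  x == 1.0 */
	    w = x - y;            /* so  w  should be exactly 0.0 */

	    printf("w = %g\n", w);
	    return 0;
	}

A compiler that quietly substitutes the extended-register value of  y + z
for  x  computes  w == 2^-60  instead, because  1 + 2^-60  fits in the 64-bit
extended significand but not in double's 53 bits.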

FPCE's goal is to allow the flexibility needed for efficiency on all major
architectures, while still guaranteeing predictability where it is reasonably
expected.
An FPCE implementation identifies the precision it uses for evaluating
floating-point operations of a given type.  C implementations for an
extended-based architecture such as the x86 can, as many do, choose extended
precision for operations of all floating types.  FPCE reinforces the C
requirement that assignments and casts honor the storage type.  In the
example, evaluating  y + z  to extended precision is OK under FPCE and C, but
using that extended result for  x  in calculating  w  is not.  From
knowledge of the compiler's evaluation precisions and of the underlying
arithmetic, results of basic arithmetic expressions are predictable (even
though not uniform across all implementations).  
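
Concretely, a program has two portable ways to force the narrowing (a
sketch; the function name is mine):

	/* Both forms round  y + z  to double before the subtraction,
	   whatever precision the implementation evaluates in. */
	double residual(double y, double z)
	{
	    double x;

	    x = y + z;      /* assignment honors the storage type */
	    return x - y;

	    /* equivalently, an explicit cast narrows mid-expression:
	       return (double)(y + z) - y;  */
	}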

If the computation is amenable to wider precision, the programmer can declare
local variables
with FPCE's  double_t  typedef (there's a  float_t  too), e.g.

	double_t x;

An implementation evaluating everything to extended would define  double_t  to
be  long double , i.e. extended, and hence would not narrow  x .  (An
implementation that evaluated double operations to double precision would
define  double_t  to be  double , so the program could be efficient there
too.) 
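
For example, a summation loop can accumulate in  double_t  so that an
extended-based implementation never narrows the running sum (a sketch; that
the typedef lives in <math.h> is my assumption about the draft):

	#include <math.h>     /* assumed home of  double_t  */
	#include <stddef.h>

	double sum(const double *a, size_t n)
	{
	    double_t s = 0.0; /* long double on an all-extended
	                         implementation, double elsewhere */
	    size_t i;

	    for (i = 0; i < n; i++)
	        s += a[i];    /* no forced narrowing per iteration */

	    return (double)s; /* a single final rounding */
	}

On an extended-based implementation the running sum stays wide across
iterations, while the same source remains efficient where  double_t  is
plain  double .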

An implementation may be able to exploit architectural aspects in order to
gain additional speed at the cost of predictability, e.g. by spilling extended
registers to double, but I agree with David Hough that these are best left as
implementation-specific optimization options.

-Jim




