Satan loves extended precision

uunet!F.GP.CS.CMU.EDU!Jonathan_R_Shewchuk uunet!F.GP.CS.CMU.EDU!Jonathan_R_Shewchuk
Thu Jan 19 00:12:39 PST 1995


On chips like the 486DX that have internal extended precision registers,
is it possible to calculate the single or double precision result of a
single operation by performing the operation in extended precision, then
storing the result to memory?  Rounding a result to 80 bits, then to
53 bits can produce a different answer than rounding directly to 53 bits.
However, it is possible for a chip that stores 80-bit extended precision
numbers internally to save a little extra state to ensure that when a
number is stored to memory, it is rounded just as it would have if it had
originally been rounded to 53 bits, thereby complying with the IEEE 754
standard.  Does the 486DX do this?  If not, is there any way to correctly
simulate 53-bit rounding - preferably using portable C code?  Does the
Pentium (or other chips) have this flaw as well?

Here's a brief explanation of why I'm concerned:  I have an application in
which I need to calculate a sum, and then calculate the exact roundoff
error associated with that sum.  Given an FPU with exact rounding (the
default configuration for an IEEE 754 conformant FPU), a well-known
procedure due to Dekker can produce the desired results:

    Assume |a| >= |b| (if not, swap a and b)
    x := a + b
    e := x - a
    y := b - e

This simple algorithm is guaranteed to produce a y such that a+b = x+y
_exactly_.  (In other words, y is the roundoff error associated with the
double-precision floating-point statement "x := a + b".)  Furthermore,
x and y are both double-precision values, and y is no greater than half
an ulp of x.  I might try to make this procedure work on a 486DX in C by
declaring x to be a "volatile" variable, so that the result of the addition
is stored to memory and then reloaded before execution of the first
subtraction.  However, the procedure can fail if the addition is computed
by first rounding the result to 80 bits of precision, and then to 53 bits.

I wonder if it is possible to obtain the desired results on a 486DX,
especially if I limit myself to C code that is portable across a wide
variety of processors (which I suspect would remove the option of
switching the 486DX into some sort of double-precision mode).

Any information would be greatly appreciated,
Jonathan Shewchuk
School of Computer Science
Carnegie Mellon University
jrsacs.cmu.edu



More information about the Numeric-interest mailing list