Satan loves extended precision

Patrice L. Roussel uunet!ichips.intel.com!proussel
Thu Jan 19 08:51:43 PST 1995


On Thu, 19 Jan 95 03:12:39 EST  uunet!f.gp.cs.cmu.edu!jonathan_r_shewchuk@uunet.uu.net wrote:
>
> On chips like the 486DX that have internal extended precision registers,
> is it possible to calculate the single or double precision result of a
> single operation by performing the operation in extended precision, then
> storing the result to memory?  Rounding a result to 80 bits, then to
> 53 bits can produce a different answer than rounding directly to 53 bits.

That is correct. That is why you should set the PC (precision control) to
double, so as to guarantee that any result returned on the stack carries only
53 significant bits. Then you can store to memory safely (no extra bits to
worry about).
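
For reference, here is a minimal sketch of setting the PC field from C. It
assumes GCC-style inline assembly on an x86, and the helper name is mine, not
anything standard; the control-word layout (PC in bits 8-9, with 10b selecting
double) is taken from the x87 documentation:

    /* Sketch: force the x87 precision control (PC) to double, i.e. a
       53-bit significand.  Assumes GCC-style inline asm on x86. */
    static void set_x87_pc_double(void)
    {
        unsigned short cw;

        __asm__ volatile ("fnstcw %0" : "=m" (cw)); /* read control word  */
        cw &= ~0x0300;                              /* clear PC, bits 8-9 */
        cw |=  0x0200;                              /* PC = 10b -> double */
        __asm__ volatile ("fldcw %0" : : "m" (cw)); /* load it back       */
    }

After the call, +, -, *, / on the FP stack round their significands to 53
bits, although the exponent range stays that of double extended (see below).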

> However, it is possible for a chip that stores 80-bit extended precision
> numbers internally to save a little extra state to ensure that when a
> number is stored to memory, it is rounded just as it would have if it had
> originally been rounded to 53 bits, thereby complying with the IEEE 754
> standard.  Does the 486DX do this?  If not, is there any way to correctly
> simulate 53-bit rounding - preferably using portable C code?  Does the
> Pentium (or other chips) have this flaw as well?
				^^^^^^^^^^^^^^^^^

This is not a flaw. The x86 FP architecture offers more than what IEEE
requires. The reason the overflow/underflow thresholds do not depend on the
PC value (they are always those of double extended) is to let people do such
things as computing the length of a representable vector without any overflow
or underflow while the squares are being accumulated. Everybody knows how
complex the Berkeley expression for vector length is! With an x87 product,
you just write:

	|x| = sqrt( sum_i  x_i^2 )
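
To make that concrete, here is a sketch of the norm computed by plain
summation, leaning on the double-extended exponent range for the intermediate
squares. It assumes a compiler whose long double is the 80-bit x87 format;
the function name is mine:

    #include <math.h>
    #include <float.h>

    /* Sketch: 2-norm by naive summation of squares.  The squares are
       accumulated in long double, whose exponent range (~1e+/-4932)
       keeps x[i]*x[i] from overflowing or underflowing even for x[i]
       near DBL_MAX or DBL_MIN. */
    double norm2(const double *x, int n)
    {
        long double sum = 0.0L;
        int i;

        for (i = 0; i < n; i++)
            sum += (long double) x[i] * x[i];
        return (double) sqrtl(sum);   /* the norm itself fits in double */
    }

For instance, norm2() of { 0.8*DBL_MAX, 0.6*DBL_MAX } returns roughly DBL_MAX,
where a pure double computation would overflow on the very first square.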
 
On the historical side: Kahan was the consultant on the 8087, and except for
the way the FP stack is managed, I have no memory of him complaining about
the point we are discussing right now (and he is the father of the IEEE FP
standard).

If you are prepared to accept lower performance and precision, the x87 lets
you be 100% IEEE compatible. Compilers have switches to force this
compatibility mode by setting the PC to double (or single) and forcing every
arithmetic instruction to store its result to memory.
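
As one concrete illustration (my example, not from the original post): with a
modern GCC on x86, both effects can be requested on the command line, e.g.

    gcc -mpc64 -ffloat-store program.c -o program

where -mpc64 makes the runtime set the x87 precision control to double
(53-bit significands) and -ffloat-store keeps floating-point variables out of
registers so assignments are rounded to the declared type in memory. Other
compilers spell their switches differently.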

> Here's a brief explanation of why I'm concerned:  I have an application in
> which I need to calculate a sum, and then calculate the exact roundoff
> error associated with that sum.  Given an FPU with exact rounding (the
> default configuration for an IEEE 754 conformant FPU), a well-known
> procedure due to Dekker can produce the desired results:
> 
>     Assume |a| >= |b| (if not, swap a and b)
>     x := a + b
>     e := x - a
>     y := b - e
> 
> This simple algorithm is guaranteed to produce a y such that a+b = x+y
> _exactly_.  (In other words, y is the roundoff error associated with the
> double-precision floating-point statement "x := a + b".)  Furthermore,
> x and y are both double-precision values, and y is no greater than half
> an ulp of x.  I might try to make this procedure work on a 486DX in C by
> declaring x to be a "volatile" variable, so that the result of the addition
> is stored to memory and then reloaded before execution of the first
> subtraction.  However, the procedure can fail if the addition is computed
> by first rounding the result to 80 bits of precision, and then to 53 bits.
> 
> I wonder if it is possible to obtain the desired results on a 486DX,
> especially if I limit myself to C code that is portable across a wide
> variety of processors (which I suspect would remove the option of
> switching the 486DX into some sort of double-precision mode).

Two possibilities:

	- no overflow/underflow: just set the PC to double
	- overflow/underflow: set the PC to double and do not allow intermediate
	  results to stay on the FP stack (meaning: flush them to memory as
	  double; see the sketch below)
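
A sketch of the second variant in C: every intermediate of the quoted Dekker
procedure is pushed back to memory as a double via volatile. This assumes the
PC has already been set to double (e.g. with the control-word snippet
earlier); the function name is mine:

    /* Sketch: Fast-Two-Sum with each intermediate flushed to memory.
       Requires |a| >= |b|; with PC = double, every volatile store
       holds exactly 53 significant bits. */
    void two_sum(double a, double b, double *x, double *y)
    {
        volatile double vx, ve, vy;

        vx = a + b;    /* rounded sum                             */
        ve = vx - a;   /* the part of b that made it into the sum */
        vy = b - ve;   /* the part of b that rounding threw away  */

        *x = vx;       /* afterwards, a + b == *x + *y exactly    */
        *y = vy;
    }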

	Patrice.


