Cray, guard bits, and double precision

Fri Oct 12 14:54:11 PDT 1990

> Date: Fri, 12 Oct 90 14:08:52 PDT
> From: sun!Eng!khb (Keith Bierman - SPD Advanced Languages)
> Subject: kahan's posix 10003 submital
>
> Have you seen it ? It is an essay on cray arithmetic...
>
> His basic argument is that cray should put in a guard digit, and do
> minor other fixup work rather than do all ieee754 just now.
>
> There is one slide, where he claims that by placing a guard bit into
> the cray, that sw "double" precision can be sped up by a factor of 3x
> over its current cost of 25x-80x hw math ... should this be obvious ?
> Is this something which we could exploit to speed up our quad ?

IEEE arithmetic (and VAX, and IBM 370) already has a guard bit.
Anything designed by Seymour from Univac on, I think, does not.

The issue is whether you can compute the exact difference between two
numbers that have the same sign and are within a factor of two of each other.

If you don't have a guard digit, then the hexadecimal difference

	  1.00000
	- 0.fffff8

will come out to be 0.00001 instead of 0.000008, which is not only inexact
but off by a factor of two.
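
The effect can be reproduced with a small Python sketch.  Everything in it
is an assumption made for illustration: the function name chopped_subtract
is hypothetical, and the model (hexadecimal significands, the smaller operand
shifted right and chopped before subtracting) is a simplified picture of
hardware alignment, not a description of any particular machine.

```python
from fractions import Fraction

def chopped_subtract(a_sig, a_exp, b_sig, b_exp, guard_digits):
    """Return a - b, where a = a_sig * 16**a_exp and b = b_sig * 16**b_exp.

    Requires a_exp >= b_exp.  b is shifted right to line up with a; only
    `guard_digits` hex digits past a's last digit are kept, the rest are
    chopped off before the subtraction.
    """
    shift = a_exp - b_exp              # hex digits b must move to the right
    kept = min(shift, guard_digits)    # extra digits the subtractor retains
    a_wide = a_sig << (4 * kept)
    b_wide = b_sig >> (4 * (shift - kept))
    return Fraction(a_wide - b_wide) * Fraction(16) ** (a_exp - kept)

# 1.00000 (hex) = 0x100000 * 16**-5 and 0.fffff8 (hex) = 0xfffff8 * 16**-6
no_guard = chopped_subtract(0x100000, -5, 0xfffff8, -6, guard_digits=0)
one_guard = chopped_subtract(0x100000, -5, 0xfffff8, -6, guard_digits=1)

print(no_guard == Fraction(1, 16**5))   # 0.00001 hex, the wrong answer
print(one_guard == Fraction(8, 16**6))  # 0.000008 hex, the exact answer
print(no_guard / one_guard)             # the factor between them
```

With one guard digit the digit 8 survives the alignment shift, so the
subtraction is exact; with none it is chopped before it can contribute.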

All doubled-precision arithmetic built upon single-precision floating-point
arithmetic depends on this trick to decompose floating-point sums into
pieces that can be readily handled.  The steps are something like:

	Given single precision X > Y > 0, compute single precision x and y
	such that (x+y) == (X+Y) exactly and the exponents of x and y differ
	by at least the significance of the single precision.

	x = X + Y			rounded
	t = x - X			should be exact because 0.5 <= x/X <= 2 !
	y = Y - t			the rounding error of X + Y

If you can't count on that critical subtraction, you have to do something
else much less efficient involving a mysterious constant.  This was
discussed in Kahan's 1972 lecture notes.
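
A minimal Python sketch of the decomposition follows.  It assumes IEEE
double precision with round-to-nearest (which is what Python floats give on
common hardware) standing in for "single precision", and the function name
fast_two_sum is mine, not from Kahan's notes.  The sketch spells out the
final subtraction y = Y - (x - X) that yields the low-order piece.

```python
from fractions import Fraction

def fast_two_sum(X, Y):
    """Split X + Y into (x, y): x is the rounded sum, y the rounding error.

    Assumes |X| >= |Y| and arithmetic in which x - X is computed exactly,
    which is the property a guard digit provides.
    """
    x = X + Y      # rounded sum
    t = x - X      # the part of Y that made it into x
    y = Y - t      # the part of Y that was rounded away
    return x, y

X, Y = 1.0e16, 1.2345
x, y = fast_two_sum(X, Y)

# Check with exact rational arithmetic that nothing was lost.
print(Fraction(x) + Fraction(y) == Fraction(X) + Fraction(Y))
```

Fraction converts each float to its exact rational value, so the comparison
tests the identity (x+y) == (X+Y) without any further rounding.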

Anyway, our current SPARC V8 quadruple-precision implementation uses integer
arithmetic so that all the boundary and exception cases can be handled easily.
That would be very hard to do faster with floating-point arithmetic because the
quad exponent field is larger than the double exponent field.  I won't
worry too much about performance until we get some complete hardware
implementations of V8.

Cray doubled-precision arithmetic's goals are more modest; the exponent
field is the same size, so an efficient doubled-precision arithmetic could
be built if the single-precision arithmetic were clean enough.
That's Kahan's point.  Whether it would be 3X faster than what's there now
I couldn't guess.
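
To give a feel for what such a doubled-precision operation looks like, here
is a rough Python sketch of an addition built on the exact-sum
decomposition.  It is not Cray's library routine or Kahan's proposal: the
names two_sum and dd_add are hypothetical, Python floats stand in for the
underlying single precision, and this simple variant makes no attempt to
handle overflow, underflow, or other boundary cases.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, e) with s = rounded a + b and e the rounding error."""
    if abs(a) < abs(b):
        a, b = b, a            # the decomposition needs |a| >= |b|
    s = a + b
    e = b - (s - a)            # relies on s - a being computed exactly
    return s, e

def dd_add(p, q):
    """Add two doubled-precision values, each a (hi, lo) pair of floats."""
    s, e = two_sum(p[0], q[0])  # exact sum of the high parts
    e += p[1] + q[1]            # fold in the low parts (rounded)
    return two_sum(s, e)        # renormalize into (hi, lo)

p = (1.0, 2.0**-70)
q = (2.0**-60, 2.0**-120)
hi, lo = dd_add(p, q)

exact = sum(Fraction(v) for v in p + q)
error = abs(Fraction(hi) + Fraction(lo) - exact)
print(hi, lo, error <= Fraction(1, 2**100))
```

A plain floating-point sum of the two high parts would return 1.0 and lose
the 2**-60 term entirely; the pair (hi, lo) retains it in the low word.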

This fast doubled-precision would be ragged around the edges but no more so
than usual for Cray.