SPARC quad format

Tue Jun 4 11:20:33 PDT 1991

The early versions of the SPARC architecture defined the 128-bit format
to have 64 significant bits and 48 unused bits.  In 1987 I proposed
that the quad format be changed to 113 significant bits, and this was
eventually accepted for SPARC V8, possibly because nobody expected to
actually implement it in hardware in their lifetime.

My reasons included:

1) Progress in applying interval arithmetic methods reported at the
9/87 interval arithmetic conference in Columbus.  Conventional interval
methods benefit from lots of extra precision, while Kulisch-style
methods benefit from doubled-precision products.

2) A number of existing Cray customers cited lack of 128-bit format as
a reason for staying with Cray mainframes rather than moving to Unix
workstations.

3) Spectrum, I mean Precision, I mean PA-RISC, defined 128-bit quad and
I figured that HP had probably researched that issue as much as other
aspects of the architecture and concluded it was the appropriate future
direction.

4) Quad can be used as an extended precision for expression evaluation
too as long as you're careful that storage-precision variables
allocated to extended registers are rounded to storage format whenever
assignments to storage format occur in the program source.

5) On a 64-bit bus, anything larger than 64 bits will require two
accesses, so you might as well use all the bits.  Memory bandwidth is
probably the gating factor on floating-point performance in
high-performance designs.

6) At least one hardware designer thought that although it would take a
little longer to get a quad precision adder implemented, it would not
take so much longer that it was worthwhile doing a klugey type of quad
precision such as composing one from two doubles.  Such arithmetic
can't be both fast and clean, especially if exceptions are to be
handled in a reasonable way.

7) I had heard rumors that IBM would implement "fused" (one rounding)
multiply-add some day, and the inherent greater generality and
flexibility of a full quadruple precision implementation would be
advantageous eventually.

8) To be useful for supporting double-precision computation, it's not
necessary to implement the full quad instruction set the first time.
The essential instructions are fdmulq (doubled-precision product),
faddq, fsubq, fdtoq, fqtod.  In particular fmulq, fdivq, and fsqrtq are
harder and can wait.
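
A doubled-precision product never needs rounding in the 113-bit format:
two 53-bit significands multiply to at most 106 significant bits.  The
following Python sketch checks that bit count with exact rational
arithmetic standing in for hardware.  The names dprod and
significant_bits are made up for illustration, and exponent range is
ignored.

```python
from fractions import Fraction
import random

def significant_bits(x: Fraction) -> int:
    """Bits from the leading 1 to the trailing 1 of a binary rational."""
    n = abs(x.numerator)
    if n == 0:
        return 0
    n >>= (n & -n).bit_length() - 1   # strip trailing zero bits
    return n.bit_length()

def dprod(x: float, y: float) -> Fraction:
    """Exact product of two doubles (stand-in for fdmulq)."""
    return Fraction(x) * Fraction(y)

# Widest product seen over a batch of random doubles.
random.seed(1991)
widest = max(
    significant_bits(dprod(random.uniform(-1e6, 1e6),
                           random.uniform(-1e6, 1e6)))
    for _ in range(10000)
)

# Worst case: both significands are all ones.
worst = significant_bits(dprod(2.0 - 2.0**-52, 2.0 - 2.0**-52))
```

Both widest and worst stay at or below 106, leaving 7 bits of the
113-bit significand to spare.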

SunOS 4.1.1 supports kernel emulation of the SPARC V8 quad
instructions, although you wouldn't want to use it - too slow.  C 1.1
includes a long double type and Fortran 1.4 includes real*16 and
complex*32, all of which implement quad arithmetic in software without
trapping to the kernel.  If a surprise SPARC implementation with some
quad hardware came along (Sun doesn't know who all the SPARC
implementers are), it would be simple to rig up a .il inline expansion
template file that would allow these compilers to exploit that hardware
to good effect, although with somewhat less benefit than would be
obtained if the compilers generated the instructions inline.  Anyway,
SPARC V8 is being taken over by SPARC International and IEEE P1754.

Here's an example of using quad to support double.  The real part of a
complex product is computed as xr*yr-xi*yi.  If all computation is done
in double then rounding errors in the products can dominate the
significant figures of the result if cancellation occurs.   Furthermore
gratuitous overflow and underflow may affect intermediate results even
if the final result would lie in range.  Not to mention that SPECmarks
suffer if eight fpops are used instead of the minimal three.  Even so
that's a price we have paid with Sun Fortran (for complex*8) in order
to reduce the rate of roundoff accumulation.

A brute force way to deal with this is to promote all the operands to
extended or quad.  That means providing extended or quad
multiplication, which is apt to be fairly slow.  Far more economical is
to compute dprod(xr,yr)-dprod(xi,yi) which requires fdmulq and fsubq,
followed by fqtod.  That requires only four fpops and suffers one tiny
rounding error from the fsubq and a normal one from the fqtod.  So the
result is almost but not quite correctly rounded.  To obtain correct
rounding would require only adding another operation, fqsubd, to obtain
the difference of two quad operands rounded to double precision with
only one rounding error.  That takes only three fpops, the same as the
simplest method, but with far better immunity against roundoff and
exponent spill.
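
The four-fpop sequence can be simulated in Python with exact rationals.
This is a sketch under stated assumptions, not a model of any real
hardware: the function names mirror the SPARC mnemonics but the bodies
are illustrative stand-ins, round_to_bits is a hypothetical helper, and
exponent range (overflow, underflow, subnormals) is ignored in the quad
steps.

```python
from fractions import Fraction

def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits, nearest-even, unbounded exponent."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    a = abs(x)
    # Find e with 2**e <= a < 2**(e+1).
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    scale = Fraction(2) ** (p - 1 - e)
    n = round(a * scale)          # Fraction rounds halves to even
    return sign * Fraction(n) / scale

def fdmulq(x: float, y: float) -> Fraction:
    return Fraction(x) * Fraction(y)      # exact: 106 bits fit in 113

def fsubq(a: Fraction, b: Fraction) -> Fraction:
    return round_to_bits(a - b, 113)      # the "tiny" rounding error

def fqtod(q: Fraction) -> float:
    return float(q)                       # the "normal" rounding error

def real_part_double(xr, xi, yr, yi):
    return xr * yr - xi * yi              # everything in double

def real_part_quad(xr, xi, yr, yi):
    return fqtod(fsubq(fdmulq(xr, yr), fdmulq(xi, yi)))

# Operands chosen so that both products round to 1.0 in double.
xr, yr = 1.0 + 2.0**-30, 1.0 - 2.0**-30
xi, yi = 1.0 + 2.0**-29, 1.0 - 2.0**-29

naive = real_part_double(xr, xi, yr, yi)
quad = real_part_quad(xr, xi, yr, yi)
exact = Fraction(xr) * Fraction(yr) - Fraction(xi) * Fraction(yi)
```

For these operands the all-double result is 0.0, while the quad
sequence returns 3*2**-60, which here equals the exact difference.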

IEEE 754 requires that operations round to no lower precision than the
more precise operand, and so fqsubd would seem to be a nonstandard
operation.  As I recall the reason for the 754 requirement was so that
exponent wrapping on trapped overflows and underflows would be
meaningful, but SPARC doesn't use exponent wrapping and I expect that
counting mode for overflows and underflows will never be worth its
cost.

In contrast conventional extended precision will still suffer roundoff
in computing xr*yr and xi*yi, which may be magnified by cancellation,
so the final rounded result can't be said to be nearly correctly
rounded.

Similarly fused multiply-add without extended or quad has to compute

	t = xr*yr
	r = t - xi*yi

and the benefit of one roundoff error in the second step is lost in the
noise of the normal roundoff error in computing t.
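
A small Python sketch makes the point concrete.  The function fma_sim
is a hypothetical stand-in that simulates a fused multiply-add with
exact rationals and a single final rounding to double; it is not any
particular machine's instruction, and the operands are chosen only to
force cancellation.

```python
from fractions import Fraction

def fma_sim(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Both products are just below 1 and round to 1.0 in double.
xr, yr = 1.0 + 2.0**-30, 1.0 - 2.0**-30
xi, yi = 1.0 + 2.0**-29, 1.0 - 2.0**-29

t = xr * yr                  # ordinary multiply: rounds to 1.0
r = fma_sim(-xi, yi, t)      # t - xi*yi with only one rounding

exact = Fraction(xr) * Fraction(yr) - Fraction(xi) * Fraction(yi)
rel_err = (Fraction(r) - exact) / exact
```

Here r comes out as 2**-58 against an exact value of 3*2**-60, a
relative error of one third, all of it inherited from the rounding
of t.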