survey time: questions on 128-bit floating-point arithmetic

David G. Hough at validgh dgh
Wed Jul 19 20:13:25 PDT 1995


It's time for another survey of persons interested in 128-bit quadruple-
precision IEEE floating-point arithmetic.   I will summarize whatever
responses I get and send the summary back to the responders.    To refresh
your memory of some of the issues, I'll follow this with some of the
discussion from the last time around, in 1991.

I'm talking about the 128-bit floating-point format defined for SPARC and
PA-RISC, which has 113 significant bits of precision, one implicit, and 15
exponent bits.   I don't know whether MIPS, Alpha, or PowerPC have defined
anything similar, although RS/6000 compilers support a hacked non-IEEE quad
format composed of two doubles.   I don't know what lies in store from Intel
in the P6, P7, or P8.
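(As an aside, the two-doubles scheme rests on error-free transformations
such as Knuth's two-sum.   The Python sketch below is my own illustration
of the principle, not IBM's implementation; a head/tail pair carries roughly
2x53 = 106 significant bits, somewhat short of true quad's 113, and with a
less regular exponent structure.)

```python
def two_sum(a, b):
    """Knuth's two-sum: return (s, e) where s is the rounded
    double-precision sum fl(a + b) and e is the exact roundoff,
    so that s + e == a + b exactly.  Pairs like (s, e) are the
    building block of "double-double" arithmetic."""
    s = a + b
    bb = s - a                       # portion of b absorbed into s
    e = (a - (s - bb)) + (b - bb)    # exact error of the addition
    return s, e

# 2**-60 is far below double's roundoff threshold near 1.0, yet the
# tail preserves it exactly:
head, tail = two_sum(1.0, 2.0**-60)
assert head == 1.0 and tail == 2.0**-60
```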

The SPARC and PA-RISC quad formats have been defined on paper for years but
not implemented in any currently available silicon.    
If they were, hardware quad would be much faster than
the software emulation that currently exists, at least for SPARC, but not as
fast as the double-precision floating-point hardware, for a variety of reasons,
including greater memory-bandwidth requirements and longer carry-propagation
times.
Let's be conservative and suppose that in some future generation, quad
fp hardware could be available that ran 10X slower than corresponding 
double fp hardware on the same chip.

Examples of general uses of quad hardware include interval arithmetic,
since quad interval arithmetic would defer roundoff contamination for a long
time, and fast unstable computations, or just dumb algorithms, that reduce
roundoff concerns by doing everything in quad more cheaply than by hiring
a numerical analyst to figure out how to get the right answers without quad.
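To make the interval point concrete, here is a toy interval add in Python,
using math.nextafter to widen endpoints outward in place of true directed
rounding (the function name and scheme are mine, purely for illustration).
In double precision each operation widens the interval by an ulp or so; at
quad precision that per-operation widening is about 2**60 times smaller,
which is why quad intervals would stay useful so much longer.

```python
import math

def iv_add(x, y):
    """Toy interval addition on (lo, hi) pairs: round each endpoint
    outward by one ulp so the enclosure remains valid despite roundoff
    (math.nextafter stands in for directed rounding modes)."""
    lo = math.nextafter(x[0] + y[0], -math.inf)
    hi = math.nextafter(x[1] + y[1], math.inf)
    return (lo, hi)

# 0.1 has no exact binary representation, but the interval still
# brackets the true sum 1.1:
lo, hi = iv_add((1.0, 1.0), (0.1, 0.1))
assert lo <= 1.1 <= hi
```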

Questions:

1) What applications do you currently use that depend on 128-bit floating-
point arithmetic, and where do you run them?   
That might be software long double or REAL*16 on SPARC,
or corresponding types on RS/6000, VAX H format, CDC/Cray double precision,
or IBM 360/370 extended.

2) Could you do better science/engineering in your current applications
if quad ran only 10X slower than double?   
For instance, by allowing simulations to run longer or with finer grids.

3) What new applications would become feasible to build with such quad hardware?
How widely would they be used at your site/department/organization?

4) Is the roundoff level of double precision a problem for your current
applications?

5) A principle of RISC architecture is that every feature costs something by
preventing implementation of other features.   A principle of the current
computer marketplace is that every feature tends to increase time to market
and thereby to lower relative performance at introduction.
Quad hardware uses up gates that might otherwise go to another onboard
processor, a more complicated superscalar dispatch scheme, 
speculative execution, graphics/video/signal processing primitives,
or a larger onboard cache, to name just a few possibilities.    Would the
availability of quad arithmetic make enough difference in a workstation
purchase decision to outweigh some of these other possibilities, or is it
a poor use of gates for your applications?




More information about the Numeric-interest mailing list