more on float sizes

Russell L. Carter uunet!wk81.nas.nasa.gov!rcarter
Fri Jun 7 10:45:29 PDT 1991


Tim Peters writes, while explaining -trunc:

<This proved to be an extremely effective tool in confirming, isolating
<and repairing suspected numeric instabilities.  If there's interest,
<I'll write more about that when I have some time.  Alas, while it was
<remarkably cheap to do this via software on the Cray architecture, the
<trend toward split register sets makes it a much less attractive
<proposition for general production use.  It's dirt cheap to do in
<hardware, though, "if only" there were sufficient market interest to get
<someone to bother (that does seem to be a continuing theme on this
<mailing list <grin/sigh -- where's Herman Rubin when you need 'im?!>) ...

I have to agree, enthusiastically, about the value of -trunc.  We
discard any benchmark that displays numeric instability when truncating
5 bits.  There are quite a few, and many times the code authors are
unaware of how sensitive the numbers they are producing really are to
the word size of the machine.  Count NAS in for "sufficient interest".
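
For anyone who hasn't played with such a tool: the idea is simply to
chop a few low-order significand bits off floating point results and
watch whether the answers you actually care about move.  A crude
sketch of the bit-chopping (my own illustration in Python, not Cray's
-trunc; the function name and the 5-bit default are mine):

    import struct

    def chop(x, nbits=5):
        # Zero the nbits low-order significand bits of a 64-bit float.
        (u,) = struct.unpack(">Q", struct.pack(">d", x))
        u &= ~((1 << nbits) - 1)
        (y,) = struct.unpack(">d", struct.pack(">Q", u))
        return y

    # Compare an untruncated run with one that chops the result of
    # every operation, here a simple accumulation:
    xs = [1.0 / k for k in range(1, 100001)]
    full = sum(xs)
    chopped = 0.0
    for x in xs:
        chopped = chop(chopped + x)
    print(full, chopped, abs(full - chopped) / full)

If the digits you report at the end shift visibly under that kind of
perturbation, the result was leaning on the bottom of the word.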

Tim also writes:

<So if 32 bits was already inadequate to run the problem sizes they were
<running, it was *way* inadequate to run the problem sizes they *wanted*
<to run.  At the time I left Cray (about 3 years ago), the problems were
<getting large enough and the machines were getting fast enough and the
<compilers were getting smart enough that a few more-or-less vanilla
<numerical programs were indeed pushing the limits of what could be done
<with Cray's 64-bit format.  All those "extra" bits get chewed up real
<fast in this framework, and of course Cray's famous arithmetic only
<makes it worse.

This is certainly a sensible and intuitive position to take, but the
evidence I've seen doesn't support it.  As one of the people
responsible for buying equipment that is supposed to be capable of
tackling "Grand Challenge" problems, I am, not surprisingly, really
interested in what the word-size requirements will be if these
problems are to be solved accurately.

I mentioned above that we see a lot of codes that output numbers very
sensitive to the word length of the machine they were run on.  However,
in nearly every case the numbers are the result of a sensitive
calculation, such as computing residuals, and are not the primary
values of interest, which might be pressures, velocities, or
displacements.  Almost without exception, whenever we have found an
instability in the interesting quantities, it has turned out that the
model was bad, the grid was poor, or the numerical method was
inappropriate.  I'll repeat a request from my previous email to this
list: if any of you know of an exception, preferably one where I can
examine the code, I'd *love* to see it.
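
The residual example is worth spelling out, because it explains why so
many codes look unstable under truncation while their pressures and
velocities are fine.  A computed residual is a difference of nearly
equal quantities, so its value lives entirely in the last few bits of
the word.  A contrived illustration (mine, with 32-bit arithmetic
standing in for a shorter word; the matrix is just a convenient
well-conditioned stand-in):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # well conditioned
    b = rng.standard_normal(n)

    # The same solve done with a "long" and a "short" word.
    x_long = np.linalg.solve(A, b)
    x_short = np.linalg.solve(A.astype(np.float32),
                              b.astype(np.float32)).astype(np.float64)

    # The solution barely moves between the two word lengths ...
    print(np.linalg.norm(x_long - x_short) / np.linalg.norm(x_long))
    # ... but the residual norm changes by many orders of magnitude.
    print(np.linalg.norm(b - A @ x_long),
          np.linalg.norm(b - A @ x_short))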

This is not to say that other fields, with different numerical
methods, aren't already in need of 128-bit floating point.  I just
don't think that the aerophysics part of the "Grand Challenge"
problems will require it.  Aerophysics problems solved at NAS
generally involve the solution of extremely large, very
narrow-bandwidth sparse linear systems.  As the error analyses that
accompany the LAPACK algorithms show, these types of systems can in
general be solved very accurately, as long as the condition numbers
are moderate.  I have no evidence that the condition numbers are
anything other than moderate.
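
To put a number on "very accurately": the forward error bound behind
those analyses is roughly

    || x_computed - x_true || / || x_true ||  <~  cond(A) * eps

up to a modest factor depending on the dimension, where eps is the
machine epsilon of the working precision.  With a moderate condition
number and 64-bit arithmetic that leaves a dozen or more good decimal
digits.  A toy dense example (not one of our sparse aerophysics
systems; the matrix is just a convenient stand-in):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # moderate cond(A)
    x_true = rng.standard_normal(n)
    b = A @ x_true

    x = np.linalg.solve(A, b)
    eps = np.finfo(np.float64).eps
    err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

    print("cond(A)        ", np.linalg.cond(A))
    print("relative error ", err)
    print("cond(A) * eps  ", np.linalg.cond(A) * eps)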

So in this age of buying computer systems by the pound, I much prefer
to keep my memory requirements modest (well, if you can call 10 Gbytes
or so modest) by keeping the hardware floating-point computations at
64 bits, and improving my numerical techniques if more accuracy is
required.

Russell Carter
Numerical Aerodynamic Simulation, ARC
rcarter@nas.nasa.gov
