Nextafter

Thu Apr 25 08:21:53 PDT 1991

Unfortunately, nextafter alone is not enough to capture all properties of
arithmetic needed for writing robust numerical software. Here is a brief
summary of what we learned in the LAPACK project about this.

We began by using what we called SAFEMIN instead of MINFLOAT, the
idea being the SAFEMIN was the smallest number we could safely reciprocate.
This may differ from MINFLOAT both because of nonsymmetry in the exponent
range and funny floating point arithmetics that deal incorrectly with
very large or very tiny numbers. For example, on the Cray we 
(conservatively) chose SAFEMIN=sqrt(MINFLOAT). How to detect one is on a Cray 
at runtime using only properties of the floating point is another matter; 
we currently use a very kludgy test but are planning to incorporate a better
one due to W. Kahan.

This is still not enough, however, because of two system software routines,
SNRM2 and complex division. SNRM2 computes the Euclidean length of a vector,
and is part of the BLAS1 library supplied on many machines. The simplest
implementation:
    SNRM2 = sqrt( sum_i x(i)**2 )
obviously fails due to over/underflow on half the exponent range of the
machine. Still, this is how some BLAS1 libraries are implemented because
it has no branches and is fast. This effectively means we must replace 
SAFEMIN as defined above with NEWSAFEMIN = sqrt(SAFEMIN), and scale all 
initial data to be in the range NEWSAFEMIN to 1/NEWSAFEMIN; this scaling 
is done in our driver routines.

Similarly, if complex divide (a+i*b)/(c+i*d) is implemented as
(*)  (a*c+b*d)/(c*c+d*d) +i*(b*c-a*d)/(c*c+d*d)
instead of the safer version on, say, page 195 of Knuth v. 2, then
SAFEMIN is also effectively replaced by sqrt(SAFEMIN). The alternative
safer complex division has a branch, and so the obvious scheme (*) vectorizes
better. Detecting (*) at runtime is a puzzle W. Kahan is working on; I
am inclined to regard the funny implementation of SNRM2 as a bug, although we 
still have to supply working software for such machines, of course.

Another issue is the need for "doubled precision" in the inner loop of
Cuppen's method for the symmetric tridiagonal eigenproblem. If working
precision is double, this means we need quadruple. If the arithmetic
is sufficiently accurate, quadruple can be simulated using double and
some by now standard tricks due to Dekker and many others. For this to
work, we need a guard digit in add/subtract and sufficiently good multiply;
basically multiply should get the exact answer when the exact answer fits
in the format. These are properties nextafter will not capture. Determining
these properties at runtime is tricky.

The need to determine all these properties at runtime is based on the 
traditional (and quite productive) practice of sharing numerical software
across machines at the source code level. Even if you write software on
one set of machines, you have no idea whether it will be asked to run
on another. IEEE arithmetic was supposed to make this headache go away,
but I think it is returning for a different reason: new parallel architectures.
The programming environments in which software is produced (shared vs.
distributed memory, explicit vs. implicit communication, ...) are making
writing portable code quite hard again, so much that the floating point issues
seem almost minor. If we are going to need installation procedures more
complicated than the source-code mail-order model of netlib, it may seem
strange to so scrupulously avoid them for arithmetic reasons.