accuracy of inputs

Mon Feb 25 22:23:39 PST 1991

No system can read the user's mind as to the accuracy of the
numbers she or her program supplies as input to system routines.
Since we must nevertheless design general purpose libraries,
the following rule of thumb may be used: make the function
or operation (like +, sqrt, sin ...) as accurate as possible,
assuming the input is exact, provided not "too much" performance
is sacrificed. Manufacturers adopting IEEE 754 or 854 have
already agreed to supply correctly rounded +, sqrt, etc.
Sin and other transcendental functions (Bessel functions ...) are
rather harder to compute this accurately, and so research will
continue. Thus, we expect the level of accuracy attainable without
sacrificing performance to advance with time. I would not try to
standardize a ceiling on how accurate a function library needs to
be, since that would discourage progress.
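An operation is correctly rounded when its result is the floating point number nearest the exact result computed from the (assumed exact) inputs. The sketch below checks this property for + and sqrt in Python, whose floats are IEEE 754 doubles on current platforms. It assumes the default round-to-nearest mode and uses exact rational arithmetic from the standard fractions module as the reference; it also relies on CPython converting a Fraction to float with correct rounding. The helper names are hypothetical, for illustration only.

```python
import math
import random
from fractions import Fraction

def is_correctly_rounded_sum(a: float, b: float, s: float) -> bool:
    """True if s is the exact a + b rounded to nearest (ties to even)."""
    exact = Fraction(a) + Fraction(b)  # exact rational sum of the two doubles
    return s == float(exact)           # CPython's int / int is correctly rounded

def is_correctly_rounded_sqrt(x: float, r: float) -> bool:
    """True if r is the double nearest to the exact square root of x (x > 0)."""
    below = Fraction(math.nextafter(r, 0.0))
    above = Fraction(math.nextafter(r, math.inf))
    # r is nearest iff sqrt(x) lies between the midpoints to its neighbours;
    # squaring is monotone for positive values, so compare squares exactly.
    lo_mid = (below + Fraction(r)) / 2
    hi_mid = (Fraction(r) + above) / 2
    return lo_mid ** 2 <= Fraction(x) <= hi_mid ** 2

if __name__ == "__main__":
    rng = random.Random(1991)
    ok_add = ok_sqrt = True
    for _ in range(10_000):
        a = math.ldexp(rng.uniform(-1.0, 1.0), rng.randint(-60, 60))
        b = math.ldexp(rng.uniform(-1.0, 1.0), rng.randint(-60, 60))
        ok_add = ok_add and is_correctly_rounded_sum(a, b, a + b)
        x = math.ldexp(rng.uniform(1.0, 2.0), rng.randint(-1000, 1000))
        ok_sqrt = ok_sqrt and is_correctly_rounded_sqrt(x, math.sqrt(x))
    print("+ correctly rounded:", ok_add, " sqrt correctly rounded:", ok_sqrt)
```

On a platform whose + and sqrt conform to IEEE 754, both random checks should pass.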

To say that floating point numbers are inaccurate in their last
bit, and so the machine can be correspondingly sloppy in function
accuracy, means that we might as well use Cray arithmetic and
forget about IEEE. For a great many computations, this hurts
nothing. But those it does hurt, it hurts badly. 

To give just one example, in the LAPACK project (which is supplying 
a replacement for LINPACK and EISPACK on supercomputers), the
fastest routine for finding eigenvalues and eigenvectors of a
symmetric tridiagonal matrix was delayed for a year because of
the lack of a guard digit in Cray arithmetic. The best version
of the code works on other significant machines, without using
mixed precision arithmetic. We wanted to avoid mixed precision,
because if working precision is double, quadruple is generally
not available. Because of Cray, just for this code we had to
abandon this goal and use some double precision in our single
precision code, so it would run on a Cray. The double precision
code tests for "Crayness" at runtime and returns an error message
if it detects a Cray; otherwise it uses the algorithm that needs
a guard digit.
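A guard digit is an extra digit kept while aligning operands for subtraction. The sketch below is a toy model, not Cray's actual (binary) arithmetic: a hypothetical 3-digit decimal format in which y is shifted right to line up with x and any digits falling beyond the guard positions are chopped. For x = 10.1 and y = 9.93 the exact difference is 0.17; without a guard digit the result is 0.2, an error of 30 units in the last place, while one guard digit makes it exact. The function subtraction_looks_guarded is a hypothetical binary probe in the spirit of the runtime "Crayness" test, not LAPACK's actual code; on IEEE hardware, where Python floats live, it always returns True.

```python
import math
from fractions import Fraction

P = 3  # significant decimal digits in the toy format (hypothetical)

def round_to_precision(sig: int, exp: int) -> tuple[int, int]:
    """Round a non-negative significand to P digits, ties to even."""
    excess = len(str(sig)) - P
    if excess <= 0:
        return sig, exp
    q, r = divmod(sig, 10 ** excess)
    half = 5 * 10 ** (excess - 1)
    if r > half or (r == half and q % 2 == 1):
        q += 1
    if q == 10 ** P:  # rounding carried into an extra digit
        q, excess = q // 10, excess + 1
    return q, exp + excess

def subtract(x: tuple[int, int], y: tuple[int, int], guard_digits: int) -> Fraction:
    """Toy x - y for x >= y > 0, where x's exponent >= y's exponent.

    Operands are (significand, exponent) pairs meaning significand * 10**exponent.
    y is shifted right to line up with x; digits that land more than
    `guard_digits` places past x's last digit are discarded (chopped).
    """
    (xs, xe), (ys, ye) = x, y
    shift = xe - ye
    work_exp = xe - guard_digits
    xs_w = xs * 10 ** guard_digits
    if shift <= guard_digits:
        ys_w = ys * 10 ** (guard_digits - shift)  # alignment loses nothing
    else:
        ys_w = ys // 10 ** (shift - guard_digits)  # low digits of y are lost
    sig, exp = round_to_precision(xs_w - ys_w, work_exp)
    return Fraction(sig) * Fraction(10) ** exp

def subtraction_looks_guarded() -> bool:
    """Hypothetical binary probe in the spirit of a runtime 'Crayness' test.

    b is the largest double below 1, i.e. 1 - 2**-53. Computing 1 - b shifts b
    right by one bit to align it with 1.0; without a guard bit that bit is
    chopped and the result is 2**-52 instead of the exact 2**-53.
    """
    b = math.nextafter(1.0, 0.0)
    return 1.0 - b == 2.0 ** -53

if __name__ == "__main__":
    x, y = (101, -1), (993, -2)  # 10.1 and 9.93
    print("exact:        ", Fraction(101, 10) - Fraction(993, 100))  # 17/100
    print("no guard:     ", subtract(x, y, guard_digits=0))          # 1/5
    print("1 guard digit:", subtract(x, y, guard_digits=1))          # 17/100
    print("guarded binary subtraction:", subtraction_looks_guarded())
```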

Finally, a comment on interval arithmetic. In the rare case
that the input to an interval subroutine A is exact, it is
frequently possible to use Kulisch's accumulator (or better
techniques) to make the output an interval whose endpoints are
adjacent floating point numbers: the best possible. But if this
output is then used as input to another interval routine B (as
any nontrivial application would do), then B's output interval
must widen by a factor of the condition number of the problem
B solves. Each subsequent subroutine call magnifies the interval
width by a further condition number factor. This is
mathematically inescapable. Thus, to keep computing through a
long chain of routines, we must be able to make the interval
output by B as narrow as required: this means variable
precision arithmetic. Any interval arithmetic
system with just single and double (and even quad) precision
will be limited to a fairly small number of applications,
those that do not require processing by many consecutive
subroutines. Larger applications will require variable
precision.
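The width growth described above can be seen in a small experiment. The sketch below is a hypothetical illustration, not Kulisch's method or any real interval library: a minimal interval type in Python that imitates outward (directed) rounding by stepping each computed endpoint one double outward with math.nextafter, which is pessimistic but still encloses the exact result under round-to-nearest. The stand-in "routine B" computes 1000*t - 999, whose condition number near t = 1 is about 1000. The digits_needed helper encodes a rough rule of thumb (an assumption, not a precise bound): to keep d correct digits after problems with condition numbers kappa_1, ..., kappa_n, work with about d + log10(kappa_1) + ... + log10(kappa_n) decimal digits.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @property
    def width(self) -> float:
        return self.hi - self.lo

def down(v: float) -> float:
    """Next double toward -inf; imitates rounding the lower endpoint down."""
    return math.nextafter(v, -math.inf)

def up(v: float) -> float:
    """Next double toward +inf; imitates rounding the upper endpoint up."""
    return math.nextafter(v, math.inf)

def routine_b(x: Interval, k: float = 1000.0) -> Interval:
    """Hypothetical 'routine B': encloses { k*t - (k - 1) : t in x } for k > 1.

    Near t = 1 this map has condition number about k. The map is increasing,
    so its image is [f(lo), f(hi)]; every rounded intermediate is pushed one
    double outward so the exact image stays inside.
    """
    c = k - 1.0
    return Interval(down(down(k * x.lo) - c), up(up(k * x.hi) - c))

def run_chain(steps: int) -> list[Interval]:
    """Start from the narrowest interval around 1 and call routine_b repeatedly."""
    x = Interval(1.0, up(1.0))  # endpoints are adjacent doubles
    history = [x]
    for _ in range(steps):
        x = routine_b(x)
        history.append(x)
    return history

def digits_needed(kappas: list[float], target_digits: float) -> float:
    """Rough estimate: target digits plus log10 of each condition number."""
    return target_digits + sum(math.log10(k) for k in kappas)

if __name__ == "__main__":
    for i, iv in enumerate(run_chain(5)):
        print(f"after {i} calls: [{iv.lo!r}, {iv.hi!r}], width {iv.width:.2e}")
    print("estimated digits for 6 correct after 5 calls:",
          digits_needed([1000.0] * 5, 6))
```

Each call widens the interval by a factor of at least 1000, so after five calls an interval that began with adjacent doubles as endpoints is more than 0.2 wide. By the rough rule, keeping 6 correct digits through that chain would take about 21 decimal digits, more than the roughly 16 that double precision offers.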

   Jim Demmel
   Comp. Sci Division and Math Dept.
   U. C. Berkeley