another comp.arch posting: error bounds
David G. Hough on validgh
dgh
Wed May 29 11:31:52 PDT 1991
There's been some confusion recently on comp.arch about the difference between
errors and error bounds.
If you have computed something and you know the error, you can get the correct
answer just by adding the error to the computed value; then you have
no error. A more realistic situation is that you know a bound on the error,
which confines the correct answer to some region around the computed value.
"Accuracy benchmark" programs which attempt to measure the quality of computer
arithmetic by comparing computed results to the known correct answers are
common but not very interesting because they don't correspond to the much
more typical situation in which the correct answer is not known.
Not many people routinely spend most of their time solving problems whose
answers they already know.
A much
better kind of accuracy benchmark is one which compares the size of the
error bounds. Better arithmetic produces smaller error bounds for comparable
computations using the same word size.
Unfortunately error bounds also reflect the skill of the error analyst who
computed them analytically or numerically, as well as underlying aspects
intrinsic to the problem, or inherent in particular algorithms, or due to
peculiarities of the hardware and software of a particular computer system.
So separating all these aspects is not so easy, particularly for amateurs.
Here's an example of simple error bounds:
z = x * y
in 32-bit floating point. In IEEE single precision or VAX F format,
the relative rounding error will be bounded by 2**-24. (IEEE and VAX
may produce different answers, and hence different errors, but the error
bounds are identical.) In IBM 370
format, the relative rounding error will be bounded by 16**-5 = 2**-20.
Although it won't happen in this simplest example, in general an IBM 370
may sometimes produce a more accurate answer - with a
smaller error - just due to lucky roundoff. But that doesn't help with
realistic problems: the error bound on the IBM 370 will always be much larger
than for IEEE or VAX. Once this was figured out, nobody bothered producing
any more base-16 instruction set architectures - if you wanted IBM 370
compatibility, you did it the IBM 370 way; otherwise you did something
different.
As has been observed previously, this difference in the size of error bounds
is largely responsible for the transition from 32 to 64 bits as the default
for scientific computation. The other major factor was the appearance of the
CDC 6600 at about the same time, which offered primarily 60-bit floating
point, perhaps because it was faster to do that quick and dirty than to do
something shorter and cleaner - many, but not all, computations may be
immunized against dirty arithmetic by using enough precision.
Here's another example:
s = s + x * y
s, x, and y are all IEEE double precision.
In a conventional arithmetic system there will be two rounding errors to
double precision, one following the multiply and one following the add.
In an IBM RS/6000-style multiply-add architecture there will be one rounding
error to double precision. (The PA-RISC multiply-add does not work this
way).
In 68040 or 80486 extended-precision architectures there will be three
rounding errors: two in extended precision, following the multiply and the
add, and one in double precision when the result is stored.
Which is best? The RS/6000 will produce the
smallest error bound, followed very closely by the extended-precision
architectures; the conventional architecture will produce a somewhat larger
error bound. The superiority of extended-precision architectures is
manifested in slightly more complicated examples like
h = x*x + y*y
or in complex multiplication, and in the latter case
extended precision offers superior immunity
to gratuitous intermediate overflow and underflow.
However both extended precision and RS/6000-style multiply-add are problematic
in cases like
a = p*p;
b = a - p*p;
where a mental model based on conventional arithmetic, which expects b to
be zero, is inappropriate but widespread. Much otherwise commendable
software founders either on extended precision or multiply-add or both.
(Extended precision problems are further aggravated by allocating
storage-precision variables to extended-precision registers.)
Thus most versions of Kahan's paranoia program detect extended-
precision expression accumulation correctly and decline to analyze it further,
but inadvertently flag IBM RS/6000-style multiply-add as a design flaw.
In light of the foregoing one might well ask if it's reasonable to expect
that IEEE 754 implementations provide identical numerical results. I consider
that to be a reasonable expectation, provided that it's tempered with the
understanding that whatever uniform paradigm is prescribed for
expression evaluation,
base conversion,
and elementary transcendental function evaluation,
it will cost something on all systems and a lot on many,
so that there will always
be a need for alternative non-uniform modes of operation favoring performance,
and these modes will vary among systems.
-----------------------------------------------------------------------------
The best published book on floating-point arithmetic is by Sterbenz, and it
is long out of print. The Computing Surveys article by David Goldberg is a
good place to start, however, and is much more up to date. Even after 25
years there is very little that needs updating in Forsythe and Moler, which
explains one particular mathematical software area about as simply as it will
ever be explained.
I also maintain a mailing list that discusses such issues in the context
of computer system hardware and software
design and implementation. Interested persons
may send requests to numeric-interest-request@validgh.com.