IEEE 754 extended precision expression evaluation - we did it wrong

Alan M. McKenney uunet!cims18.nyu.edu!mckenney
Thu Mar 14 13:50:57 PST 1991


[ A message for the numeric-interest mailing list.  (If this ]
[ shows up in the yak-lovers mailing list or something, let  ]
[ me know, and I'll try to find out what I screwed up.)      ]

    I'm assuming that the message I am responding to came from David
Hough, at Sun Microsystems.  The headers I see for messages on this
mailing list are usually pretty ambiguous (and of course it wouldn't be
sporting to put a ".signature" at the end :-) ), so, as far as I know,
it could have come from anyone.

    Anyway, David writes:

[ description of an IMSL routine failing when compiled with -O3 -f68881 ]

> ...; the problem was that TWODF had
> been allocated to an extended-precision register on the 68881.
  ....
> Unfortunately the logic of the code depended upon being able to
> identify elements of WK by comparing them to TWODF.  If no array
> element matches TWODF then there will be a problem.  What happened
> was that WK(L,M+5) contained the value of TWODF rounded to storage
> precision, and that wasn't the same value as TWODF contained in an
> extended-precision register.



    I'm a little surprised that you are surprised by this problem.
Perhaps it comes from hanging around Jim Demmel [a student of
W. Kahan's] and the like, but I thought this sort of thing was a
well-known problem.  I know that some of us on the LAPACK project have
had problems with convergence tests of the form "IF( A.EQ.A+B )".  Jim
and his crowd are always saying that "there are no safe optimizations
with floating point."

    (By the way, my vote is to call this a compiler problem.  I don't
expect much of floating point, but A=B ought to result in A.EQ.B,
at least if A and B are of the same type.)
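
    (To make the convergence-test problem concrete, here is a toy
example of my own; the program and variable names are made up, and
the trip counts in the comments assume IEEE single precision and
68881-style extended precision.  Which count you actually get
depends, of course, on how the compiler chooses to evaluate A+B,
which is the whole issue.)

      PROGRAM CONVTST
*     A test of the form IF( A.EQ.A+B ) succeeds only when B drops
*     below one ulp of A at the precision at which A+B is evaluated.
      REAL A, B
      INTEGER K
      A = 1.0
      B = 1.0
      DO 10 K = 1, 200
         IF ( A .EQ. A+B ) GO TO 20
         B = B / 2.0
   10 CONTINUE
   20 WRITE (*,*) 'loop exited at trip', K
*     about 25 if A+B is rounded to single precision,
*     about 65 if A+B is carried in an extended-precision register
      END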

    Moreover, although I haven't done much lawyering with the IEEE-754
standard, I had assumed that the type rules were similar to FORTRAN,
that is, that an operation with two args of type X was supposed to
return a value of type X.  (The standard certainly doesn't seem to
forbid such a rule, in any case.)  I remember someone saying that IBM
got a "papal
dispensation" from W. Kahan to allow the RIOS to compute A+B*C with
only one rounding step.  In any case, this is one area in which the
FORTRAN standard is explicit, whatever the IEEE standard may say.
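
    (For what it's worth, here is a little example of my own showing
how one rounding of A*B+C can differ from two.  It uses the same
DBLE/REAL trick as the example further down, and it assumes that
REAL() of a double really does round to single, which is of course
part of what is in dispute.  The names are made up.)

      PROGRAM FMADEM
*     With A = B = 1 + 2**(-12) and C = -1, the exact product A*B
*     is 1 + 2**(-11) + 2**(-24).  Rounding the product to single
*     loses the 2**(-24) (a half-ulp tie, rounded to even), so
*     rounding twice gives 2**(-11), while rounding once, as a
*     fused multiply-add would, gives 2**(-11) + 2**(-24).
      REAL A, B, C, P, ONCE, TWICE
      DOUBLE PRECISION DP
      A = 1.0 + 2.0**(-12)
      B = A
      C = -1.0
*     the product of two singles is exact in double precision
      DP = DBLE(A)*DBLE(B)
*     two roundings: product narrowed to REAL before the add
      P     = REAL(DP)
      TWICE = REAL( DBLE(P) + DBLE(C) )
*     one rounding: exact product plus C, rounded once at the end
      ONCE  = REAL( DP + DBLE(C) )
      WRITE (*,*) 'two roundings:', TWICE
      WRITE (*,*) 'one rounding: ', ONCE
      END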

    Finally, at least on the Suns, I thought that the whole point of
the FORTRAN compiler's -fstore option was to force a rounding step at
each "=" in the source.  If not, I will be very annoyed, as it will
mean that the single-precision code I have been developing is really
partly double, and I may be missing bugs because the spurious high
precision is hiding them.  I know that the documentation for the 68881
chip describes bits which specify how results are to be rounded, so it
shouldn't be a problem to run the 68881 so that all results are rounded
to storage precision.
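
    (A crude way to find out what you are actually getting is
something like the following; this is my own quick hack, not
anything from Sun's documentation, and the names are made up.  An
aggressive optimizer may fold the whole thing at compile time,
which muddies the test.)

      PROGRAM EXCESS
*     If each "=" below rounds to REAL storage precision, the
*     residual printed is about 1.2E-07 (one single-precision ulp
*     of 1.0).  If the intermediates are carried in extended
*     registers, it comes out many orders of magnitude smaller, or
*     even exactly zero, depending on where the narrowing to
*     single finally happens.
      REAL FOUR, THREE, ONE, X, Y, Z
      FOUR  = 4.0
      THREE = 3.0
      ONE   = 1.0
      X = FOUR / THREE
      Y = X - ONE
      Z = Y * THREE
      WRITE (*,*) 'residual =', Z - ONE
      END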


    Actually, the idea of automatically promoting intermediate
results to a higher precision doesn't seem to me to be a win, anyway.
Why?  

(a) In most cases, the programmer doesn't need the accuracy.  If
    (s)he needed it, (s)he would have explicitly specified higher
    precision.

(b) The placement of the higher-accuracy stuff ends up being arbitrary.
    Unless the entire code is promoted, there will be rounding steps.
    And how is a compiler to know where it is safe to round and where
    such a step will kill whatever accuracy was gained by the higher-
    accuracy statements?  People get PhD's for figuring stuff like
    this out.

(c) Depending on it isn't portable, since not everyone does it, and
    they certainly won't do it the same way (moreover, it will change
    from compiler release to release.)  So you have to do your analysis
    assuming that you don't have it.

If increased precision is really needed within an expression, most
languages have ways to express it: e.g., in FORTRAN:

    REAL X, A, B, C
    ....
    X = REAL( DBLE(A)*DBLE(B) + DBLE(C) )

should ensure that the operations are done in double precision
and then rounded to single.  Extended precision ought to be included
as a language extension.


    The only good reason for automatically promoting stuff to higher
precision is that the hardware does the higher precision faster (this
includes when it only does one precision--the higher.)  In this case,
I can think of three possible ways of generating code:

(a) round after every machine operation.

(b) round after every "=" in the source.  This might require extending
    the intermediate code in certain compilers to have a rounding
    operation, but that should be doable.

(c) round only when something is actually stored.  This seems to be
    what the compiler David describes is doing.

Note that this is in order from safest to unsafest.  If (a) is no
slower than (b) or (c), I would always go with (a).
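
    (The difference between (b) and (c) is exactly what bit the IMSL
routine: under (c), a value assigned with "=" need not compare equal
to a copy of itself that has been through storage.  Here is the
failure boiled down to a few lines; it is my own sketch, not the
IMSL code, and the names are made up.)

      PROGRAM STORCMP
*     DF*DF needs about 48 significant bits, so if T is carried in
*     an extended-precision register while WK(1) is rounded to
*     storage precision, the .EQ. test fails.  Under scheme (a) or
*     (b), T is rounded at the "=" and the test cannot fail.  (A
*     clever optimizer might also fold the test away entirely,
*     which is a different can of worms.)
      REAL WK(1), DF, T
      DF = 1.0 / 3.0
      T  = DF * DF
      WK(1) = T
      IF ( WK(1) .EQ. T ) THEN
         WRITE (*,*) 'equal: T was rounded at the assignment'
      ELSE
         WRITE (*,*) 'NOT equal: T kept extra precision'
      END IF
      END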


    By the way, I never did figure out exactly who "we" was in David's
message.


    (To spoil everyone's fun, I will include a .signature:)

Alan McKenney        E-mail:  mckenney@cs.nyu.edu         (INTERNET)
Courant Institute, NYU        ...!cmcl2!cs.nyu.edu!mckenney   (UUCP)


