more on IEEE exceptions and compilers

Tue Feb 14 12:12:05 PST 1995

There has been a lively discussion on comp.lang.fortran about some of the
issues I raised here a while back; for those who don't read such high
volume newsgroups, here are my recent contributions:

 From postnews Thu Feb  9 21:52:43 1995
Followup-To: poster
Subject: Why IEEE 754 specifies nonstop default exception handling
Newsgroups: comp.lang.fortran,comp.arch.arithmetic

In recent discussions of IEEE 754 default exception handling, nobody seems to
have mentioned the reason why it's specified to be nonstop.

The reason the mandatory default is nonstop is that nonstop can be universally
available, understood, and avoided where necessary.  "Abort on exception" is
not universally available, as in embedded systems, nor universally avoidable:
the means for avoiding unwanted aborts vary among subroutines coded in other
languages, command-line options, environment variables, and some or none of
the foregoing, all expressed differently on every system.
A popular system available at the time IEEE 754 was being drafted,
UCSD Pascal on Apple ]['s, handled exceptions by allowing the user to
press the space bar before the system rebooted.  The average has improved
somewhat since then, but not enough.

And array languages like APL, for instance, are routinely used with
uninitialized or uninteresting data in parts of arrays; if an elementwise
0/0 occurs, but that element doesn't affect the interesting part of the
answer, then stopping to tell the user about it is unproductive.

So algorithms that work uniformly and robustly across rather different systems
and languages are a lot easier to design and usually more efficient if they can
rely on the nonstop exception handling paradigm.

Unanticipated exceptions are more likely to result from programming errors or
misapplication of programs to data outside their domain.  Interactive
debuggers would do well to provide environments that capture each type of
exception at least on its first occurrence, but most production programs are
not run under interactive debuggers.

And as far as dealing with program errors and misapplications goes, the most
common difficulty affecting reliability of numerical results is the routine
error due to roundoff and truncation of analytic processes, often aggravated
by ill-conditioning of problems.  Immediate termination on the first roundoff
error would not be a very useful debugging aid.  But since efficient and
convenient implementations of interval analysis require an integrated systems
design encompassing hardware, operating systems, compilers, libraries,
and applications, most folks would rather avoid the real issue and
argue instead about the relatively rare problems arising from overflow and
underflow.

 From postnews Tue Feb 14 12:09:25 1995
Followup-To: poster
Subject: More on IEEE 754 Exceptions
Newsgroups: comp.lang.fortran,comp.arch.arithmetic

One definition of a numerical exception is: any situation where no
matter what you do, somebody will take exception to it.
This definition certainly encompasses all postings to all USENET groups,
but aside from those...

The most common exception is inexact arithmetic due to finite
significance.  Most people ignore it, sometimes for sound reasons and
often not.  But it's important to people doing number theory who use
floating-point variables for speed for as long as the results stay exact:
when the inexact exception arises, they must switch to slower methods with
multiple-precision integer or floating-point arithmetic.
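
A minimal sketch of that fast-path-then-fallback pattern follows.  Python
does not expose the IEEE inexact flag for its binary floats, so the sketch
assumes the standard decimal module as a stand-in, since its contexts carry
an analogous sticky Inexact flag.  The function name and the 16-digit
precision are illustrative choices only.

```python
import math
from decimal import Context, Decimal, Inexact

def factorial_checked(n, prec=16):
    """Return (n!, path): limited-precision arithmetic if exact, else integers."""
    ctx = Context(prec=prec, traps=[])       # nonstop: flags are set, nothing raises
    acc = Decimal(1)
    for k in range(2, n + 1):
        acc = ctx.multiply(acc, Decimal(k))  # if a product is rounded, Inexact is set
    if ctx.flags[Inexact]:                   # sticky flag: some product lost digits
        return math.factorial(n), "multiple-precision fallback"
    return int(acc), "fast path"
```

The flag is tested once, after the loop, rather than after every multiply;
that is the convenience a sticky flag under nonstop handling is meant to
provide.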

Also common are underflow and overflow due to finite exponent range.
They are just bigger rounding errors, but they may be of no greater
significance than normal roundoff, depending on what happens with the
result.  Underflow produces very small errors in absolute value, so if
the underflowed quantity is added to something of normal size, it
doesn't matter, while if it is multiplied by something of normal size,
it may matter.  Similarly, overflow produces very small errors when looked
at in a certain peculiar way:

	1/x - 1/X < eps

which however is just the right way to look at continued fractions.
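
A small sketch of both cases, assuming Python floats are IEEE doubles (true
on essentially every current platform).  It also assumes a reading of the
inequality in which X is the overflowed result delivered in place of the
true value x; the posting does not spell that out.

```python
import math

tiny = 1e-200
u = tiny * tiny        # true value 1e-400 underflows to 0.0
added = 1.0 + u        # 1.0, the same as rounding the exact sum 1 + 1e-400
scaled = u * 1e300     # 0.0, although the exact product is 1e-100

big = 1e200
X = big * big          # true value x = 1e400 overflows to +infinity
recip = 1.0 / X        # 0.0, versus the exact 1/x = 1e-400
```

The sum is unharmed by the underflow and the scaled product is destroyed by
it, while the reciprocal of the overflowed value is off by only 1e-400.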

Interval arithmetic in principle removes all the issues related to
these three exceptions.   IEEE 754 was intended to facilitate interval
arithmetic, so it certainly did not want to specify termination on any
of them.

Another kind of exception is called division by zero but it really
should be called "pole" (thinking of 1/0 or tan(90 degrees)) or
"singularity" (thinking of real log(0)).  It corresponds to creating an
"exact" infinite value from finite operands.  There is no issue of
finiteness of significand or exponent here, rather the underlying
mathematical function is most conveniently thought of as infinite.
Continued fractions and other analytic operations, especially in
complex arithmetic, may encounter intermediate poles that are of no
great significance because they are destined to be inverted and become
zeros again.   People who don't think they are using such methods may
be mistaken if they are using mathematical software from netlib or
commercial vendors like NAG or IMSL.  Interval arithmetic can be
extended - at considerable performance cost perhaps - to encompass
exterior intervals in order to cope with poles.
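
To make the "intermediate pole" remark concrete, here is a sketch using a
small continued fraction chosen for illustration.  At x = 1, 2, 3 and 4 some
inner denominator is exactly zero, yet the final value is finite and correct
if the division delivers infinity and carries on.  Python's own float
division raises on a zero divisor, so the sketch assumes the standard
decimal module with all traps disabled; the function name rp is
hypothetical.

```python
from decimal import Context, Decimal, DivisionByZero

def rp(x, ctx):
    """Evaluate 7 - 3/(x - 2 - 1/(x - 7 + 10/(x - 2 - 2/(x - 3)))), innermost first."""
    D = Decimal
    x = D(x)
    t = ctx.subtract(x, D(3))
    t = ctx.subtract(ctx.subtract(x, D(2)), ctx.divide(D(2), t))
    t = ctx.add(ctx.subtract(x, D(7)), ctx.divide(D(10), t))
    t = ctx.subtract(ctx.subtract(x, D(2)), ctx.divide(D(1), t))
    return ctx.subtract(D(7), ctx.divide(D(3), t))

ctx = Context(traps=[])          # nonstop: n/0 delivers Infinity and sets a flag
values = [rp(x, ctx) for x in (1, 2, 3, 4)]
pole_seen = bool(ctx.flags[DivisionByZero])
```

Under these assumptions the four results are 10, 7, 4.6 and 5.5, and the
DivisionByZero flag records that a pole was crossed along the way.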

All other IEEE 754 exceptions are called invalid and lumped together
for purposes of standardization.    Some correspond to essential
singularities in complex or real arithmetic.  Invalid operations that
produce a floating-point result produce a NaN which propagates, as
well as setting the invalid exception flag.  Invalid operations that
don't produce a floating-point result, such as ordered comparisons
involving NaNs or conversions to an integer format, only set the flag.
Conversion-to-integer exceptions encompass NaNs, infinities, and
normal numbers whose magnitude exceeds the range of the integer format.
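
A short sketch of those three behaviors as seen from Python, assuming its
floats are IEEE doubles.  Python gives no access to the invalid flag for
floats, and for integer conversion it departs from the nonstop model by
raising, so the helper below (a hypothetical name) shows one way a program
might cope.

```python
import math

nan = float("nan")

propagated = (nan + 1.0) * 0.0      # a NaN operand yields a NaN result

# Ordered comparisons with a NaN are False; so is equality, even with itself.
comparisons = (nan < 1.0, nan >= 1.0, nan == nan)

def to_int_or_none(x):
    """Convert a float to int, returning None where the conversion is invalid."""
    try:
        return int(x)               # truncates toward zero for ordinary values
    except (ValueError, OverflowError):
        return None                 # NaN raises ValueError, infinity OverflowError
```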

One of the goals of P754 was avoiding any language-specific binding in
the arithmetic spec, because any attempts in that direction had become
bogged down in petty haggling; and in retrospect, binding probably
would have been to the wrong language - perhaps Pascal or Basic instead
of C.  And it was and is kind of hard to see how to specify termination
of execution in environments like embedded systems or even GUI window
systems that do not "terminate" in any usual sense.    Furthermore, a
goal of P754 was to specify an environment for correct, robust, and
efficient software.   Programs that don't anticipate and handle all the
exceptions that may arise, or prove they can't arise, may need a
different environment while they are being debugged.

All conforming IEEE implementations provide evidence that unrequited
exceptions occurred, in the form of accrued exception status bits.
Only one that I know of (Sun Fortran) brings these forcibly to the
user's attention on stderr when a program terminates normally or
abnormally.

On all other systems, one has to hope that exceptional conditions -
other than inexact - that pollute the result will do so by printing out
NaNs or infinities.   But as mentioned, there is some justified
uneasiness on such systems with respect to ordered comparison of NaNs
and conversions to integer formats.  After all, in some sense the ideal
program would simply print out either "yes" or "no" as the answer to
the question it was designed to answer - and this implies that
numerical data was converted to a logical value at some point.  (As a
practical matter we might like the program to show some of its work to
fortify our confidence in the answer.)

There are a couple of approaches for such programs - enabling
termination and explicit checking - depending on whether exceptional
operands are intended to be dealt with as "expected exceptions" or
"unexpected exceptions".  Sorting programs, for instance, may be
designed to handle NaNs, or may not.  In the latter case, ordered
comparisons involving NaNs are likely to produce unexpected results, so
they must be avoided, either by trapping the exceptions that create the
NaNs, or by explicit checking "if (x == x)".  In the former case,
exactly what's to be done with NaNs varies according to the purpose for
which the sorting is being done, and NaN handling will have to be built
in somehow via explicit checking or something more complicated - no
predefined result of unordered comparisons will suit more than a few
situations.   But writing the NaN-aware sort will be easier knowing that
an unordered comparison won't terminate the program.
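
As one illustration of the explicit-checking route, here is a sketch of a
sort that adopts the policy "NaNs go last".  That policy is only an
assumption for the example; as noted above, the right treatment depends on
what the sort is for.  The function name is hypothetical.

```python
def sort_nans_last(values):
    """Sort the ordered values ascending, then append the NaNs in original order."""
    ordered = sorted(v for v in values if v == v)   # x == x is False only for NaN
    nans = [v for v in values if v != v]
    return ordered + nans
```

Because the NaNs are separated out before any ordered comparison is made,
the sorting step itself never sees an unordered pair.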

Programs which are written to be used by others under IEEE arithmetic
should be prepared to expect any exceptions that can arise and deal
with them properly.    Issues arise with programs that were written by
somebody who never heard of IEEE arithmetic or doesn't understand it or
has been unduly influenced by incorrect implementations.    In that
regard, most old Fortran programs were originally written before IEEE
arithmetic became common and so fall into the "never heard of"
category.   It's appropriate for such programs, if run in batch mode,
to terminate on all the exceptions that the author didn't think about,
and most Unix Fortran compilers provide ways of obtaining that
behavior.    As for the "don't understand" folks, I'd recommend David
Goldberg's tutorial, "What Every Computer Scientist Should Know About
Floating-Point Arithmetic", which appeared in Computing Surveys a few
years ago.  But programs newly written to exploit Fortran 90 features may
reasonably be expected to be IEEE 754-aware and should not be
inconvenienced by the requirements of older programs.

In contrast to all the foregoing, there are a few occasions when NaNs
SHOULD disappear without a trace; typically constant functions like

	pow(x,0)

and

	hypot(x,infinity)

that return unexceptional 1 and infinity, respectively, for all finite
and infinite operands x, might as well do so for NaNs too, since their
unknown values could not possibly have affected the outcome.
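
Python's standard math module happens to follow this convention, so the
point can be checked directly.  The variable names are illustrative only.

```python
import math

nan = float("nan")

one = math.pow(nan, 0.0)            # 1.0 for every x, NaN included
also_one = nan ** 0                 # the ** operator follows the same rule
h = math.hypot(nan, math.inf)       # infinity: the other leg cannot matter
```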