NCEG expression evaluation proposal: precise - uniform - natural

David G. Hough on validgh dgh
Sat Sep 1 16:48:35 PDT 1990


Most of you will recall previous discussions of this question: can 
floating-point arithmetic be standardized to produce the same results on
all IEEE 754 systems?  To get the same numerical results (disregarding
IEEE 754 exceptions) would imply:

	expression evaluation according to a universal rule

	correctly-rounded conversion between binary and decimal

	correctly-rounded elementary transcendental functions
		(this is still researchy)

To get the same exception results with traps disabled requires more:

	exceptions must be recorded "soon" after they occur

To get the same results with traps enabled requires even more:

	trapped exceptions must be caught immediately after they occur
	trapped exceptions must be continuable

Especially for the benefit of a higher-level language debugger, the last
case requires exceptions to appear to arise in the statements in which they
actually arise.
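
To make the "recorded soon" requirement concrete, here is a minimal sketch
that interrogates the accrued exception flags right after an operation.  It
is written with the later-standardized C99 <fenv.h> calls purely as a
stand-in for whatever inquiry mechanism NCEG adopts:

	#include <fenv.h>
	#include <stdio.h>

	#pragma STDC FENV_ACCESS ON

	int main(void)
	{
	    volatile double y = 0.0;      /* volatile to keep the divide at run time */
	    double x = 1.0, z;

	    feclearexcept(FE_ALL_EXCEPT);
	    z = x / y;                    /* raises divide-by-zero */
	    /* for identical exception results with traps disabled, the flag
	       must already be visible here, however deeply the division was
	       pipelined */
	    if (fetestexcept(FE_DIVBYZERO))
	        printf("divide-by-zero recorded, z = %g\n", z);
	    return 0;
	}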

The researchy aspect of elementary transcendental functions is that
correctly-rounded results may be too expensive in double precision in the
immediate future.

*************************************************************************

As I envision it, there are three possible choices for floating-point
processing that affect compile-time code generation and possibly
run-time actions too:

precise/debug/slow mode:
	the numerical results and exception results of all programs are
	identical on all systems

standard/uniform/fixed/common/universal mode:
	the numerical results of programs that don't test exceptions or
	enable traps are identical on all systems

natural/fast/nonstandard mode:
	floating-point arithmetic satisfies IEEE 754 in a way that is
	natural to the processor

Default: each implementation chooses its own default.  Each implementation
is only required to implement one mode.  There are environmental inquiries
that indicate which mode is in effect and whether a different mode can
be elected at run time on this implementation.
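
For what it's worth, such an environmental inquiry might look something like
the following; the names are purely hypothetical illustrations, not proposed
spellings:

	/* hypothetical sketch only: fp_current_mode() and the FP_MODE_*
	   constants are made-up names, not part of any proposal */
	enum fp_eval_mode { FP_MODE_PRECISE, FP_MODE_UNIFORM, FP_MODE_NATURAL };

	enum fp_eval_mode fp_current_mode(void);      /* mode now in effect */
	int fp_mode_selectable(enum fp_eval_mode m);  /* nonzero if mode m can be
	                                                 elected at run time */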

**************************************************************************

People get worried about names, so I have suggested several alternatives.

Note that programs using long double types should not be compiled in 
precise or uniform mode without generating a lint-like warning about 
portability since NCEG long double may be double, double extended,
or quad.
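
One way such a warning pass (or the program itself) can tell which long
double it actually got is to look at the <float.h> parameters; a minimal
sketch:

	#include <float.h>
	#include <stdio.h>

	int main(void)
	{
	    /* 53 significand bits: long double is plain double; 64: double
	       extended (68881/80387 style); 113: quad.  long double results
	       can legitimately differ across these three. */
	    printf("long double: %d significand bits, max exponent %d\n",
	           LDBL_MANT_DIG, LDBL_MAX_EXP);
	    return 0;
	}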

What's to prevent everybody from simply implementing natural mode and
forgetting everything else?

Switching to the big picture now, I envision GCC extensions implementing all
three modes on enough 32-bit IEEE 754 implementations
(any three of SPARC, MIPS, SPIM, PA-Risc, RIOS, 80486, 68040 would suffice)
that everybody else will go along, too, because sufficient numbers of
customers will demand uniform results as a check-off item.  Then the race
will be on to provide a uniform mode that is as fast as natural mode.
That's a worthy target even if not completely achievable.

Who's going to do all these GCC extensions?  I'm planning to do my part,
although I don't blame anybody who's skeptical based on my progress so
far with base conversion.  And I have colleagues who are interested in the
transcendental function problem, so I expect that will develop as well.

**************************************************************************
Details

Natural encompasses extended-precision expression
evaluation on 68881's and 80387's and RIOS-style multiply-add with
one rounding error.   It also encompasses many parallel algorithms
that (for instance) divide up a do loop into several pieces for
different processors so the final result is a function of how many
processors were available.  The relative performance of natural to
uniform mode is a function of how well the hardware and software architecture
map to uniform mode.
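
To make the multiply-add point concrete: a fused multiply-add commits one
rounding where a separate multiply and add commit two, so natural mode on
such hardware can legitimately disagree with uniform mode.  A minimal
sketch, using the later C99 fma() function as a stand-in for RIOS-style
hardware:

	#include <math.h>
	#include <stdio.h>

	#pragma STDC FP_CONTRACT OFF     /* keep a*b + c as two rounded operations */

	int main(void)
	{
	    double a = 1.0 + 0x1p-27, b = 1.0 - 0x1p-27, c = -1.0;

	    double two_roundings = a * b + c;    /* a*b rounds to 1.0, sum is 0.0 */
	    double one_rounding  = fma(a, b, c); /* exact a*b - 1 = -2^-54        */

	    printf("%g vs %g\n", two_roundings, one_rounding);
	    return 0;
	}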

Precise requires all exceptions pending in the pipe to be emptied whenever
the accrued exception status is interrogated.  More importantly, it requires
traps to occur within the statement in which they arise, and to preserve
enough state so that a user trap handler can continue.   This will
necessarily be expensive on high-performance systems.  It's the maximum
debugging mode.  Dividing up a complicated expression into a bunch of
intermediate steps (assigning to explicit rather than anonymous compiler-
generated temporary variables) shouldn't affect the final numerical or
exception result.
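
In other words, in precise mode the two spellings below are required to
produce bit-identical values and raise the same exceptions, each in the
statement that caused it (a sketch, not text from the proposal):

	#include <stdio.h>

	int main(void)
	{
	    double a = 2.0, b = 3.0, c = 5.0, d = 7.0, e = 4.0;

	    /* one complicated expression ... */
	    double r1 = (a * b + c) / (d - e);

	    /* ... versus the same work through explicit, named temporaries */
	    double t1 = a * b + c;
	    double t2 = d - e;
	    double r2 = t1 / t2;

	    /* precise mode: r1 == r2, with identical exception behavior */
	    printf("%g %g\n", r1, r2);
	    return 0;
	}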

Uniform can probably be made sufficiently efficient to be the default on
most high-performance uniprocessor systems.  For C, the order of expression
evaluation is already completely determined by the language, as I learned
to my surprise here a couple of months ago.  So the only issue is precision
of intermediate results.    Some common schemes include

	traditional Fortran style - inherited from operands
	traditional C style - inherited from operands, but at least double
	widest-available style - 68881/80387 with long double = extended
	extra-precise style - 68881/80387 with long double = double 
		[e.g. Sun-3 with C 1.0 -f68881]
	widest-used style - the widest precision used in a particular
		expression (expression boundaries are discussed in a
		paper by Corbett).
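
To see how much the choice of scheme matters, take three float operands.
Traditional Fortran style rounds the product back to float; traditional C
style carries it in double.  The sketch below fakes both styles with
explicit casts so the difference shows up on any system:

	#include <stdio.h>

	int main(void)
	{
	    float a = 1.0f + 0x1p-13f, b = 1.0f - 0x1p-13f, c = -1.0f;

	    /* traditional Fortran style: every intermediate rounded to float;
	       a*b (exactly 1 - 2^-26) rounds to 1.0f, so the sum is 0 */
	    float fortran_style = (float)(a * b) + c;

	    /* traditional C style: intermediates carried in double;
	       the sum is exactly -2^-26 */
	    float c_style = (float)((double)a * (double)b + (double)c);

	    printf("Fortran style %g, C style %g\n", fortran_style, c_style);
	    return 0;
	}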

All the schemes except traditional Fortran 
are better in the sense that error bounds are usually
lower, but the result can change
if you break an expression into subexpressions with explicit temporaries,
unless you're very careful about declaring precisions of those temporaries -
and with extra-precise style you can't even name that precision.
If you can't predict exactly what the precision of the temporary would be,
you may get slightly different results from inserting debugging
statements.  So it seems to me that
traditional Fortran style is simplest for the debug mode.
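
Here is the kind of surprise I mean, again faked with explicit casts: under
a widened scheme the one-line form carries a*b in double, but naming a float
temporary rounds it back and changes the answer, while under traditional
Fortran style both spellings agree:

	#include <stdio.h>

	int main(void)
	{
	    float a = 1.0f + 0x1p-13f, b = 1.0f - 0x1p-13f, c = -1.0f;

	    /* whole expression with double intermediates: exactly -2^-26 */
	    float whole = (float)((double)a * (double)b + (double)c);

	    /* same expression split through an explicit float temporary:
	       the store to t rounds a*b back to 1.0f and the answer becomes 0 */
	    float t     = (float)((double)a * (double)b);
	    float split = (float)((double)t + (double)c);

	    printf("whole = %g, split = %g\n", whole, split);
	    return 0;
	}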

Uniform mode results shouldn't depend on how long double is defined or what
anonymous extended precision underlies the natural arithmetic, so I 
don't think widest-available is suitable for uniform mode.  Any of the
following would be OK with me for uniform mode:

	Fortran
	C
	widest-used

**************************************************************************

What's the result of all this trouble?  When you are debugging a new program
or (as we do more often on Unix systems) porting an old program, or
(even more often) validating new hardware or compilers, you can
start off by running in precise mode to detect gross logical errors
and then subtler ones, since the numerical and exception results should be
identical on any other conforming system.  After that you can relax the
compilation mode to uniform and then to natural, at each stage verifying
that the results remain acceptable, even if different, and that the
performance improvement is sufficient to warrant accepting the differences.

As it stands now we do all these things at once, with the net effect that
routine subtle tests tend to be driven out by routine simple ones, because
it's too hard to tell whether subtle differences arise from acceptable
variation or from unacceptable bugs.


