posting to comp.arch/comp.benchmarks

Tue Jan 11 19:23:41 PST 1994

Subject: IEEE 754 traps, hardware traps, and performance
Newsgroups: comp.arch,comp.benchmarks

Some recent postings have confused a couple of intertwined issues relating
to IEEE 754 floating-point arithmetic "architecture" and "implementation",
particularly with respect to high-performance RISC CPU's.

IEEE 754 defines five classes of user-level "exceptions": invalid operation,
division by zero, overflow, underflow, and inexact.  For each exception, the
default "nonstop" behavior is to continue with a standard-defined result.
But all 754 implementations are required to provide a means to detect
synchronously whether these exceptions have occurred.  The usual
implementation is a set of hardware status bits that can be set and reset
from user mode.  In SunOS, the function ieee_flags(3m) can be used for that
purpose.  Another function, ieee_retrospective(3m), can be invoked at the end
of a program to print out whether any exceptional behavior arose; it is
invoked automatically for Fortran programs.

IEEE 754 encourages provision of five corresponding "traps" that cause
asynchronous branches to user-mode trap handlers when the corresponding
exception arises.  Such handlers are supposed to have access to the
instruction that trapped and to its operands, and should be able to provide
alternative numerical results.  For that purpose, SPARC implementations
provide, at some cost, a floating-point instruction queue which records the
PC of floating-point instructions that have not yet completed.

Implementations providing IEEE 754 traps are to run with traps DISABLED by
default.  Library functions or command-line options may be used to enable one
or more of the traps.  In SunOS, the function ieee_handler(3m) and the
command-line option "-fnonstd" can be used to enable SIGFPE to occur on an
IEEE exception.
Optional signal handler data structures can be used with some programming
effort to get the instruction and data.    Substituting recomputed results
from user mode is trickier and requires patching sigtramp to avoid its
habit of restoring floating-point registers after SIGFPE or other signals.
SunPro compilers for SunOS 4.x provide a patched sigtramp for that purpose.
There is no convenient programming interface to user-mode IEEE 754 traps
due to lack of standardization, but premature standardization in this area
is fraught with subtle performance hazards.

IEEE 754 can be implemented in software, or hardware, or most typically
by a combination.    An efficient RISC implementation of IEEE 754 presents
interesting hardware design problems, the most interesting of which is
how to implement subnormal operands and results of multiplication; these
usually arise from untrapped underflows.

Thus SPARC implementations prior to SuperSPARC and MicroSPARC generated
hardware traps to supervisor mode - a different concept from the IEEE 754
traps mentioned above - when subnormal operands or results were encountered.
The supervisor mode code in the kernel was responsible for identifying the
trapping instruction, decoding it, fetching the operands, recomputing the
correct IEEE 754 result, and placing it in the intended destination.   This
code has been in place since the first release of SunOS for SPARC.

However, a few programs generate subnormal operands or results frequently,
or more typically, zero results from underflows frequently.  There is no great
problem in treating underflows all the way to zero correctly entirely in
hardware, and trapping only when subnormal non-zero operands or results are
encountered, but early SPARC implementations generally trapped all underflows
for recomputation.  This made a very few programs run extremely slowly,
so those SPARC hardware implementations gained an alternative nonstandard
mode in which all subnormal operands and results are treated as zeros,
entirely in hardware.  Current implementations of SuperSPARC and
MicroSPARC do not require a nonstandard mode since subnormal operands and
results are handled entirely in hardware with satisfactory performance
compared to the normal case.
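
To illustrate what the nonstandard mode gives up, here is a small sketch that
emulates flush-to-zero in software and compares it with standard gradual
underflow.  The function names are hypothetical, and the emulation only
approximates what such hardware does; it assumes the host evaluates double
arithmetic with subnormals enabled, as IEEE 754 requires by default.

```c
#include <float.h>

/* Emulate the nonstandard mode: any subnormal value becomes a zero of the
   same sign.  Zeros, normal numbers, infinities and NaNs pass through. */
double flush_subnormal(double x)
{
    if (x != 0.0 && x < DBL_MIN && x > -DBL_MIN)
        return x < 0.0 ? -0.0 : 0.0;
    return x;
}

/* x - y with standard IEEE 754 gradual underflow. */
double sub_gradual(double x, double y)
{
    return x - y;
}

/* x - y as a flush-to-zero unit might compute it: subnormal operands and
   subnormal results are both replaced by zero. */
double sub_flushed(double x, double y)
{
    return flush_subnormal(flush_subnormal(x) - flush_subnormal(y));
}
```

With x = 1.5 * DBL_MIN and y = DBL_MIN, both operands are normal and unequal,
yet their difference 0.5 * DBL_MIN is subnormal: sub_gradual returns it
exactly, while sub_flushed returns zero, so "x != y" no longer implies
"x - y != 0".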

I mention SPARC and SunOS in the foregoing since I'm very familiar with those
specifics, but most of the RISC Unix workstation vendors could tell similar
stories.    The recent thread in comp.arch and comp.benchmarks relates to
the initial ALPHA implementations and how they fail to fully conform to
IEEE 754.   The following is my understanding of the current situation;
perhaps somebody from DEC will correct any misunderstandings:

1) Current ALPHA chips, like most RISC CPU's, do not handle subnormal operands
and results in hardware, and instead cause a hardware trap to the kernel.
However, presumably due to a rush to market, DEC's operating system kernels
supporting ALPHA do not recompute the correct IEEE 754 subnormal result,
providing zero instead; this is supposed to be fixed in future releases, and
thus is not a permanent phenomenon.  So standard IEEE 754 behavior with
respect to subnormal operands and results is not currently available.  Of
course, from the point of view of DEC's migrating VAX customers, that doesn't
matter since their codes have always operated without subnormal operands or
results.

2) Current ALPHA chips, unlike many RISC CPU's, do not support precise 
user-mode traps on IEEE floating-point exceptions.   Traps are available,
which can be used when abort or long jump is an adequate exception response.
But it may not be possible to determine precisely which instruction caused
the trap, or to insert a substitute result and continue, and so these traps
fall short of the IEEE 754-defined trapping capability.

However, there are, or could be, compiler options that cause synchronization
instructions to be issued after each possibly trapping floating-point 
instruction.   Then a user-mode SIGFPE handler would know that it could
only be invoked because of an exception encountered in the immediately
previous floating-point instruction.    Obviously such compiler options
cause the normal non-exceptional case to run more slowly.
Unlike item 1) above, however, imprecise floating-point traps seem more
likely than not to prevail in future high-performance CPU designs.

What could be done instead?   W. Kahan, the father of IEEE 754 arithmetic,
believes that hardware facilities to support pre-substitution are essential:
prior to a possibly exceptional instruction, one would specify the result
to be used in case the (rare) exception arose, so that no asynchronous
control flow would be necessary, and the common unexceptional case would
proceed at full speed.  So far, however, I have been somewhat intimidated
by the amount of hardware and software support required to fully implement
such a facility, especially in light of the workstation customer
tendency to make buying decisions on the basis of least-common-denominator
hardware capabilities, in order to avoid vendor lock-in, and the PC hardware
and software vendor tendency to provide broken floating-point hardware
support, or none.
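
For readers unfamiliar with pre-substitution, the following is only a software
imitation of the idea, not Kahan's proposed hardware facility.  The value to
use is chosen before the operation, and it replaces the default NaN if the
invalid-operation case (0/0 or inf/inf) arises.  The names divide_presub and
removable_singularity are made up for illustration.  Note that this version
pays for an explicit test after every division, which is exactly the cost a
hardware facility is meant to avoid.

```c
/* Hypothetical helper: return num / den, except that if the division is an
   invalid operation (0/0 or inf/inf) the pre-chosen value presub is
   returned in place of the default NaN. */
double divide_presub(double num, double den, double presub)
{
    double q = num / den;

    /* q != q holds only for NaN.  If neither operand was a NaN, the NaN
       was created by this division, i.e. the invalid exception occurred. */
    if (q != q && num == num && den == den)
        return presub;
    return q;
}

/* Example: (x*x - 1) / (x - 1) has a removable singularity at x = 1,
   where its limit is 2; that limit is pre-substituted for 0/0. */
double removable_singularity(double x)
{
    return divide_presub(x * x - 1.0, x - 1.0, 2.0);
}
```

Here removable_singularity(1.0) yields 2.0 with no trap and no NaN, while
ordinary arguments such as 3.0 go through the plain division and give 4.0.
Division of a nonzero value by zero is left alone and still returns infinity.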

But perhaps this is not surprising, since Kahan defines an EXCEPTION as
any situation in which no matter how you handle it by default, somebody will
TAKE EXCEPTION to your decision!