Notes on Expression Evaluation and Exceptions in SPARC

Tue Jul 23 20:01:11 PDT 1991

I attended another meeting with Kahan and Demmel on Monday 22 July.
Some highlights:

1) Kahan thinks that only VAX VMS can conform to LCAS without major
inconvenience.  Every other system falls short for some obscure reason,
a different one in each case.

2) Kahan commented favorably on the high-precision arithmetic package
developed by David Bailey as a more modern substitute for Richard Brent's.
Bailey's package uses FFT multiplication methods when significands are
long enough.

3) If the loop
	s = sum.i x(i)*y(i)
can be accumulated in higher precision, even 1.5X slower than in working
precision, it (and similar loops) should be compiled that way by default.
Similar computations may take forms like
	x(j) = x(j) + sum.i c(i)*y(i)
The extra accuracy and exponent range permit algorithms that are much more
than 1.5X faster to be used often enough to be worthwhile.
If the slowdown is 4X, that's clearly too high a price to pay. 
One possible hardware implementation provides machine instructions for the
higher-precision product of two working-precision operands, for
higher-precision sum and difference, and for conversion between higher
and working precision.  Full higher-precision hardware support, including
higher-precision product of higher-precision operands, division, and sqrt,
is not necessary for this purpose.
	Exploiting fused multiply-add could provide the same accuracy
benefit, but with a slowdown of at least 2X and
without protection against gratuitous intermediate exponent spill.
	The ultimate justification for providing full quad in hardware
is that it will be cheaper than the brainpower to figure out how to avoid
using it.  You could say the same thing about floating-point hardware
in general: it's cheaper than figuring out how to get along without it.
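
The effect described in item 3 can be sketched in a few lines.  This is only
an illustration under stated assumptions: IEEE single stands in for "working
precision" and IEEE double for "higher precision", and the helper names
to_single, dot and norm are made up for this sketch.  The product of two
singles is exact in double, so only the accumulation differs between the two
modes.

```python
import math
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except (OverflowError, struct.error):
        # struct refuses finite values beyond single range; model overflow as infinity.
        return math.copysign(math.inf, x)

def dot(xs, ys, wide):
    """Accumulate sum.i x(i)*y(i); xs and ys are assumed to hold single values.

    wide=True keeps the running sum in double (the caller rounds at the end);
    wide=False rounds every product and every partial sum to single.
    """
    s = 0.0
    for x, y in zip(xs, ys):
        p = x * y  # exact in double when x and y are singles
        s = s + p if wide else to_single(s + to_single(p))
    return s

def norm(xs, wide):
    """Euclidean norm, rounded to single only after the square root."""
    return to_single(math.sqrt(dot(xs, xs, wide)))
```

With x = (2^12, 2^-12, -2^12) and y = (2^12, 2^-12, 2^12), the single
accumulation loses the small term entirely, while the wide accumulation
keeps it.  The norm of (3e20, 4e20) shows the exponent-range benefit: the
sum of squares spills past single range in working precision, but not when
it is held in double.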

4) Compilers should begin providing "scan for widest needed" expression
evaluation as soon as possible, to get users used to the idea.
Corbett described the essential notions in letters in SIGPLAN and SIGNUM
many years ago: the basic idea is that mixed-precision expressions are
evaluated in the precision of the most precise operand.
There are complications, for instance in evaluating expressions that are
to be actual arguments in Fortran-77 procedure calls.
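
A toy evaluator may make the difference concrete.  It is a sketch, not
Corbett's formulation: expressions are nested tuples, variables carry a
precision tag, and only single and double are modelled.  All names here
(round_to, widest, eval_conventional, eval_widest, ENV, EXPR) are
hypothetical, and values are assumed to stay within single range.

```python
import operator
import struct

SINGLE, DOUBLE = 24, 53  # significand bits; the larger number is the wider format
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def round_to(value, prec):
    """Round a double to the given precision (values assumed within single range)."""
    if prec == DOUBLE:
        return value
    return struct.unpack("f", struct.pack("f", value))[0]

def widest(expr, env):
    """Scan the whole expression tree for the widest operand precision."""
    if isinstance(expr, str):
        return env[expr][1]
    _, left, right = expr
    return max(widest(left, env), widest(right, env))

def eval_conventional(expr, env):
    """Each operation is rounded to the wider of its own two operands."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, pa = eval_conventional(left, env)
    b, pb = eval_conventional(right, env)
    prec = max(pa, pb)
    return round_to(OPS[op](a, b), prec), prec

def eval_widest(expr, env, prec=None):
    """Every operation is rounded to the widest precision found in the expression."""
    if prec is None:
        prec = widest(expr, env)
    if isinstance(expr, str):
        return env[expr][0]
    op, left, right = expr
    a = eval_widest(left, env, prec)
    b = eval_widest(right, env, prec)
    return round_to(OPS[op](a, b), prec)

# d + a*a with a single and d double
ENV = {"a": (1 + 2.0**-13, SINGLE), "d": (-(1 + 2.0**-12), DOUBLE)}
EXPR = ("+", "d", ("*", "a", "a"))
```

Under the conventional rule a*a is rounded to single before d is added, and
the result is 0.  Under "scan for widest needed" the product is formed in
double because d is double, and the result is 2^-26.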

The incentive for implementing "scan for widest needed" now is to pave the way
for the eventual provision of arbitrary-length floating-point arithmetic
(in software, not hardware).  The length required can be determined only
at run time.  Arbitrary-length arithmetic provides a way to continue
computations based on fast but unstable algorithms, or to solve
ill-conditioned problems accurately when that's called for.  The requisite
precision will be a function of the particular input data and may vary from
run to run of the same program.
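
As a rough software illustration of precision chosen from the data at run
time, the sketch below uses Python's decimal module as a stand-in for an
arbitrary-length package.  The function names and the rule for choosing the
digit count are assumptions made for this example, and the inputs are
assumed to be finite.

```python
from decimal import Decimal, localcontext

def digits_needed(xs):
    """Decimal digits sufficient to add the finite floats xs without rounding."""
    ds = [Decimal(x) for x in xs if x != 0.0]  # Decimal(float) is exact
    if not ds:
        return 1
    top = max(d.adjusted() for d in ds)              # position of the leading digit
    bottom = min(d.as_tuple().exponent for d in ds)  # position of the last digit
    carries = len(str(len(ds)))                      # room for carries out of the top
    return top - bottom + 1 + carries

def sum_at_needed_precision(xs):
    """Return (sum rounded once to double, number of decimal digits used)."""
    with localcontext() as ctx:
        ctx.prec = digits_needed(xs)
        total = Decimal(0)
        for x in xs:
            total += Decimal(x)
        return float(total), ctx.prec
```

For the data (1e30, 1.0, -1e30) this chooses 32 digits and returns 1.0,
where left-to-right double addition returns 0.  For (1.0, 2.0, 3.0) it
chooses 2 digits.  The same program thus runs at different precisions on
different inputs.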

5) SPARC exception handling.  Kahan views the underlying SPARC hardware trap
handling paradigm unfavorably.  The main reason appears to be that, unlike
MIPS, all SPARC implementations so far report a floating-point trap on the
next floating-point instruction rather than on the instruction that caused
the problem.  This means that the trap may be reported in a different
function than the one in which it arose.  Although SPARC requires that the
hardware report the PC address of the offending instruction, that
information may not be very helpful in a high-level debugger.
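
The following toy model shows why a deferred trap can surface in the wrong
function.  It is not a description of any SPARC implementation: the class
DeferredTrapFPU and the functions reciprocal, total and demo are invented
for this sketch, and the only behavior modelled is that an exception raised
by one operation is reported when the next operation is issued.

```python
import math

class DeferredTrapFPU:
    """Toy FPU: an exception is held and reported on the next operation issued."""

    def __init__(self):
        self.pending = None  # name of the function whose operation raised an exception

    def _issue(self, where):
        # Called at the start of every operation; this is where a deferred trap surfaces.
        if self.pending is not None:
            origin, self.pending = self.pending, None
            raise FloatingPointError(
                f"divide exception raised in {origin}, reported in {where}")

    def divide(self, a, b, where):
        self._issue(where)
        if b == 0.0:
            self.pending = where  # remember the exception, but do not trap yet
            if a == 0.0:
                return math.nan
            return math.copysign(math.inf, a) * math.copysign(1.0, b)
        return a / b

    def add(self, a, b, where):
        self._issue(where)
        return a + b

def reciprocal(fpu, x):
    return fpu.divide(1.0, x, "reciprocal")

def total(fpu, a, b):
    return fpu.add(a, b, "total")

def demo():
    """Return the trap message produced by a division by zero followed by an add."""
    fpu = DeferredTrapFPU()
    reciprocal(fpu, 0.0)          # the exception arises here, but nothing is reported
    try:
        total(fpu, 1.0, 2.0)      # the trap is reported here, in a different function
    except FloatingPointError as err:
        return str(err)
    return "no trap"
```

In this model reciprocal returns normally and the error appears inside
total, which did nothing wrong.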

I have never seen anybody post to comp.arch that they bought a MIPS ISA
system rather than a SPARC on the basis of immediate floating-point traps.
It is worth noting, however, that the SPARC V8 ISA permits implementations
to handle floating-point traps either immediately (like MIPS) or deferred
(like all SPARC implementations so far).  I suppose that if IEEE P1754
changed the requirement to immediate traps on all new implementations, Kahan
would be happy.  There may be patent issues to settle with MIPS, but aside
from that, hardware designers are in a serious dilemma about this issue.
They hate having to implement the queue necessary to hold instructions so
that they can complete or trap in the same order in which they were started.
But they are also anxious about committing forever to immediate traps,
because that means a fast decision has to be made about whether an exception
can occur, in order to kill the following instructions in the pipe if so.
The fast decision has to be based pretty much on the exponent field and sign
bit alone, so infinities may well be trapped, which Kahan doesn't like.
MIPS ISA has an advantage relative to SPARC in one respect, having to do
with the layout of the floating-point register file.  In any event the
kernel or user-mode trap handler has to emulate the instruction through to
the end to see whether it really is exceptional and, if so, signal the user
program appropriately.

Another aspect of the immediate trap approach is that every instruction
traps if the inexact trap enable bit is set.  That frustrates another of
Kahan's goals: to be able to do integer arithmetic in floating-point
registers at floating-point speeds unless an inexact exception occurs.  Such
algorithms would have to be implemented by testing status bits from time to
time (slowing down the normal case) instead of by trapping only if a
rounding error occurs.
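
The two styles can be compared in software.  The sketch below uses the
Inexact flag and trap of Python's decimal module as an analog of the IEEE
inexact status bit and trap enable.  It says nothing about hardware cost,
and the function names are made up.  Integer powers of 3 are computed in a
10-digit context, so the arithmetic is exact until the result needs an
11th digit.

```python
from decimal import Context, Decimal, Inexact

def first_inexact_by_trap(prec):
    """Trap style: return the index of the multiplication that first rounds."""
    ctx = Context(prec=prec, traps=[Inexact])  # only Inexact is trapped
    p, n = Decimal(1), 0
    try:
        while True:
            p = ctx.multiply(p, Decimal(3))
            n += 1
    except Inexact:
        return n + 1  # the multiplication that raised was number n + 1

def first_inexact_by_polling(prec, poll_every):
    """Polling style: test the sticky Inexact flag every poll_every steps."""
    ctx = Context(prec=prec, traps=[])  # no traps; flags are still recorded
    p, n = Decimal(1), 0
    while True:
        p = ctx.multiply(p, Decimal(3))
        n += 1
        if n % poll_every == 0 and ctx.flags[Inexact]:
            return n  # step at which the rounding error was noticed
```

With prec = 10 the trap fires on the 21st multiplication, since 3^21 has 11
digits.  Polling every 8 steps notices the error only at step 24, after
three further multiplications have run on a rounded value.  That detection
lag comes on top of the per-iteration flag test that the paragraph above
counts as the cost of polling.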

A second aspect of SPARC trap handling, in this case common to most
UNIX implementations, is that the SIGNAL overhead to propagate a hardware
trap to a user-mode software handler is ferocious.  A different mechanism,
at least for IEEE implementations, that allowed direct hardware traps to
user-mode handlers would be preferable.  Various kinds of exception logging
to enable better diagnostics also become more attractive, provided they are
not too inefficient.

The overall idea is to interpose some standard run-time software between
the user's code and the raw hardware interface, to allow a fairly portable
menu of facilities such as presubstitution, counting mode, and tracebacks
to be specified for various exceptional situations.  The menu and required
features have been written up in various memoranda such as "Floating-Point
Exception Handling - Does it Deserve to be taken Seriously?"
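
Two of the menu items can be imitated in ordinary code, which may help fix
the ideas.  These are sketches under assumptions, not the interface proposed
in the memoranda.  divide_presub imitates presubstitution by taking, in
advance, the value to deliver when 0/0 occurs.  product_counting imitates
the effect of counting mode by keeping a long product as a fraction plus an
integer exponent count, so intermediate overflow and underflow cannot
occur.  Both names, and sinc_like, are hypothetical.

```python
import math

def divide_presub(a, b, presub):
    """Return a/b, delivering the pre-agreed value presub for the invalid case 0/0."""
    if a == 0.0 and b == 0.0:
        return presub
    return a / b

def sinc_like(x):
    """sin(x)/x with 1.0 presubstituted at x = 0, so no special-case test is needed here."""
    return divide_presub(math.sin(x), x, 1.0)

def product_counting(factors):
    """Product of finite nonzero floats as (fraction, count), value = fraction * 2**count."""
    frac, count = 1.0, 0
    for f in factors:
        m, e = math.frexp(f)         # f == m * 2**e with 0.5 <= |m| < 1
        frac, e2 = math.frexp(frac * m)  # renormalize so frac never drifts out of range
        count += e + e2
    return frac, count
```

For the factors (1e200, 1e200, 1e-200, 1e-200) plain left-to-right
multiplication overflows to infinity after the second factor, while the
counted product can be converted back with math.ldexp(frac, count) and is
close to 1.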