Eliminating variation among SPARC IEEE implementations

Fri Feb 2 08:51:13 PST 1990

     The following discussion is mostly of interest to people concerned with
IEEE arithmetic hardware for SPARC systems, though comments from others who've
faced similar issues are welcome.  One item of background may be helpful:
SPARC hardware floating-point traps come in several flavors: unimplemented
instruction, unfinished instruction, and IEEE trap enabled.  IEEE-trap-enabled
traps occur because the user has set the IEEE trap enable bits to indicate a
desire to get SIGFPE for those exceptions.  Starting with SunOS 4.1,
user-written SIGFPE handlers will be able to obtain the address of the trapping
instruction and store user-defined numerical and exception results.  In
contrast, unfinished instruction traps indicate that the hardware has punted on
a difficult case, such as a subnormal operand or result, and the kernel
recomputation software is expected to compute the correct numerical result and
exceptions and continue, slowly but otherwise invisibly.  All three flavors are
currently handled the same way by the kernel recomputation software, but that
could change.
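     As an illustration only, here is a minimal C sketch of how trap-handling
code might tell these flavors apart.  It assumes the floating-point trap type
(ftt) field sits in FSR bits 16:14 with the encodings given in the SPARC
Version 8 manual.  The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Floating-point trap type (ftt), FSR bits 16:14.  The field position and
   encodings follow the SPARC Version 8 manual; older parts may differ. */
enum fp_trap_flavor {
    FTT_OTHER         = 0,  /* no trap, or a flavor not discussed here */
    FTT_IEEE_ENABLED  = 1,  /* user set a trap enable bit, wants SIGFPE */
    FTT_UNFINISHED    = 2,  /* hardware punted, e.g. on a subnormal */
    FTT_UNIMPLEMENTED = 3   /* opcode not implemented in hardware */
};

/* Map the ftt field of an FSR value to one of the flavors above. */
enum fp_trap_flavor classify_fp_trap(uint32_t fsr)
{
    switch ((fsr >> 14) & 0x7u) {
    case 1:  return FTT_IEEE_ENABLED;
    case 2:  return FTT_UNFINISHED;
    case 3:  return FTT_UNIMPLEMENTED;
    default: return FTT_OTHER;  /* e.g. sequence or hardware errors */
    }
}
```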

     A number of details in IEEE 754 are left for implementations to define,
and so are left undefined in the SPARC architecture document.  The following
proposals resolve all of these uncertainties and open ends.  They are my
personal proposals; if approved, they would most likely be published as a list
of recommendations for SPARC implementers.

     Until now, many boundary cases have been handled as unfinished fpops
[floating-point operations], with their results defined by the kernel
software.  More implementers are taking my advice to avoid unfinished fpops as
much as possible, to reduce the incidence of programs with anomalous
performance.  Consequently they want to know what the kernel handling is and
whether it's correct.  Their doubts about its correctness have turned out to be
somewhat justified.

     Not every existing implementation can conform to every aspect of these
recommendations, or of any other complete set, but that's acceptable; the goal
is to know what all new implementations will do.  Unix diagnostics require a
rigid specification of what's visible in user mode; standalone diagnostics and
simulators require a rigid specification of what's visible in supervisor mode.
SunOS kernels will undoubtedly continue to be coded defensively so that they
work correctly with all known implementations.

Unaligned floating-point data registers

     The effect of an instruction that refers to a double-precision operand in
a register whose number is not 0 mod 2, or to a quadruple-precision operand in
a register not 0 mod 4, is undefined.  This leads to software problems that
are difficult to find.  A specific hardware trap (illegal alignment or illegal
instruction) should be specified.
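     A simulator or instruction decoder could make the proposed trap
deterministic by checking register alignment before executing an fpop.  This
C sketch assumes 32 single-precision registers numbered 0..31.
fp_reg_aligned and enum fp_precision are hypothetical names.

```c
#include <stdbool.h>

/* Operand width in 32-bit floating-point registers. */
enum fp_precision { FP_SINGLE = 1, FP_DOUBLE = 2, FP_QUAD = 4 };

/* True if register number reg (0..31) may hold the first word of an
   operand of the given precision: any register for single, an even one
   for double, a multiple of 4 for quad.  An aligned operand always fits
   below register 32.  A decoder would raise the proposed trap when this
   returns false instead of doing something undefined. */
bool fp_reg_aligned(unsigned reg, enum fp_precision prec)
{
    return reg < 32 && reg % (unsigned)prec == 0;
}
```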

FSR [Floating-Point Status Register]

     How FPU version numbers are assigned should be better documented.  I
suggest that version 6 be reserved for early masks of a part, and that
production parts be assigned versions starting with 0.  That way, prototype
parts will have a better chance of being detected if they accidentally slip
into the field.  (Or should it be version 0 for prototypes and 1..6 for
production parts?  Which convention is most likely to require only a simple
change to a prototype mask that proves to be good?  Version number 6 hasn't
been used yet, so it's available for prototypes.)
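     To make the version-6 suggestion concrete, here is a C sketch that
extracts the version field and applies the proposed convention.  It assumes
the SPARC Version 8 FSR layout, with ver in bits 19:17.  The function names
are hypothetical, and the code encodes only the proposal above, not an
adopted rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* FSR version field, bits 19:17 in the SPARC Version 8 layout, where
   version 7 is reserved to mean that no FPU is present. */
unsigned fsr_version(uint32_t fsr)
{
    return (fsr >> 17) & 0x7u;
}

/* The convention proposed above: version 6 marks a prototype mask and
   production parts are numbered from 0. */
bool fpu_is_prototype(uint32_t fsr)
{
    return fsr_version(fsr) == 6;
}
```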

     The cexc field [current instruction exceptions] should be specified to
indicate the trapping exception to the kernel trap handler on an IEEE fp trap,
and to a user-coded signal handler on a SIGFPE.  The cexc field should be
specified to be unchanged on all other fp traps.
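     A user-coded SIGFPE handler that can rely on cexc could find the
trapping exception by intersecting cexc with the trap enable mask (TEM).  The
C sketch below assumes the SPARC Version 8 FSR layout, with cexc in bits 4:0
and TEM in bits 27:23, both ordered nv, of, uf, dz, nx from high to low.
Obtaining the FSR value inside a handler is outside its scope, and the names
are hypothetical.

```c
#include <stdint.h>

/* Exception bits, identical in cexc (FSR bits 4:0), aexc (9:5) and the
   trap enable mask TEM (27:23) in the SPARC Version 8 layout. */
#define EXC_NX 0x01u  /* inexact           */
#define EXC_DZ 0x02u  /* division by zero  */
#define EXC_UF 0x04u  /* underflow         */
#define EXC_OF 0x08u  /* overflow          */
#define EXC_NV 0x10u  /* invalid operation */

/* Exceptions that occurred on the trapping instruction and were also
   enabled, i.e. the ones the user asked to hear about via SIGFPE. */
unsigned trapping_exceptions(uint32_t fsr)
{
    unsigned cexc = fsr & 0x1fu;
    unsigned tem  = (fsr >> 23) & 0x1fu;
    return cexc & tem;
}
```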

Exceptions inhibit results

     To summarize what's written in various places here and elsewhere: when
an fp trap occurs, the rd [destination] register is unchanged; the fsr fcc
[floating-point condition codes] is unchanged; the fsr aexc [accrued
exceptions] is unchanged; and the fsr cexc is unchanged except on IEEE fp
traps.
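     The rules summarized above can be restated as a small state-update
model.  This C sketch is hypothetical: struct fp_state, enum trap_kind and
fp_commit are invented for illustration.  The normal-completion branch (cexc
replaced, aexc accumulating cexc) reflects ordinary IEEE accrued-exception
behavior rather than anything proposed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the architecturally visible FP state. */
struct fp_state {
    uint64_t rd;    /* destination register contents */
    unsigned fcc;   /* condition codes, 0..3 */
    unsigned aexc;  /* accrued exceptions, 5 bits */
    unsigned cexc;  /* current exceptions, 5 bits */
};

enum trap_kind { NO_TRAP, IEEE_TRAP, OTHER_TRAP };

/* Commit the outcome of one fpop.  exc holds the 5 exception bits the
   operation raised; compares set fcc, everything else writes rd. */
void fp_commit(struct fp_state *s, enum trap_kind trap, bool is_compare,
               uint64_t result, unsigned fcc, unsigned exc)
{
    if (trap == IEEE_TRAP) {      /* only cexc changes, to show the cause */
        s->cexc = exc & 0x1fu;
        return;
    }
    if (trap == OTHER_TRAP)       /* rd, fcc, aexc and cexc all unchanged */
        return;

    s->cexc = exc & 0x1fu;        /* normal completion */
    s->aexc |= s->cexc;
    if (is_compare)
        s->fcc = fcc & 0x3u;
    else
        s->rd = result;
}
```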

Integer overflow definition

     f[sdq]toi: when a NaN, an infinity, a large positive argument >
2147483647.0, or a large negative argument < -2147483648.0 is converted to
integer, the exception is invalid.  The kernel has always supplied a numerical
result of 2147483647 if the operand's sign bit is 0 and -2147483648 if it is 1,
and this should be specified.
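     A C model of the proposed f[sdq]toi behavior follows, using double for
the operand.  It assumes the conversion truncates toward zero, as C's own
conversion does, so the range test is applied to the truncated value.  This
matters only for non-integral double or quad operands just beyond the bounds,
since no single-precision value lies strictly between 2147483647 and
2147483648.  The inexact exception is not modeled, and fp_to_int32 is a
hypothetical name.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of f[sdq]toi: truncate toward zero; if the result doesn't fit,
   signal invalid and return a value chosen by the operand's sign bit,
   which also covers NaNs and infinities. */
int32_t fp_to_int32(double x, bool *invalid)
{
    /* Every x strictly inside these bounds truncates to a representable
       int32.  Both comparisons are false for a NaN. */
    if (x > -2147483649.0 && x < 2147483648.0) {
        *invalid = false;
        return (int32_t)x;        /* C conversion also truncates */
    }
    *invalid = true;
    return signbit(x) ? INT32_MIN : INT32_MAX;
}
```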

NaN operand and result definitions - floating-point result in different format from operands

     f[sdq]toi is covered above.

     f[sdq]to[sdq], quiet NaN operand: no exception, result is the quiet NaN
operand transformed as follows:

     NaN transformation: The most significant bits of the operand fraction are
     copied to the most significant bits of the result fraction.  When
     converting to a narrower format, excess lower-order bits of the operand
     fraction are discarded.  When converting to a wider format, excess
     lower-order bits of the result fraction are set to 0.  The quiet bit (most
     significant bit of the result fraction) is always set to 1.

     f[sdq]to[sdq], signaling NaN operand: invalid exception, result is the
signaling NaN operand processed by the NaN transformation above.
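     The NaN transformation can be sketched in C on raw bit patterns for the
single/double pair.  The quad case follows the same pattern with wider fields.
The function names are hypothetical, callers are assumed to pass only NaNs,
and copying the sign bit is an assumption, since the proposal doesn't say what
happens to it.

```c
#include <stdbool.h>
#include <stdint.h>

#define S_FRAC   UINT32_C(0x007fffff)
#define S_QUIET  UINT32_C(0x00400000)          /* MSB of single fraction */
#define S_EXP    UINT32_C(0x7f800000)
#define D_FRAC   UINT64_C(0x000fffffffffffff)
#define D_QUIET  UINT64_C(0x0008000000000000)  /* MSB of double fraction */
#define D_EXP    UINT64_C(0x7ff0000000000000)

/* fstod on a NaN: the 23 fraction bits move to the top of the 52-bit
   field, the 29 new low-order bits are 0, and the quiet bit is forced
   on.  *invalid reports whether the operand was signaling. */
uint64_t nan_single_to_double(uint32_t s, bool *invalid)
{
    uint32_t frac = s & S_FRAC;
    uint64_t sign = (uint64_t)(s >> 31) << 63;  /* assumed: sign kept */
    *invalid = (frac & S_QUIET) == 0;
    return sign | D_EXP | ((uint64_t)frac << 29) | D_QUIET;
}

/* fdtos on a NaN: the low 29 fraction bits are discarded. */
uint32_t nan_double_to_single(uint64_t d, bool *invalid)
{
    uint64_t frac = d & D_FRAC;
    uint32_t sign = (uint32_t)(d >> 63) << 31;  /* assumed: sign kept */
    *invalid = (frac & D_QUIET) == 0;
    return sign | S_EXP | (uint32_t)(frac >> 29) | S_QUIET;
}
```

For example, the signaling double NaN 0x7ff0000000000001 carries its only
fraction bit in the discarded region, so it converts to the single quiet NaN
0x7fc00000 with invalid raised.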

     fcmpe[sdq] with any NaN operand: invalid exception, unordered fcc.

     fcmp[sdq] with any signaling NaN operand: invalid exception, unordered
fcc.

     fcmp[sdq] with any quiet NaN operand but no signaling NaN operand: no
exception, unordered fcc.
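     The three comparison rules differ only in when invalid is raised.  Here
is a C sketch for double precision.  It works on raw bit patterns so that a
signaling NaN is never loaded into a host floating-point register.  The fcc
encoding (0 equal, 1 less, 2 greater, 3 unordered) is taken from the SPARC
Version 8 manual, and fp_compare_bits is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum fcc { FCC_EQ = 0, FCC_LT = 1, FCC_GT = 2, FCC_UN = 3 };

static bool is_nan_bits(uint64_t b)
{
    return (b & UINT64_C(0x7ff0000000000000)) == UINT64_C(0x7ff0000000000000)
        && (b & UINT64_C(0x000fffffffffffff)) != 0;
}

static bool is_snan_bits(uint64_t b)
{
    return is_nan_bits(b) && (b & UINT64_C(0x0008000000000000)) == 0;
}

/* Model of fcmpd (signal_on_qnan = false) and fcmped (true). */
enum fcc fp_compare_bits(uint64_t ba, uint64_t bb, bool signal_on_qnan,
                         bool *invalid)
{
    if (is_nan_bits(ba) || is_nan_bits(bb)) {
        /* fcmped: any NaN is invalid; fcmpd: only a signaling NaN is. */
        *invalid = signal_on_qnan || is_snan_bits(ba) || is_snan_bits(bb);
        return FCC_UN;
    }
    *invalid = false;
    double a, b;                  /* safe now: neither operand is a NaN */
    memcpy(&a, &ba, sizeof a);
    memcpy(&b, &bb, sizeof b);
    if (a == b)
        return FCC_EQ;            /* includes +0 == -0 */
    return a < b ? FCC_LT : FCC_GT;
}
```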

NaN operand and result definitions - floating-point result in same format as operands

     No NaN operand (an invalid operation such as infinity - infinity):
invalid exception, result is this quiet NaN: sign = 0, exponent = all 1's,
fraction = all 1's.  The sign is 0 rather than 1 to distinguish such results
from storage initialized to all 1-bits.

     One operand, quiet NaN: no exception, result is the quiet NaN operand.

     One operand, signaling NaN: invalid exception, result is that operand
with the quiet bit (most significant bit of fraction field) set to 1.

     Two operands, both quiet: no exception, result is the rs1 [first source]
operand.  [SunOS 4.1 kernel recomputation software supplies rs1 for most op
codes but not all.]

     Two operands, both signaling: invalid exception, result is the rs1
operand with the quiet bit set to 1.  [SunOS 4.1 kernel recomputation software
supplies rs1 for most op codes but not all.]

     Two operands, just one a signaling NaN: invalid exception, result is the
signaling NaN operand with the quiet bit set to 1.

     Two operands, neither signaling NaN, just one quiet NaN: no exception,
result is the quiet NaN operand.
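     The same-format rules can be collected into one C function for double
precision.  This is a sketch, not kernel code.  nan_result is a hypothetical
name, operands are raw bit patterns, and the caller is assumed to have already
decided that the result is a NaN, either because some operand is a NaN or
because the operation is invalid, as with infinity - infinity.

```c
#include <stdbool.h>
#include <stdint.h>

#define D_EXP   UINT64_C(0x7ff0000000000000)
#define D_FRAC  UINT64_C(0x000fffffffffffff)
#define D_QUIET UINT64_C(0x0008000000000000)
/* Default NaN: sign 0, exponent and fraction all 1's. */
#define D_DEFAULT_NAN UINT64_C(0x7fffffffffffffff)

static bool is_qnan(uint64_t b)
{
    return (b & D_EXP) == D_EXP && (b & D_QUIET) != 0;
}

static bool is_snan(uint64_t b)
{
    return (b & D_EXP) == D_EXP && (b & D_QUIET) == 0 && (b & D_FRAC) != 0;
}

/* rs2 is ignored when two_operands is false. */
uint64_t nan_result(uint64_t rs1, uint64_t rs2, bool two_operands,
                    bool *invalid)
{
    bool s1 = is_snan(rs1), s2 = two_operands && is_snan(rs2);
    bool q1 = is_qnan(rs1), q2 = two_operands && is_qnan(rs2);

    *invalid = true;
    if (s1) return rs1 | D_QUIET;  /* rs1 wins when both are signaling */
    if (s2) return rs2 | D_QUIET;  /* only rs2 is signaling */
    *invalid = false;
    if (q1) return rs1;            /* rs1 wins when both are quiet */
    if (q2) return rs2;
    *invalid = true;               /* no NaN operand: invalid operation */
    return D_DEFAULT_NAN;
}
```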

Trapped underflow definition (underflow SIGFPE enable UFM=1)

     Underflow occurs if the correct unrounded result is nonzero and has
magnitude less than the smallest normalized number in the destination format.
In terms of IEEE 754, this means "tininess detected before rounding".

     Note that the results with wrapped exponents that IEEE 754 intends to be
delivered on trapped underflows and overflows aren't relevant to SPARC at the
hardware and supervisor levels; if they are created at all, it would be by a
user-mode SIGFPE handler.

Untrapped underflow definition (UFM=0)

     Underflow occurs if the correct unrounded result is nonzero and has
magnitude less than the smallest normalized number in the destination format,
AND the denormalized and correctly rounded result in the destination format is
inexact.  In terms of IEEE 754, this means "tininess detected before rounding"
and "loss of accuracy detected as inexact".  An IEEE trap may occur if [inexact
SIGFPE enable] NXM=1.
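     Both underflow definitions can be illustrated for a single-precision
multiply, using the host's IEEE double arithmetic as a stand-in for hardware.
The exact product of two floats has at most 48 significant bits and a modest
exponent, so it is exact in double.  The sketch assumes the host honors
subnormals (no flush-to-zero) and rounds to nearest, and the function names
are hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Exact product of two floats: at most 48 significant bits, and an
   exponent well inside double's range, so no rounding occurs. */
static double exact_product(float a, float b)
{
    return (double)a * (double)b;
}

/* Tininess before rounding: nonzero and below the smallest normal float. */
static bool tiny_before_rounding(double exact)
{
    return exact != 0.0 && fabs(exact) < FLT_MIN;
}

/* UFM = 1: tininess alone signals underflow. */
bool fmuls_underflow_trapped(float a, float b)
{
    return tiny_before_rounding(exact_product(a, b));
}

/* UFM = 0: tininess AND an inexact denormalized, rounded result. */
bool fmuls_underflow_untrapped(float a, float b)
{
    double exact = exact_product(a, b);
    float rounded = (float)exact;   /* denormalizes and rounds */
    return tiny_before_rounding(exact) && (double)rounded != exact;
}
```

A product of exactly 2^-130 is tiny but exactly representable as a subnormal,
so it underflows under UFM=1 and does not under UFM=0.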

     Note that overflow hugeness is defined to be detected after rounding.
"After rounding" is easier to understand than "before rounding": why take
exception to a result in the normal range that suffers only normal rounding
error?  But underflow detection after rounding is probably trickier to
implement in hardware, because the rounding has to be done once (in
principle, to 24 significant bits in single precision) and then undone prior
to denormalization and rerounding.  In the default round-to-nearest mode,
detecting underflow before or after rounding makes a difference only when the
exact result (in single precision) is < 1.0h-126 and >= .ffffff8h-126.  (I had
previously asserted, incorrectly, that the latter constant was .ffffff0h-126.)
When rounding positive results toward +infinity, detecting underflow before or
after rounding makes a difference only when the exact result (in single
precision) is < 1.0h-126 and > .ffffffh-126, since .ffffffh-126 is the largest
24-bit value below 1.0h-126.
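     The boundary constants can be checked numerically.  The following C
sketch scales the exact result by 2^126, so it lies in [0.5, 1) and is tiny
before rounding.  It then rounds that value to 24 significant bits with an
unbounded exponent and asks whether it reaches 1.0h-126.  It models only
positive results and the two rounding modes discussed, and tininess_differs
is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* m is a positive exact single-precision result divided by 2^-126, with
   0.5 <= m < 1, so the result is tiny before rounding.  Rounding it to 24
   significant bits with an unbounded exponent keeps a multiple of 2^-24;
   tininess after rounding differs exactly when that multiple is 1.0. */
bool tininess_differs(double m, bool to_nearest)
{
    double   t    = m * 0x1p24;           /* exact: scaling by a power of 2 */
    uint64_t i    = (uint64_t)t;          /* whole units of 2^-24 */
    double   frac = t - (double)i;        /* exact remainder in [0, 1) */
    bool     up   = to_nearest
                  ? (frac > 0.5 || (frac == 0.5 && (i & 1)))  /* ties to even */
                  : frac > 0.0;                               /* toward +inf */
    return i + up == (UINT64_C(1) << 24);
}
```

For instance, tininess_differs(1.0 - 0x1p-25, true) is true, because the tie at
.ffffff8h-126 rounds to even, which is 1.0h-126.  With rounding toward
+infinity, tininess_differs(1.0 - 0x1p-23 + 0x1p-30, false) is false: that
result rounds up only to .ffffffh-126, which is still tiny.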

     My previous discussion of a superstandard underflow definition is
oriented toward the future; the proposals above are intended to solve
immediate problems.

----------------------------REMINDER----------------------------

You receive this message because you are on the mailing list
        ...sun!dgh!numeric-interest
Send messages to that address about floating-point arithmetic,
elementary transcendental function computation, numerical exceptions,
and related C and Fortran issues.

To get yourself off that list or to get somebody else on, send mail to
        ...sun!dgh!numeric-request

Other forums are appropriate for more general discussions:

        naana-net.stanford.edu
                moderated mailing list for numerical discussions
        sci.math.num-analysis
                unmoderated USENET newsgroup for numerical discussions

-------------------------------------------------------------------