superstandard underflow

dgh dgh
Wed Jan 3 19:50:46 PST 1990


I welcome comments on the following before I distribute it more widely.



Why is the definition of underflow in IEEE arithmetic so complicated?  What
should be done instead?  Formatted copies may be obtained by
        tbl /home/teak/dgh/memo/underflow | eqn | troff -ms

     A couple of requests for definitions of underflow got me to thinking
about the problem axiomatically.

Representable Values

     A conventional computer arithmetic storage format, fixed or floating
point, defines a set of representable numeric values.  These are just numbers.
They will include one or more representations of zero, many non-zero finite
numbers, and sometimes representations of one or more infinity values, and
sometimes non-numeric symbols.

Rounding

     Rounding is the process by which an arbitrary non-zero finite real number
is mapped to a representable value in a particular target arithmetic storage
format.  Many different rounding schemes have been proposed; reasonable ones

*    map representable values to themselves

*    are monotonic

*    are independent of the operation which produced the unrounded result

     and consequently map unrepresentable values to either the next larger
representable value or the next smaller one.  The rounding error is the
difference between the unrounded result and the rounded representable value.
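
     These properties can be made concrete with a short sketch (an
illustration added in editing, not part of the memo; the helper name
round_to_single is invented, and it relies on the stdlib ctypes
double-to-float conversion, which rounds to nearest even on IEEE hardware):

```python
import ctypes

def round_to_single(x):
    # Round a Python float (IEEE double) to IEEE single precision;
    # C's double-to-float conversion rounds to nearest even by default.
    return ctypes.c_float(x).value

x = 1.0 + 2.0**-24                 # falls between the singles 1.0 and 1 + 2**-23
r = round_to_single(x)

assert round_to_single(r) == r     # representable values map to themselves
assert r == 1.0                    # tie rounds to even; rounding error is -2**-24
```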

Spill

     In a conventional floating-point storage format with separate exponent
and significand fields, exponent spill, or overflow/underflow, occurs when a
larger-than-normal relative rounding error arises because of the finiteness of
the exponent field.  But it's not the exponent finiteness itself that's of
concern; it's the abnormal rounding error, because much theory and intuition
about normal rounding errors may be inapplicable.  In contrast, a normal
rounding error arises only because of the finiteness of the significand in the
storage format.

     Unconventional floating-point formats such as logarithmic may not have
distinguishable exponent and significand fields, and so there may be no
distinction between normal roundoff and spill.

     We distinguish overflow and underflow in conventional formats as
associated with large (magnitude >> 1) and small (magnitude << 1)
representable values respectively.

     When spill occurs, the roundoff due to finite significand only can be
separated from the total roundoff by considering a hypothetical storage format
with the same finite significand as the target format but unlimited exponent
range.  The roundoff due to finite significand is by definition a normal
rounding error; spill occurs just when the total roundoff is different.
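
     A sketch can make the separation concrete (my illustration, not the
memo's; round_to_single is an invented helper built on the stdlib ctypes
conversion).  The value below is exact in the hypothetical format - 24-bit
significand, unlimited exponent - so its entire rounding error is spill:

```python
import ctypes

def round_to_single(x):
    # C's double-to-float conversion, round to nearest even.
    return ctypes.c_float(x).value

# x has an 11-bit significand, so rounding to a 24-bit significand with
# unlimited exponent range would be exact: no normal roundoff at all.
x = (1.0 + 2.0**-10) * 2.0**-140

r = round_to_single(x)             # forced to a subnormal in single
rel_err = abs(r - x) / abs(x)

assert rel_err > 2.0**-24          # far beyond the normal rounding bound
```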

Overflow

     The situation is comparatively simple for overflow.  Overflow occurs just
when the exact result exceeds some threshold that depends on the storage
format and rounding rule.  For instance, in IEEE single precision, overflow
occurs when the exact result x satisfies

           RN: round to nearest         (2**128 - 2**103)  <=  |x|
           RZ: round to zero            2**128             <=  |x|
           RA: round away from zero     (2**128 - 2**104)  <   |x|

"round away from zero" refers to the case of round to positive for positive x
or round to negative for negative x.
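
     The RN line can be checked directly.  In this sketch (mine, not the
memo's; round_to_single is an invented helper using the stdlib ctypes
single-precision conversion, which rounds to nearest even) values just below
the threshold round down to the largest finite single, while the threshold
itself overflows to infinity:

```python
import ctypes, math

def round_to_single(x):
    return ctypes.c_float(x).value      # round to nearest even

max_single = float(2**128 - 2**104)     # largest finite single, (2 - 2**-23) * 2**127
rn_threshold = float(2**128 - 2**103)   # halfway between max_single and 2**128

# just below the RN threshold: rounds down to the largest finite number
assert round_to_single(rn_threshold - 2.0**100) == max_single
# at the threshold: the tie rounds up, overflowing to infinity
assert math.isinf(round_to_single(rn_threshold))
```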

     IEEE arithmetic requires an overflow exception when the rounded result
suffers exponent overflow, which is just the case indicated, and it's easily
implemented in hardware or software by rounding and checking the exponent
afterward.  There is no difference in definition for traps disabled or
enabled.  Things would be more complicated if there were a "gradual overflow"
definition based on "supernormal numbers", but so far nobody has thought of
one worth its cost.

     Some non-IEEE systems might define overflow based on the exact rather
than rounded result.  If the rounding mode is other than round to zero, a few
superfluous overflow exceptions would occur that don't correspond to any
unusually large rounding error.

Underflow

     Underflow is more complicated in the presence of "gradual underflow" to
"subnormal numbers".  The hypothetical format to which rounding errors are
compared has the full significand of normalized numbers.  Some values round to
subnormal representable values without underflow because they are exact, or
because the same representable value would be obtained (with a different
representation) in the hypothetical format.  Detecting underflow correctly is
an interesting exercise, but when worked out it turned out to be less
difficult than I had supposed.
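
     For example (an editorial illustration using the invented round_to_single
helper over the stdlib ctypes conversion): 2**-130 lies below the smallest
normalized single, 2**-126, yet is a multiple of 2**-149 and hence an exact
subnormal, so rounding it involves no error at all; a nearby value that is not
such a multiple does get rounded:

```python
import ctypes

def round_to_single(x):
    return ctypes.c_float(x).value      # round to nearest even

x = 2.0**-130                           # exact subnormal: a multiple of 2**-149
assert round_to_single(x) == x          # no rounding error, hence no underflow

y = 2.0**-130 + 2.0**-155               # not a multiple of 2**-149
assert round_to_single(y) == 2.0**-130  # rounded: a candidate for underflow
```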

     The IEEE 754 committee declined to impose the exercise on implementers,
and instead decreed that underflow could be signaled for any exact subnormal,
inexact subnormal, or inexact zero result or for any inexact normal result
obtained by rounding a value less than the smallest normalized representable
value.  Furthermore, when underflow trapping is enabled, all inexact zero,
inexact subnormal, and exact subnormal results must generate underflow traps,
and underflow exceptions may also be generated by inexact normal results
obtained by rounding values less than the smallest normalized representable
value.

     All this flexibility was put in the standard because it can't matter in
any sensible program and it was feared that an onerous burden might be placed
on some implementations if choices were restricted. Furthermore it was
believed that there would be demand for software that prevented those
dangerous subnormal numbers by catching them when they were created.  (That's
also the thinking behind the Subnormal Operand exception on i80x87).  So the
underflow trap is actually a more general "subnormal result or inexact zero
result" trap that finds exceptions even for exact results.

     The net result is that hardware implementers became significantly
confused about what they could and could not do about underflow, leading to
implementations that were not self-consistent or that gratuitously differ from
other implementations.  As expected, the differences are of no concern to
reasonable programs but bedevil the IEEE test vectors and other diagnostic
programs that really prefer absolutely identical results on all conforming
implementations.  I hope the latitude has been invaluable to many
implementers, because it has cost me disproportionate grief.

     Furthermore almost all high-performance hardware implementations punt
completely on subnormal operands and results and require software intervention
to get any conforming IEEE default behavior.  Since software intervention is
required, it might as well intervene in the same way always.  The cost is
negligible.

Superstandard and Substandard Arithmetic

     I envision, and may someday write down, a proposal for a C programming
environment that looks a lot like IEEE 754 but has two modes, superstandard
and substandard, that are respectively more and less restrictive than a
straightforward C implementation of IEEE 754.

     The justification for the substandard mode is that a lot of IEEE 754
machinery is of little value for programs that were written to run portably on
IBM 370, DEC VAX, and Cray computers, and many current high-performance
arithmetic implementations can run some of those programs a lot faster
unrestricted.

     The justification for superstandard mode is that it now appears to be
technically feasible to obtain identical results on different systems, for
many common computations, by requiring correct rounding for base conversion
and the common elementary transcendental functions and by eliminating the
latitude in defining double-extended precision.  Identical results are
invaluable in debugging large complicated programs, because the changes due to
compiler optimization bugs or subtle hardware errors may look just like the
changes due to minor improvements in the elementary transcendental functions
or base conversion, or due to the difference in allocating three variables in
an expression to extended precision instead of two because of a better
register allocation strategy in a compiler.

     Such modes (in the guise of "levels of conformance") were considered and
rejected by the 754 committee because there was justifiable fear that the
substandard mode would become the de-facto standard since, after all, even
fully standard implementations would produce different numerical results from
the same programs.  I think that this fear may be set aside if we can promise
the user of superstandard mode identical results with close to optimal
efficiency for most programs.  Customer demand will provide an overwhelming
incentive for implementers to provide a good superstandard mode
implementation, even if they also choose to provide a substandard
implementation that is more effective for some programs, because identical
results can't be expected in substandard mode (everybody's is slightly
different).

Superstandard Underflow

     I propose that the definition of underflow in superstandard mode be that
a small rounded result (magnitude < 1) differs from the result obtained by
rounding to the hypothetical format with the full normalized significand and
unlimited exponent.  The overflow definition is the same except that "small
rounded result" is replaced by "large rounded result (magnitude > 1)".  The
difference is that the overflow definition is the same as in existing IEEE
754, while the underflow definition is new and does not change when traps are
enabled.
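
     The proposed definition is directly executable.  In this sketch (my
code, not the memo's; round_sig24 models the hypothetical format by rounding
to 24 significant bits with unlimited exponent range, and the stdlib ctypes
conversion supplies the target single-precision rounding, both to nearest
even), underflow is exactly a disagreement between the two roundings of a
small result:

```python
import ctypes, math

def round_to_single(x):
    return ctypes.c_float(x).value                 # target format, RN

def round_sig24(x):
    # Hypothetical format: 24-bit significand, unlimited exponent, RN.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                           # x = m * 2**e, 0.5 <= |m| < 1
    return math.ldexp(round(m * 2.0**24), e - 24)  # round() is half-to-even

def superstandard_underflow(x):
    # Underflow iff rounding to single differs from rounding to the
    # hypothetical format (for small x; the large-x analogue is overflow).
    return round_to_single(x) != round_sig24(x)

assert not superstandard_underflow(2.0**-130)              # exact subnormal
assert not superstandard_underflow(2.0**-130 + 2.0**-160)  # inexact, but both
                                                           # formats agree
assert superstandard_underflow(2.0**-140 + 2.0**-150)      # genuine underflow
```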

     Consequences include

*    Exceptions are signaled exactly the same on all conforming systems.

*    No more "subnormal detect trap".

*    No more latitude for implementer creativity.

Substandard Underflow

     In substandard mode we can lighten up.  In addition to the cases when
underflow must be signaled in superstandard mode, underflow may also be
signaled for any subnormal result and for any normal result obtained by
rounding an exact value smaller than the smallest normalized number.  We can
even allow rounding rules that produce no subnormal results.

Implementing Superstandard Underflow

     The implementation is tedious but tolerable in software compared to other
costs.  The situation of interest is that an unrounded value has an apparent
exponent less than that of the smallest normalized number in the target
storage format.  Underflow is to be signaled except in specific cases when we
shall inhibit the signal.  The test for underflow inhibit amounts to detecting
the cases in which rounding to the storage format would yield the same result
as rounding to the hypothetical format.  For concreteness, let the target
storage format be IEEE 754 single precision.

     In the rounding accumulator, let R represent the 25th bit (round bit) and
S the sticky bit representing the OR of all subsequent bits.  These are the
bits that would determine the rounded value but for the necessity of producing
a subnormal value in the target format.

     Let R1 = R and S1 = S; then set R = 0 and S = 0.

     The denormalization count is the difference between the exponent of the
exact value and that of the smallest normalized number in the target format.
Denormalize by shifting according to the count, with bits shifted out first
into the R bit and then sticking in S.  Unlike normal rounding shifts,
however, for this shift it is necessary to keep track of whether the bits
shifted out during denormalization are

0    all zeros, (underflow might be inhibited)

1    all ones, (underflow might be inhibited)

*    at least one zero, at least one one  - underflow guaranteed.

     In the last case, stick R1 and S1 into S, signal underflow and inexact,
and round.

     In the case where all ones are shifted out during denormalization,
inhibit underflow in RN mode when R1 == 1 and in RA mode when R1|S1 == 1.
Stick R1 and S1 into S and round normally, signaling inexact.

     In the case where all zeros are shifted out during denormalization,
inhibit underflow in RN mode when ~R1|~S1 == 1 and in RZ mode always.  Stick
R1 and S1 into S and round normally, signaling inexact unless R|S == 0.
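
     For round to nearest only, the recipe can be sketched as follows (my
code, a sketch rather than a production implementation; it assumes the exact
result has been pre-normalized to a 24-bit integer significand sig with value
sig * 2**(e - 23), carrying round bit R1 and sticky bit S1, with e below -126
so that denormalization is required):

```python
def denormalize_and_round_rn(sig, e, R1, S1):
    # Returns (subnormal significand in units of 2**-149, underflow, inexact)
    # for IEEE single, round to nearest even, following the memo's recipe.
    EMIN = -126
    shift = EMIN - e                       # denormalization count, >= 1
    mask = (1 << shift) - 1
    shifted_out = sig & mask
    all_zeros = shifted_out == 0
    all_ones = shifted_out == mask
    sig >>= shift

    # After the shift, R holds the most significant bit shifted out and
    # S the OR of the rest (bits pass through R and stick in S).
    R = (shifted_out >> (shift - 1)) & 1
    S = int((shifted_out & (mask >> 1)) != 0)

    if all_zeros:
        underflow = (R1 & S1) == 1         # inhibit when ~R1|~S1 == 1
    elif all_ones:
        underflow = R1 == 0                # inhibit when R1 == 1
    else:
        underflow = True                   # mixed bits: underflow guaranteed
    S |= R1 | S1                           # stick R1 and S1 into S

    inexact = (R | S) != 0
    if R and (S or (sig & 1)):             # round to nearest, ties to even
        sig += 1
    return sig, underflow, inexact

# 2**-140 + 2**-150: sig = 2**23 + 2**13, e = -140, exact tail; underflows.
assert denormalize_and_round_rn(2**23 + 2**13, -140, 0, 0) == (512, True, True)
# 2**-130 exactly: subnormal but exact, so underflow is inhibited.
assert denormalize_and_round_rn(2**23, -130, 0, 0) == (2**19, False, False)
```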

     I believe an analysis of cases will confirm that underflow has been
inhibited in exactly the right circumstances (exercise for the student).  What
can't be settled so easily is whether it's worth the cost.  As mentioned, it's
negligible in software, and most current high-performance hardware doesn't
even try to denormalize.  If somebody could show convincingly that the cost of
inhibiting underflow in these cases is significant in the total cost of an
IEEE system that handles subnormal operands and results in hardware, then that
would be good reason to change the superstandard underflow definition to:
underflow occurs when an inexact zero or inexact subnormal result is produced.
Thus underflow may occasionally be signaled in conjunction with a normal
rounding error.  (This is one of the allowed definitions for untrapped
underflow under current IEEE 754.)



