more on alternatives to continuable traps and presubstitution
David Hough
sun!Eng!David.Hough
Mon May 6 09:56:29 PDT 1991
The whole idea of continuable traps and presubstitution is to make it
possible to handle abnormal cases flexibly without explicitly
slowing down normal cases by inserting explicit conditional branches.
Of course from a RISC viewpoint, the extra machinery necessary to support
either continuable traps or presubstitution may slow down the normal
case indirectly.
The VAXCENTRIC mode facilitates certain common presubstitutions
without requiring any explicit conditional branches, while the INHIBITED
mode still requires the explicit conditional branches; WRAP could be done
either way. How much do WRAP and INHIBITED differ from what is already
almost available, to the extent that simple explicit conditional branches
are deemed an acceptable price to pay?
In typical implementations you could do something like
c     outside the loop, before it starts
      save=ieee_flags("get","exception","all")
c     in the loop
      ieee_flags("clear","exception","all")
      z(i) = x(i) * y(i)
      newflags=ieee_flags("get","exception","all")
      if (underflow in newflags) goto 1
      if (overflow in newflags) goto 2
c     outside the loop, after it finishes
      ieee_flags("set","exception","all",save)
The cost of the multiply is dwarfed by all the bit twiddling going on here.
As one step toward better efficiency, you
can imagine a generalization of the suggestions I made
whereby each IEEE 754 operator has potentially attached to it a list of
conditions and labels:
z(i) = x(i) *!!underflow->1,overflow->2!! y(i)
redirecting execution according to the current exception bits
(e.g. SPARC cexc), implemented in hardware
with corresponding conditional branch instructions
fbnvc addr
fbovc addr
fbufc addr
fbdzc addr
fbnxc addr
(There are some code generation
problems with higher-precision expression evaluation
where invalid, division by zero, and inexact exceptions
would usually arise on the fpop while overflow and underflow would usually
arise on a subsequent conversion to storage precision, which is one
reason why IEEE 754 standardized on the accrued exception flags rather than
the current exception flags.)
How do my previous proposals differ from what's
in the foregoing paragraph? WRAP mode redefines the result z(i) from
the normal zero, subnormal, or infinite result,
to a wrapped form containing full precision,
while INHIBITED mode avoids storing z(i) since it may be aliased to
an operand x(i) or y(i) that's needed to construct the desired result.
Before accepting the cost of adding another mode bit per exception, we
need to ensure that these modes will pay their way. (VAXCENTRIC is
easier to justify since normal storage of results is not interrupted.)
With respect to INHIBITED mode, the question is how often one would
want to continue with a result that was computed from the operands
of the operation, such as sqrt(x) -> -sqrt(-x), vs. how often one would
longjump or continue with a result that is independent of the operands,
such as sqrt(x) -> 0. The latter case doesn't need INHIBITED mode.
With respect to WRAP mode, the question is how often one would really
want to solve the problem of exponent spill entirely within a loop,
as opposed to recomputing the entire computation more carefully in the unusual
event of a spill:
      save=ieee_flags("get","exception","all")
      ieee_flags("clear","exception","all")
      p=x(1)
      do 1 i = 2,n
         p = p * x(i)
    1 continue
      newflags=ieee_flags("get","exception","all")
      if (overflow or underflow in newflags) then
         ieee_flags("set","exception","all",save)
c        careful code for exponent spill with separate exponent
c        and significand
         p="frexp"(x(1),&expo)
         do 3 i = 2,n
            p = p * "frexp"(x(i),&e)
            expo = expo + e
    3    continue
         p = "ldexp"(p,expo)
      else
         ieee_flags("set","exception","all",save OR newflags)
      end if
Since WRAP mode can't be used to make existing code run better without
rewriting, how much worse is a little more rewriting? The impact of
standardizing "ieee_flags", "frexp", and "ldexp" is a lot less than
standardizing the machinery for WRAP mode.