[Cfp-interest] exception handling example included this time

Ian McIntosh ianm at ca.ibm.com
Wed May 14 12:56:39 PDT 2014


An interesting statement is:

	presubstitute() needs to be faster than a conditional branch in order
	to be useful.  Thus it shouldn't be a synchronizing operation that
	stops the pipe.

On many architectures that really means faster than a floating-point
compare and conditional branch.

One way to beat that would be the suggested "substitution registers"
containing values to be substituted for the normal results whenever an
exception occurs.  Another would be having an exception trap to a handler
whenever an exception occurs, with zero cost otherwise.  On many systems,
initializing and checking control and status registers is a synchronizing
operation that stops the floating-point pipeline, so it wouldn't meet the
performance goal.

Exception handling may be faster for other examples, and could also be
useful simply because it is so much more usable than redesigning
algorithms.

- Ian McIntosh          IBM Canada Lab          Compiler Back End Support and Development



From:    David Hough CFP <pcfp at oakapple.net>
To:      cfp-interest at ucbtest.org
Date:    2014-05-13 04:35 PM
Subject: [Cfp-interest] exception handling example included this time
Sent by: cfp-interest-bounces at oakapple.net






Sorry I forgot to actually include the example last time:

 ==========

 From dgh Tue Oct  8 14:56:16 1991
 To: nceg at cray.com
 Subject: Exception Handling I: The Critical Example

In his proposals for floating-point exception handling, W. Kahan uses the
following example of computing a continued fraction and its derivative.
It's a critical case for exception handling proposals because the desired
exception handling for the 0*inf case changes on each iteration.
Here's the underlying code with no exception handling:

void
continued_fraction(N, a, b, x, pf, pf1)
		 int             N;
		 double         *a, *b, x, *pf, *pf1;
{
		 double          f, f1, d, d1, q;
		 int             j;

		 /*
		  * Evaluate function f at x by continued fraction a[j] and b[j];
		  * function value to *pf, derivative to *pf1
		  */

/*		 Assume
		 		 aj finite
		 		 bj finite and nonzero
		 		 x finite

		 CRITICAL STEP is evaluating f1= -(d1/d)*q

		 If d == 0 and d1 != 0 on step j, then
		 on the next step j-1 you want f1 = b(j-1)/b(j) * d1(j),
		 a result determined by taking limits as d->0 in step j.
		 Instead by IEEE default,
		 on the next step j-1 you will get f1 = -(inf/inf)*0
		 or NaN.

		 if d == 0 and d1 == 0 on step j, then
		 on the next step j-1 you want f1 = 0 .
		 Instead by IEEE default,
		 on this step j you get f1 = -(0/0)*inf
		 or NaN which pollutes all subsequent f1's.
*/

		 f1 = 0;
		 f = a[N];

		 for (j = N - 1; j >= 0; j--) {
		 		 d = x + f;
		 		 q = b[j] / d;
		 		 f = a[j] + q;
		 		 d1 = 1.0 + f1;
		 		 f1 = -(d1 / d) * q;		 /* CRITICAL STEP */
		 }

		 *pf = f;
		 *pf1 = f1;
		 return;
}

Thus IEEE default results for invalid exceptions are unhelpful for this
particular example, producing NaN's for the derivative f1.  The key point
is that the correct result depends on what went on before in a complicated
way.   The straightforward way to code the needed test in the absence of
language support looks like

		 for (j = N - 1; j >= 0; j--) {
		 		 f1j2 = f1j1;		 /* Save this guy for recomputation -
		 		 		 won't be defined until j=N-2 but that's
		 		 		 the earliest it will be needed. */
		 		 f1j1 = f1;		 /* Intermediate save. */
		 		 d = x + f;
		 		 q = b[j] / d;
		 		 f = a[j] + q;
		 		 d1 = 1.0 + f1;
		 		 r = d1 / d;		 /* CRITICAL STEP */
		 		 if (r != r) { /* r is NaN from 0/0 or inf/inf */
		 		 		 if (d1 == 0) {
		 		 		 		 /* f1 = (0/0)*inf so return infinity */
		 		 		 		 f1 = q;
		 		 		 } else {
		 		 		 		 /* f1 = (inf/inf)*0 so return limit value */
		 		 		 		 f1 = (1.0 + f1j2) * b[j + 2] / b[j + 1];
		 		 		 }
		 		 }
		 		 else { /* r is a normal number */
		 		 		 f1 = -r * q;
		 		 }
		 }


The "normal" case is slowed down by the necessity of retaining f1j2,
the value of f1 from two iterations back, which can't be avoided,
and by having to test
the result of d1/d and branch conditionally.   All the exception-handling
mechanism to be discussed below is intended to avoid that conditional
branch or reduce its cost.

On non-IEEE systems on which division by zero was fatal, or on which
infinity or NaN could not be distinguished from finite values, producing
robust results for this computation would be much more expensive.  They
would also be more expensive on IEEE systems which stubbornly optimized
away if (r != r), which could also be expressed as isnan(r) or
unordered(r,r) at somewhat higher cost.  Note that there is nothing truly
exceptional or abnormal about the case d == 0; all that's required is that
the value of the input parameter x happen to be the negative of one of the
partial function values f, which is mostly a matter of luck.
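
As a minimal sketch (assuming the isnan() that was later standardized in
<math.h>), the test on the critical step can be spelled so that an
optimizer is less tempted to fold it away:

#include <math.h>

/* Illustrative only: nonzero when d1/d would be an invalid division
   (0/0 or inf/inf), i.e. when the quotient is a NaN.  Under IEEE 754
   this is the same test as r != r. */
static int
invalid_quotient(double d1, double d)
{
	double r = d1 / d;

	return isnan(r);
}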

 ==========

 From dgh Tue Oct  8 15:36:13 1991
 To: nceg at cray.com
 Subject: Exception Handling II: Presubstitution


The previous message demonstrated a loop to evaluate a continued fraction
and its derivative, and showed how points where the formula for the
derivative breaks down can be bypassed by conventional means, at least in
IEEE arithmetic.

The next step is to consider Kahan's proposal for presubstitution, which
turns the loop in question into the form:

extern double   infinity(void);
extern void     presubstitute(char *, double);

void
continued_fraction(N, a, b, x, pf, pf1)
		 int             N;
		 double         *a, *b, x, *pf, *pf1;
{
		 double          f, f1, d, d1, q, bj, bj1, p;
		 int             j;

		 /*
		  * Evaluate function f at x by continued fraction a[j] and b[j];
		  * function value to *pf, derivative to *pf1
		  */

/*		 Assume
		 		 aj finite
		 		 bj finite and nonzero
		 		 x finite

		 Critical step is evaluating f1= -(d1/d)*q

		 If d == 0 and d1 != 0 on step j, then
		 on the next step j-1 you want f1 = p = b(j-1)/b(j) * d1(j),
		 a result determined by taking limits in step j.
		 Since f1(j-1) is nominally computed as (inf/inf)*0,
		 the desired result can be obtained by substituting
		 inf for inf/inf and p for inf * 0.

		 if d == 0 and d1 == 0 on step j, then
		 on the next step j-1 you want f1 = p = 0 .
		 Since f1(j) is nominally computed as (0/0)*inf, and
		 since f1(j-1) is nominally computed as (f1(j)/inf)*0,
		 the desired result can be obtained by substituting
		 inf for 0/0 and p (= 0) for inf * 0.
*/

		 presubstitute("0/0", infinity());
		 presubstitute("inf/inf", infinity());

		 f1 = 0;
		 f = a[N];
		 bj1 = b[N];

		 for (j = N - 1; j >= 0; j--) {
		 		 d = x + f;
		 		 d1 = 1.0 + f1;
		 		 bj=b[j];
		 		 q = bj / d;
		 		 f1 = -(d1 / d) * q;		 /* CRITICAL STEP */
		 		 f = a[j] + q;
		 		 p = bj1 * d1 / bj;
		 		 presubstitute("0*inf", p);
		 		 bj1 = bj;
		 }

		 *pf = f;
		 *pf1 = f1;
		 return;
}

With j counting downward, bj1 means b[j+1],
the b[j] that was in effect during the
previous iteration.

Note that there is an error in Kahan's 17 March 1987 draft of
"Presubstitution, and Continued Fractions" -
the presubstitution value should be written

		 b   * d' / b
		  j+1        j

The first subscript appears uniformly in the draft as j-1, perhaps a relic
of an earlier draft in which the loop ran from j=0 to j=N-1.

In contrast to the "conventional method" proposed previously, which required
allocating two registers to store previous-iteration data and more
importantly a conditional branch to test the result of d1/d on every
iteration, the presubstitution method requires setting up two
presubstitutions outside the loop to cause 0/0 and inf/inf to be replaced
by inf, and another within the loop that causes 0*inf to be replaced by p,
which must be computed on each iteration and inserted into the arithmetic
pipe in case 0*inf arises on the next iteration.  Thus updating the
presubstitution value must be a fast operation, even though actually doing
the presubstitution in the relatively rare event of an exception might be
slow.

There are a couple of things to note about the function presubstitute():

1) 		 If implemented as hardware fpop codes, presubstitute() needs to
be faster than a conditional branch in order to be useful.  Thus it
shouldn't be a synchronizing operation that stops the pipe; updates to the
presubstituted values have to flow through the pipe with other fpops.

		 Furthermore there must be some presubstitution registers, at least
one each for 0/0, inf/inf, 0*inf, other invalid, finite/0, overflow, and
underflow; on a V8 SPARC, for instance, presubstitution registers must be
able to accommodate 128-bit long double data types, so there are already at
least 7 * 4 = 28 32-bit words of presubstitution registers, compared to
the 32 32-bit floating-point data registers used in normal operation;
all must be saved and restored during context switches.   I'd like to hear
from hardware designers about the cost of this approach.

		 Alternatively presubstitution registers could be implemented
as 5-bit pointers to the 32 floating-point data registers.
Then data registers used for presubstitution values (4 32-bit registers in
this example) would be subtracted from the pool of conventional registers
available for optimizer allocation.   Some compiler cooperation would be
required to understand this - presubstitution would be an operation known
to compilers.   Hardware paths would have to be available to bring
presubstitution values from the appropriate data registers to the
destination register when needed.   This operation (inserting the
presubstituted value) could be slow because it's rare, unlike setting up
the presubstituted value, which must be fast.

		 On vector machines, a vector version of the continued fraction
program might have x a vector parameter rather than a scalar.
Presubstitution registers would have to be the same size as vector
registers.   Or else this kind of code simply wouldn't be vectorized.

2)		 Presubstitute() can't be implemented as a system call.  That
would take longer than the conditional compare.

3)		 Presubstitute() could be implemented simply by storing
presubstituted values into a normal array in user address space memory.
Programs planning to use presubstitution first register their intent at
program initialization by a system call that tells the kernel where to find
this array of presubstitution values in user space.   Thus the array can be
updated relatively quickly - it's just a (probably cached) write.
Performing the substitution when an exception arises requires that the
floating-point hardware be trappable and continuable after traps.   The
system call that initializes presubstitution would also initialize the
hardware in this case to trap on every invalid operation.    The kernel
trap handler would examine each such trapping fpop and obtain the value to
be used from the user-space table.   It is not necessary for this purpose
that user-mode code handle the trap, although that's one possible
implementation.   Hopefully the user-mode code won't actually be written by
the end user.

		 This appears to be the most likely way of implementing
presubstitution, should it prove worthwhile.
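
As a minimal sketch of that third approach - the table layout and the
registration call below are hypothetical, invented only to make the idea
concrete - the user-space side might look like:

/* One slot per substitutable event; the kernel trap handler would read
   the slot corresponding to the trapping operation. */
enum presub_case {
	PS_ZERO_OVER_ZERO, PS_INF_OVER_INF, PS_ZERO_TIMES_INF,
	PS_OTHER_INVALID, PS_FINITE_OVER_ZERO, PS_OVERFLOW, PS_UNDERFLOW,
	PS_NCASES
};

static volatile double presub_table[PS_NCASES];

/* Hypothetical one-time system call telling the kernel where the table
   lives in user space and enabling trapping on invalid operations. */
extern int register_presubstitution(volatile double *table, int ncases);

/* Updating a presubstitution value is then just a (probably cached)
   store; nothing synchronizes the floating-point pipe. */
static void
presubstitute_case(enum presub_case which, double value)
{
	presub_table[which] = value;
}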

 ==========

 From dgh Tue Oct  8 16:46:50 1991
 To: nceg at cray.com
 Subject: Exception Handling III: Operator Modifiers


In my first posting, I described a loop to evaluate a continued fraction
and its derivative and how it might be made robust in the face of 0/0 or
inf/inf by conventional means costing a conditional branch on every
iteration.

In my second posting, I described how presubstitution might work.  In the
best possible implementation, an extra fp multiply, divide, and
"presubstitute" are required on every iteration.   The "presubstitute"
might be a load to a hardware presubstitution register or a store to a
user-space memory location.

I now turn to another approach which is also based on conditional branches -
and thus is probably no cheaper than the conventional approach in this
example.   The payoff is in other situations where the exceptionalness
can't be so readily determined by doing just one floating-point comparison,
such as when an inexact must be detected.

The idea is that instead of testing the numerical operands or result of an
operation, or depending on hardware presubstitution registers or recovering
from an asynchronous trap, the code tests the accrued IEEE exception bits.
It's intended to permit an important optimization on IEEE systems that
maintain separate "current exception bits" for the last fpop: if there is
only one fpop that can generate the exception of interest, then only the
current exception bits for that fpop need be tested.  Abstractly this is not
inherently more or less synchronizing than a conditional branch on
just-computed floating-point data, although current implementations might
do a better job on the latter.

Conventionally, if you want to test an fp expression for a particular IEEE
exception such as FE_INVALID, you would do something like

		 tinv = get_accrued(FE_INVALID);
		 clear_accrued(FE_INVALID);
		 dest = expression;
		 tinv2 = get_accrued(FE_INVALID);
		 set_accrued(FE_INVALID, tinv | tinv2);
		 if (tinv2 == 0) {
		 		 /* code for normal case that no exception arises */
		 } else {
		 		 /* code for unusual case that exception arises */
		 }

So that's at least four function calls, worst case, resulting best case in
two stores and two loads of the accrued exception register; plus a
conditional branch.   On most current high-performance IEEE systems those
loads, stores, and branch will be much more expensive than the
"dest=expression" that one would most typically want to test.
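
For comparison, here is a minimal sketch of the same construct written
against the <fenv.h> interface that later standardized these operations
(fetestexcept, feclearexcept, feraiseexcept); default non-trapping
exception handling is assumed, and the recovery code is only a placeholder:

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

static double
guarded_divide(double d1, double d)
{
	double r;
	int was_set = fetestexcept(FE_INVALID);	/* remember accrued bit */

	feclearexcept(FE_INVALID);
	r = d1 / d;				/* dest = expression */
	if (fetestexcept(FE_INVALID)) {
		/* unusual case: 0/0 or inf/inf arose; recover here */
		r = 0.0;			/* placeholder */
	}
	if (was_set)
		feraiseexcept(FE_INVALID);	/* restore the accrued bit */
	return r;
}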

The proposal below is for language support for the foregoing construct,
with the intent that it would often be optimized to

		 dest = expression;
		 if (get_current(FE_INVALID) == 0) ... else ...

and that consequently there would in time be hardware support for either
the five instructions

		 "branch conditionally if the previous fpop generated FE_X"

for each IEEE exception INVALID, OVERFLOW, etc., or else one instruction

		 "branch conditionally if the previous fpop generated FE_X|
FE_Y|..."

where any subset of the five IEEE exceptions can be OR'd together
to specify the desired condition; and thus after further time fpop
implementations might send early information to branch units that certain
IEEE exceptions are guaranteed not to arise on this fpop code - a MIPSy idea
much appreciated by Kahan - releasing control flow past the conditional
branch.

In my experience the most common case, by far, is that only one operation
matters, so my original idea was to attach the branch condition as a
modifier directly to the operation:

		 z = x / y ;

becoming

		 z = x / [on FE_INVALID goto label] y ;

indicating that the fpop(s) implementing / be tested for invalid and, when
it is encountered, control would flow to a local labeled statement.
A goto was chosen with malice aforethought to ensure a lightweight
implementation burden: if the exception arises the compiled code is neither
required to have assigned to z nor to have left z untouched.
There is no possible way to return back into the middle of the expression.

A semantically similar but syntactically quite different approach, inspired
by Stallman, is to attach a cast-like modifier to an expression:

		 (__oninvalid__ label) (x/y)

or maybe even

		 (__oninvalid__ label) (z=x/y)

The distinction between these two
is important on extended-precision-based systems that
typically won't detect overflow or underflow on the divide but only on the
assignment forcing conversion to storage precision.

Anyway if the tested code is an expression rather than a single operation,
compilers should eventually recognize possibilities for optimization.

An even more general approach is inspired by C++:

		 try { z=x/y ; }
		 catch (FE_INVALID) { /* code for exceptional case */ } ;
		 /* followed by code for non-exceptional case
		    that will also be executed if exceptional case falls through */

There can be multiple catch clauses for different exceptions.

I don't think the C++ try/catch mechanism is appropriate for C - too
heavyweight: it allows for user-defined exceptions and requires dynamic
scoping of handlers: if FE_OVERFLOW is not caught here, it should be caught
by any dynamically enclosing catch, even in some other procedure.   This is
too much like a long jump - costly and not helpful for what I want to do,
which is very local.

Worse, the main purported advantage -
a cleaner syntax devoid of gratuitous gotos and labels -
will frequently prove illusory, since exceptional-case code often
should not fall through to normal-case code as it would appear to do in
the example above.

Thus a truly satisfactory syntax awaits discovery and promulgation.
I'm eager for good suggestions.

I'll close by illustrating the foregoing with the continued fraction
example from the previous postings.   First with the operator or expression
modifier syntax:

void
continued_fraction(N, a, b, x, pf, pf1)
		 int             N;
		 double         *a, *b, x, *pf, *pf1;
{
		 double          f, f1, d, d1, q, r, f1j1, f1j2;
		 int             j;

		 /*
		  * Evaluate function f at x by continued fraction a[j] and b[j];
		  * function value to *pf, derivative to *pf1
		  */

		 /*
		  * Assume aj finite, bj finite and nonzero, x finite
		  *
		  * Critical step is evaluating f1 = -(d1/d)*q
		  *
		  * If d == 0 and d1 != 0 on step j, then on the next step j-1
		  * you want f1 = p = b(j-1)/b(j) * d1(j), a result determined
		  * by taking limits in step j.  Since f1(j-1) is nominally
		  * computed as (inf/inf)*0, the desired result can be obtained
		  * by substituting inf for inf/inf and p for inf * 0.
		  *
		  * If d == 0 and d1 == 0 on step j, then on the next step j-1
		  * you want f1 = p = 0.  Since f1(j) is nominally computed as
		  * (0/0)*inf, and since f1(j-1) is nominally computed as
		  * (f1(j)/inf)*0, the desired result can be obtained by
		  * substituting inf for 0/0 and p (= 0) for inf * 0.
		  */

		 f1 = 0;
		 f = a[N];

		 for (j = N - 1; j >= 0; j--) {
		 		 f1j2 = f1j1;		 /* Save this guy for recomputation -
		 		 		 won't be defined until j=N-2 but that's
		 		 		 the earliest it will be needed. */
		 		 f1j1 = f1;		 /* Intermediate save. */
		 		 d = x + f;
		 		 q = b[j] / d;
		 		 f = a[j] + q;
		 		 d1 = 1.0 + f1;
#ifdef operatorsyntax
		 		 r = d1 / [on FE_INVALID goto invdiv] d;
#else		 		 		 /* cast expression syntax */
		 		 r = (__oninvalid__ invdiv) (d1 / d);
#endif
		 		 f1 = -r * q;		 /* non-exceptional case */

endofloop:		 {		 /* some kind of null statement
		 		 		    required syntactically here */
		 		 }
		 }

		 *pf = f;
		 *pf1 = f1;
		 return;

invdiv: /* invalid division 0/0 or inf/inf detected */

		 if (d1 == 0) {
		 		 /* f1 = (0/0)*inf so return infinity */
		 		 f1 = q;
		 } else {
		 		 /* f1 = (inf/inf)*0 so return "p" */
		 		 f1 = (1.0 + f1j2) * b[j + 2] / b[j + 1];
		 }
		 goto endofloop;
}


The corresponding try/catch looks a little nicer:

void
continued_fraction(N, a, b, x, pf, pf1)
		 int             N;
		 double         *a, *b, x, *pf, *pf1;
{
		 double          f, f1, d, d1, q, r, f1j1, f1j2;
		 int             j;

		 /*
		  * Evaluate function f at x by continued fraction a[j] and b[j];
		  * function value to *pf, derivative to *pf1
		  */

		 /*
		  * Assume aj finite, bj finite and nonzero, x finite
		  *
		  * Critical step is evaluating f1 = -(d1/d)*q
		  *
		  * If d == 0 and d1 != 0 on step j, then on the next step j-1
		  * you want f1 = p = b(j-1)/b(j) * d1(j), a result determined
		  * by taking limits in step j.  Since f1(j-1) is nominally
		  * computed as (inf/inf)*0, the desired result can be obtained
		  * by substituting inf for inf/inf and p for inf * 0.
		  *
		  * If d == 0 and d1 == 0 on step j, then on the next step j-1
		  * you want f1 = p = 0.  Since f1(j) is nominally computed as
		  * (0/0)*inf, and since f1(j-1) is nominally computed as
		  * (f1(j)/inf)*0, the desired result can be obtained by
		  * substituting inf for 0/0 and p (= 0) for inf * 0.
		  */

		 f1 = 0;
		 f = a[N];

		 for (j = N - 1; j >= 0; j--) {
		 		 f1j2 = f1j1;		 /* Save this guy for recomputation -
		 		 		 won't be defined until j=N-2 but that's
		 		 		 the earliest it will be needed. */
		 		 f1j1 = f1;		 /* Intermediate save. */
		 		 d = x + f;
		 		 q = b[j] / d;
		 		 f = a[j] + q;
		 		 d1 = 1.0 + f1;
		 		 try { r = d1/d ; }
		 		 catch (FE_INVALID) {		 /* invalid division 0/0 or
		 		 		 		    inf/inf detected */
		 		 		 if (d1 == 0) {		 /* f1 = (0/0)*inf so return infinity */
		 		 		 		 f1 = q;
		 		 		 } else {		 /* f1 = (inf/inf)*0 so return "p" */
		 		 		 		 f1 = (1.0 + f1j2) * b[j + 2] / b[j + 1];
		 		 		 }
		 		 		 goto endofloop;
		 		 };

		 		 f1 = -r * q;

endofloop:		 {		 /* some kind of statement required here */
		 		 }
		 }

		 *pf = f;
		 *pf1 = f1;
		 return;
}

By the way, figuring out that the important exceptional cases are
(0/0)*inf and (inf/inf)*0,  given the input constraints on x, a, and b,
is an interesting exercise.

 ==========

 From dgh Tue Oct  8 17:39:45 1991
 To: nceg at cray.com
 Subject: Exception Handling IV:  try/catch a switch?


try/catch is sort of like a switch.  Maybe if it were more so, the
problems with labels I mentioned previously would go away, e.g. a
somewhat different syntax for the example I have been using:

        __eswitch__ (r = d1/d) {
        case FE_INVALID: /* invalid division 0/0 or inf/inf detected */
                if (d1 == 0) {  /* f1 = (0/0)*inf so return infinity */
                        f1 = q;
                } else {        /* f1 = (inf/inf)*0 so return "p" */
                        f1 = (1.0 + f1j2) * b[j + 2] / b[j + 1];
                }
                break;          /* instead of goto... */
        default:        /* unexceptional case */
                f1 = -r * q;
                break;
        }

Maybe that's an improvement:  an __eswitch__ is like a switch, except the
cases are cases of exceptions rather than cases of expression values.

_______________________________________________
Cfp-interest mailing list
Cfp-interest at oakapple.net
http://mailman.oakapple.net/mailman/listinfo/cfp-interest

