[Cfp-interest] exception handling example included this time

David Hough CFP pcfp at oakapple.net
Tue May 13 13:35:12 PDT 2014


Sorry I forgot to actually include the example last time:

 ==========

 From dgh Tue Oct  8 14:56:16 1991
 To: nceg at cray.com
 Subject: Exception Handling I: The Critical Example

In his proposals for floating-point exception handling, W. Kahan uses the
following example of computing a continued fraction and its derivative.
It's a critical case for exception handling proposals because the desired
exception handling for the 0*inf case changes on each iteration.
Here's the underlying code with no exception handling:

void
continued_fraction(N, a, b, x, pf, pf1)
	int             N;
	double         *a, *b, x, *pf, *pf1;
{
	double          f, f1, d, d1, q;
	int             j;

	/*
	 * Evaluate function f at x by continued fraction a[j] and b[j];
	 * function value to *pf, derivative to *pf1
	 */

/*	Assume
		aj finite
		bj finite and nonzero
		x finite

	CRITICAL STEP is evaluating f1= -(d1/d)*q

	If d == 0 and d1 != 0 on step j, then
	on the next step j-1 you want f1 = b(j-1)/b(j) * d1(j),
	a result determined by taking limits as d->0 in step j.
	Instead by IEEE default, 
	on the next step j-1 you will get f1 = -(inf/inf)*0
	or NaN.

	if d == 0 and d1 == 0 on step j, then 
	on the next step j-1 you want f1 = 0 .
	Instead by IEEE default, 
	on this step j you get f1 = -(0/0)*inf
	or NaN which pollutes all subsequent f1's.
*/

	f1 = 0;
	f = a[N];

	for (j = N - 1; j >= 0; j--) {
		d = x + f;
		q = b[j] / d;
		f = a[j] + q;
		d1 = 1.0 + f1;
		f1 = -(d1 / d) * q;		/* CRITICAL STEP */
	}

	*pf = f;
	*pf1 = f1;
	return;
}

Thus IEEE default results for invalid exceptions
are unhelpful for this particular example, producing NaN's for the derivative
f1.  The key point is that the correct result depends on what went on before
in a complicated way.   The straightforward way to code the needed test in
the absence of language support looks like
 
	for (j = N - 1; j >= 0; j--) {
		f1j2 = f1j1;	/* Save this guy for recomputation -
			won't be defined until j=N-2 but that's the
			earliest it will be needed.. */
		f1j1 = f1;	/* Intermediate save. */
		d = x + f;
		q = b[j] / d;
		f = a[j] + q;
		d1 = 1.0 + f1;
		r = d1 / d;	/* CRITICAL STEP */
		if (r != r) { /* r is NaN from 0/0 or inf/inf */
			if (d1 == 0) {
				/* f1 = (0/0)*inf so return infinity */
				f1 = q;
			} else {
				/* f1 = (inf/inf)*0 so return limit value */
			f1 = (1.0 + f1j2) * b[j] / b[j + 1];
			}
		}
		else { /* r is a normal number */
			f1 = -r * q;
		}	
	}


The "normal" case is slowed down by the necessity of retaining f1j2,
the value of f1 from two iterations back, which can't be avoided, 
and by having to test
the result of d1/d and branch conditionally.   All the exception-handling
mechanism to be discussed below is intended to avoid that conditional
branch or reduce its cost.  

On non-IEEE systems on which division by zero was fatal or infinity or NaN could
not be distinguished from finite values, producing robust results for this
computation would be much more expensive.  As they also would be on IEEE systems
which stubbornly optimized away if (r!=r), which could also be expressed
as isnan(r) or unordered(r,r) at somewhat higher cost.   
Note that there is nothing
truly exceptional or abnormal about the case d == 0; all that's required is that
the value of the input parameter x happen to be the negative of one of the
partial function values f, which is mostly a matter of luck.

 ==========

 From dgh Tue Oct  8 15:36:13 1991
 To: nceg at cray.com
 Subject: Exception Handling II: Presubstitution


The previous message demonstrated a loop to evaluate a continued fraction
and its derivative, and showed how points where the formula for the
derivative breaks down can be bypassed by conventional means, at least in
IEEE arithmetic.

The next step is to consider Kahan's proposal for presubstitution, which turns
the loop in question into the form:

extern double   infinity(void);
extern void     presubstitute(char *, double);

void
continued_fraction(N, a, b, x, pf, pf1)
	int             N;
	double         *a, *b, x, *pf, *pf1;
{
	double          f, f1, d, d1, q, bj, bj1, p;
	int             j;

	/*
	 * Evaluate function f at x by continued fraction a[j] and b[j];
	 * function value to *pf, derivative to *pf1
	 */

/*	Assume
		aj finite
		bj finite and nonzero
		x finite

	Critical step is evaluating f1= -(d1/d)*q

	If d == 0 and d1 != 0 on step j, then
	on the next step j-1 you want f1 = p = b(j-1)/b(j) * d1(j),
	a result determined by taking limits in step j.
	Since f1(j-1) is nominally computed as (inf/inf)*0,
	the desired result can be obtained by substituting 
	inf for inf/inf and p for inf * 0.

	if d == 0 and d1 == 0 on step j, then 
	on the next step j-1 you want f1 = p = 0 .
	Since f1(j) is nominally computed as (0/0)*inf, and
	since f1(j-1) is nominally computed as (f1(j)/inf)*0,
	the desired result can be obtained by substituting
	inf for 0/0 and p (= 0) for inf * 0.
*/

	presubstitute("0/0", infinity());
	presubstitute("inf/inf", infinity());

	f1 = 0;
	f = a[N];
	for (j = N - 1; j >= 0; j--) {
		d = x + f;
		d1 = 1.0 + f1;
		bj = b[j];
		q = bj / d;
		f1 = -(d1 / d) * q;		/* CRITICAL STEP */
		f = a[j] + q;
		/* Limit value for a 0*inf arising on the next iteration;
		   at j == 0 the index is clamped only to keep the load in
		   bounds - that final p is never consumed. */
		p = b[j > 0 ? j - 1 : 0] * d1 / bj;
		presubstitute("0*inf", p);
	}

	*pf = f;
	*pf1 = f1;
	return;
}

With j counting downward, the value presubstituted during iteration j
is the limit that a 0*inf arising on the next iteration j-1 should
produce, p = b[j-1] * d1(j) / b[j], in agreement with the comment above.

Kahan's 17 March 1987 draft of
"Presubstitution, and Continued Fractions"
writes this presubstitution value as
	
	b   * d' / b
	 j-1        j

which matches the limit above for a loop running downward from j = N-1.

In contrast to the "conventional method" proposed previously, which required
allocating two registers to store previous-iteration data and more importantly
a conditional branch to test the result of d1/d on every iteration,
the presubstitution method requires setting up two presubstitutions outside
the loop to cause 0/0 and inf/inf to be replaced by inf, 
and another within the loop that causes 0*inf to be replaced by p, which
must be computed on each iteration and inserted into the arithmetic pipe
in case 0*inf arises on the next iteration.    
Thus updating the presubstitution value must be
a fast operation, even though actually doing the presubstitution in the
relatively rare event of an exception might be slow.

There are a few things to note about the function presubstitute():

1) 	If implemented as hardware fpop codes, presubstitute()
needs to be faster than a
conditional branch in order to be useful.  Thus it shouldn't be a synchronizing
operation that stops the pipe; updates to the presubstituted values have
to flow through the pipe with other fpops.

	Furthermore there must be some presubstitution registers, at least
one each for 0/0, inf/inf, 0*inf, other invalid, finite/0, overflow, and
underflow; on a V8 SPARC, for instance, presubstitution registers must be
able to accommodate 128-bit long double data types, so there are already at
least 7 * 4 = 28 32-bit words of presubstitution registers, compared to
the 32 32-bit floating-point data registers used in normal operation;
all must be saved and restored during context switches.   I'd like to hear
from hardware designers about the cost of this approach.

	Alternatively presubstitution registers could be implemented
as 5-bit pointers to the 32 floating-point data registers.  
Then data registers used for presubstitution values
(4 32-bit registers in this example) would be subtracted from the pool of 
conventional
registers available for optimizer allocation.   Some compiler cooperation would
be required to understand this - presubstitution would be an operation known
to compilers.   Hardware paths would have to be available
to bring presubstitution values from the appropriate data registers to the
destination register when needed.   This operation (inserting the presubstituted
value) could be slow because it's rare, unlike setting up the presubstituted 
value, which must be fast.

	On vector machines, a vector version of the continued fraction program
might have x a vector parameter rather than a scalar.
Presubstitution registers would have to be the
same size as vector registers.   Or else this kind of code simply wouldn't
be vectorized.

2)	Presubstitute() can't be implemented as a system call.  That would
take longer than the conditional branch it is meant to beat.

3)	Presubstitute() could be implemented simply by storing presubstituted
values into a normal array in user address space memory.  Programs planning
to use presubstitution first register their intent at program initialization
by a system call that tells the kernel where to find this array of 
presubstitution values in user space.   Thus the array can be updated relatively
quickly - it's just a (probably cached) write.  
Performing the substitution when an
exception arises requires that the floating-point hardware be trappable
and continuable after traps.   The system call that initializes presubstitution
would also initialize the hardware in this case to trap on every invalid
operation.    The kernel trap handler would examine each such trapping fpop
and obtain the value to be used from the user-space table.   It is not
necessary for this purpose that user-mode code handle the trap, although
that's one possible implementation.   Hopefully the user-mode code won't
actually be written by the end user.

	This appears to be the most likely way of implementing presubstitution,
should it prove worthwhile.

 ==========

 From dgh Tue Oct  8 16:46:50 1991
 To: nceg at cray.com
 Subject: Exception Handling III: Operator Modifiers


In my first posting, I described a loop to evaluate a continued fraction
and its derivative and how it might be made robust in the face of 0/0 or
inf/inf by conventional means costing a conditional branch on every iteration.

In my second posting, I described how presubstitution might work.  In the best
possible implementation, an extra fp multiply, divide, and "presubstitute"
are required on every iteration.   The "presubstitute" might be a load to
a hardware presubstitution register or a store to a user-space memory location.

I now turn to another approach which is also based on conditional branches -
and thus is probably no cheaper than the conventional approach in this
example.   The payoff is in other situations where the exceptionalness
can't be so readily determined by doing just one floating-point comparison,
such as when an inexact must be detected.

The idea is that instead of testing the numerical operands or result of an
operation, or depending on hardware presubstitution registers or recovering
from an asynchronous trap, the code tests the accrued IEEE exception bits.
It's intended to permit an important optimization on IEEE systems that
maintain separate "current exception bits" for the last fpop: if there is
only one fpop that can generate the exception of interest, then only the
current exception bits for that fpop need be tested.  Abstractly this is not
inherently more or less synchronizing than a conditional branch on 
just-computed floating-point data, although current implementations might
do a better job on the latter.

Conventionally, if you want to test an fp expression for a particular IEEE
exception such as FE_INVALID, you would do something like

	tinv = get_accrued(FE_INVALID);
	clear_accrued(FE_INVALID);
	dest = expression;
	tinv2 = get_accrued(FE_INVALID);
	set_accrued(FE_INVALID, tinv | tinv2);
	if (tinv2 == 0) {
		/* code for normal case that no exception arises */
	} else {
		/* code for unusual case that exception arises */
	}

So that's at least four function calls, worst case, resulting best case in
two stores and two loads of the accrued exception register; plus a conditional
branch.   On most current
high-performance IEEE systems those loads, stores, and branch will be much more
expensive than the "dest=expression" that one would most typically want to
test.

The proposal below is for language support for the foregoing construct, with
the intent that it would often be optimized to 

	dest = expression;
	if (get_current(FE_INVALID) == 0) ... else ...

and that consequently there would in time be hardware support for either the
five instructions

	"branch conditionally if the previous fpop generated FE_X"

for each IEEE exception INVALID, OVERFLOW, etc., or else one instruction

	"branch conditionally if the previous fpop generated FE_X|FE_Y|..."

where any subset of the five IEEE exceptions can be OR'd together
to specify the desired condition; and thus after further time fpop
implementations might send early information to branch units that certain
IEEE exceptions are guaranteed not to arise on this fpop code - a MIPSy idea
much appreciated by Kahan - releasing control flow past the conditional 
branch.

In my experience the most common case, by far, is that only one operation
matters, so my original idea was to attach the branch condition as a modifier
directly to the operation:

	z = x / y ;

becoming

	z = x / [on FE_INVALID goto label] y ;

indicating that the fpop(s) implementing / are to be tested for invalid and,
when it is encountered, control flows to a local labeled statement.
A goto was chosen with malice aforethought to ensure a lightweight
implementation burden: if the exception arises, the compiled code is neither
required to have assigned to z nor to have left z untouched.
There is no way to return into the middle of the expression.

A semantically similar but syntactically quite different approach, inspired by
Stallman, is to attach a cast-like modifier to an expression:

	(__oninvalid__ label) (x/y)

or maybe even

	(__oninvalid__ label) (z=x/y) 

The distinction between these two
is important on extended-precision-based systems that
typically won't detect overflow or underflow on the divide but only on the
assignment forcing conversion to storage precision.

Anyway if the tested code is an expression rather than a single operation,
compilers should eventually recognize possibilities for optimization.

An even more general approach is inspired by C++:

	try { z=x/y ; }
	catch (FE_INVALID) { /* code for exceptional case */ } ;
	/* followed by code for non-exceptional case 
	   that will also be executed if exceptional case falls through */

There can be multiple catch clauses for different exceptions.

I don't think the C++ try/catch mechanism is appropriate for C - too
heavyweight: it allows for user-defined exceptions and requires dynamic
scoping of handlers: if FE_OVERFLOW is not caught here, it should be caught
by any dynamically enclosing catch even in some other procedure.   This is
too much like a long jump - costly and not helpful for what I want to do,
which is very local.

Worse, the main purported advantage - 
a cleaner syntax devoid of gratuitous gotos and labels -
will often prove illusory, since it will frequently be the case
that exceptional-case code should not fall through to normal-case code as
it would appear to do in the example above.

Thus a truly satisfactory syntax awaits discovery and promulgation.
I'm eager for good suggestions.

I'll close by illustrating the foregoing with the continued fraction example
from the previous postings.   First with the operator or expression
modifier syntax:

void
continued_fraction(N, a, b, x, pf, pf1)
	int             N;
	double         *a, *b, x, *pf, *pf1;
{
	double          f, f1, d, d1, q, r, f1j1, f1j2;
	int             j;

	/*
	 * Evaluate function f at x by continued fraction a[j] and b[j];
	 * function value to *pf, derivative to *pf1
	 */

	/*
	 * Assume aj finite, bj finite and nonzero, x finite
	 * 
	 * Critical step is evaluating f1= -(d1/d)*q
	 * 
	 * If d == 0 and d1 != 0 on step j, then on the next step j-1 you want
	 * f1 = p = b(j-1)/b(j) * d1(j), a result determined by taking limits
	 * in step j. Since f1(j-1) is nominally computed as (inf/inf)*0, the
	 * desired result can be obtained by substituting inf for inf/inf and
	 * p for inf * 0.
	 * 
	 * if d == 0 and d1 == 0 on step j, then on the next step j-1 you want
	 * f1 = p = 0 . Since f1(j) is nominally computed as (0/0)*inf, and
	 * since f1(j-1) is nominally computed as (f1(j)/inf)*0, the desired
	 * result can be obtained by substituting inf for 0/0 and p (= 0) for
	 * inf * 0.
	 */

	f1 = 0;
	f = a[N];

	for (j = N - 1; j >= 0; j--) {
		f1j2 = f1j1;	/* Save this guy for recomputation -
			won't be defined until j=N-2 but that's the
			earliest it will be needed.. */
		f1j1 = f1;	/* Intermediate save. */
		d = x + f;
		q = b[j] / d;
		f = a[j] + q;
		d1 = 1.0 + f1;
#ifdef operatorsyntax
		r = d1 / [on FE_INVALID goto invdiv] d;
#else				/* cast expression syntax */
		r = (__oninvalid__ invdiv) (d1 / d);
#endif
		f1 = -r * q;	/* non-exceptional case */

endofloop:	{		/* some kind of null statement required 
				   syntactically here */
		}
	}

	*pf = f;
	*pf1 = f1;
	return;

invdiv: /* invalid division 0/0 or inf/inf detected */

	if (d1 == 0) {
		/* f1 = (0/0)*inf so return infinity */
		f1 = q;
	} else {
		/* f1 = (inf/inf)*0 so return "p" */
		f1 = (1.0 + f1j2) * b[j] / b[j + 1];
	}
	goto endofloop;
}


The corresponding try/catch looks a little nicer:

void
continued_fraction(N, a, b, x, pf, pf1)
	int             N;
	double         *a, *b, x, *pf, *pf1;
{
	double          f, f1, d, d1, q, r, f1j1, f1j2;
	int             j;

	/*
	 * Evaluate function f at x by continued fraction a[j] and b[j];
	 * function value to *pf, derivative to *pf1
	 */

	/*
	 * Assume aj finite, bj finite and nonzero, x finite
	 * 
	 * Critical step is evaluating f1= -(d1/d)*q
	 * 
	 * If d == 0 and d1 != 0 on step j, then on the next step j-1 you want
	 * f1 = p = b(j-1)/b(j) * d1(j), a result determined by taking limits
	 * in step j. Since f1(j-1) is nominally computed as (inf/inf)*0, the
	 * desired result can be obtained by substituting inf for inf/inf and
	 * p for inf * 0.
	 * 
	 * if d == 0 and d1 == 0 on step j, then on the next step j-1 you want
	 * f1 = p = 0 . Since f1(j) is nominally computed as (0/0)*inf, and
	 * since f1(j-1) is nominally computed as (f1(j)/inf)*0, the desired
	 * result can be obtained by substituting inf for 0/0 and p (= 0) for
	 * inf * 0.
	 */

	f1 = 0;
	f = a[N];

	for (j = N - 1; j >= 0; j--) {
		f1j2 = f1j1;		/* Save this guy for recomputation -
				won't be defined until j=N-2 but that's the
				earliest it will be needed.. */
		f1j1 = f1;		/* Intermediate save. */
		d = x + f;
		q = b[j] / d;
		f = a[j] + q;
		d1 = 1.0 + f1;
		try { r = d1/d ; }
		catch (FE_INVALID) { 	/* invalid division 0/0 or inf/inf 
					   detected */
			if (d1 == 0) {	/* f1 = (0/0)*inf so return infinity */
				f1 = q;
			} else {	/* f1 = (inf/inf)*0 so return "p" */
				f1 = (1.0 + f1j2) * b[j] / b[j + 1];
			}
			goto endofloop;
		};

		f1 = -r * q;

endofloop:	{		/* some kind of statement required here */
		}
	}

	*pf = f;
	*pf1 = f1;
	return;
}

By the way, figuring out that the important exceptional cases are
(0/0)*inf and (inf/inf)*0,  given the input constraints on x, a, and b,
is an interesting exercise.

 ==========

 From dgh Tue Oct  8 17:39:45 1991
 To: nceg at cray.com
 Subject: Exception Handling IV:  try/catch a switch?

 
try/catch is sort of like a switch.  Maybe if it were more so, the
problems with labels I mentioned previously would go away, e.g. a
somewhat different syntax for the example I have been using:

        __eswitch__ (r = d1/d) {
        case FE_INVALID: /* invalid division 0/0 or inf/inf detected */
                if (d1 == 0) {  /* f1 = (0/0)*inf so return infinity */
                        f1 = q;
                } else {        /* f1 = (inf/inf)*0 so return "p" */
                        f1 = (1.0 + f1j2) * b[j] / b[j + 1];
                }
                break;          /* instead of goto... */
        default:        /* unexceptional case */
                f1 = -r * q;
                break;
        }

Maybe that's an improvement:  an __eswitch__ is like a switch, except the
cases are cases of exceptions rather than cases of expression values.


