Exception Handling II: Presubstitution

Tue Oct 8 15:36:14 PDT 1991

The previous message demonstrated a loop to evaluate a continued fraction
and its derivative, and showed how points where the formula for the
derivative breaks down can be bypassed by conventional means, at least in
IEEE arithmetic.

The next step is to consider Kahan's proposal for presubstitution, which turns
the loop in question into the form:

extern double   infinity(void);
extern void     presubstitute(char *, double);

void
continued_fraction(N, a, b, x, pf, pf1)
	int             N;
	double         *a, *b, x, *pf, *pf1;
{
	double          f, f1, d, d1, q, bj, bj1, p;
	int             j;

	/*
	 * Evaluate function f at x by continued fraction a[j] and b[j];
	 * function value to *pf, derivative to *pf1
	 */

/*	Assume
		aj finite
		bj finite and nonzero
		x finite

	Critical step is evaluating f1= -(d1/d)*q

	If d == 0 and d1 != 0 on step j, then
	on the next step j-1 you want f1 = p = b(j-1)/b(j) * d1(j),
	a result determined by taking limits in step j.
	Since f1(j-1) is nominally computed as (inf/inf)*0,
	the desired result can be obtained by substituting 
	inf for inf/inf and p for inf * 0.

	if d == 0 and d1 == 0 on step j, then 
	on the next step j-1 you want f1 = p = 0 .
	Since f1(j) is nominally computed as (0/0)*inf, and
	since f1(j-1) is nominally computed as (f1(j)/inf)*0,
	the desired result can be obtained by substituting
	inf for 0/0 and p (= 0) for inf * 0.
*/

	presubstitute("0/0", infinity());
	presubstitute("inf/inf", infinity());

	f1 = 0;
	f = a[N];
	bj1 = b[N];	

	for (j = N - 1; j >= 0; j--) {
		d = x + f;
		d1 = 1.0 + f1;
		bj=b[j];
		q = bj / d;
		f1 = -(d1 / d) * q;		/* CRITICAL STEP */
		f = a[j] + q;
		p = bj1 * d1 / bj;
		presubstitute("0*inf", p);
		bj1 = bj;
	}

	*pf = f;
	*pf1 = f1;
	return;
}

With j counting downward, bj1 means b[j+1],
the b[j] that was in effect during the
previous iteration.

Note that there is an error in Kahan's 17 March 1987 draft of
"Presubstitution, and Continued Fractions" -
the presubstitution value should be written

	b   * d' / b
	 j+1        j

The first subscript appears uniformly in the draft as j-1,
perhaps a relic of an earlier draft in which the loop ran from j=0 to j=N-1.

In contrast to the "conventional method" proposed previously, which required
allocating two registers to store previous-iteration data and more importantly
a conditional branch to test the result of d1/d on every iteration,
the presubstitution method requires setting up two presubstitutions outside
the loop to cause 0/0 and inf/inf to be replaced by inf, 
and another within the loop that causes 0*inf to be replaced by p, which
must be computed on each iteration and inserted into the arithmetic pipe
in case 0*inf arises on the next iteration.    
Thus updating the presubstitution value must be
a fast operation, even though actually doing the presubstition in the
relatively rare event of an exception might be slow.

There are a couple of things to note about the function presubstitute():

1) 	If implemented as hardware fpop codes, presubstitute()
needs to be faster than a
conditional branch in order to be useful.  Thus it shouldn't be a synchronizing
operation that stops the pipe; updates to the presubstituted values have
to flow through the pipe with other fpops.

	Furthermore there must be some presubstitution registers, at least
one each for 0/0, inf/inf, 0*inf, other invalid, finite/0, overflow, and
underflow; on a V8 SPARC, for instance, presubstitution registers must be
able to accomodate 128-bit long double data types, so there are already at
least 7 * 4 = 28 32-bit words of presubstitution registers, compared to
the 32 32-bit floating-point data registers used in normal operation;
all must be saved and restored during context switches.   I'd like to hear
from hardware designers about the cost of this approach.

	Alternatively presubstitution registers could be implemented
as 5-bit pointers to the 32 floating-point data registers.  
Then data registers used for presubstitution values
(4 32-bit registers in this example) would be subtracted from the pool of 
conventional
registers available for optimizer allocation.   Some compiler cooperation would
be required to understand this - presubstitution would be an operation known
to compilers.   Hardware paths would have to be available
to bring presubstitution values from the appropriate data registers to the
destination register when needed.   This operation (inserting the presubstituted
value) could be slow because it's rare, unlike setting up the presubstituted 
value, which must be fast.

	On vector machines, a vector version of the continued fraction program
might have x a vector parameter rather than a scalar.
Presubstitution registers would have to be the
same size as vector registers.   Or else this kind of code simply wouldn't
be vectorized.

2)	Presubstitute() can't be implemented as a system call.  That would
take longer than the conditional compare.

3)	Presubstitute() could be implemented simply by storing presubstituted
values into a normal array in user address space memory.  Programs planning
to use presubstitution first register their intent at program initialization
by a system call that tells the kernel where to find this array of 
presubstitution values in user space.   Thus the array can be updated relatively
quickly - it's just a (probably cached) write.  
Performing the substitution when an
exception arises requires that the floating-point hardware be trappable
and continuable after traps.   The system call that initializes presubstitution
would also initialize the hardware in this case to trap on every invalid
operation.    The kernel trap handler would examine each such trapping fpop
and obtain the value to be used from the user-space table.   It is not
necessary for this purpose that user-mode code handle the trap, although
that's one possible implementation.   Hopefully the user-mode code won't
actually be written by the end user.

	This appears to be most likely way of implementing presubstitution,
should it prove worthwhile.