[Cfp-interest] dnrm2 exception handling
David Hough CFP
pcfp at oakapple.net
Wed May 28 15:41:36 PDT 2014
Kahan directed me to his Matlab algorithm for BLAS dnrm2, copied here:
http://www.validlab.com/cfp/wk-scaling.pdf
and commented that the existence of such an algorithm avoids the need
for any special exception-handling apparatus - and hence that we should
seek a better example. I'm not fully convinced; the first loop I have
in mind would look something like
if (FP_EXCEPTIONAL_NONDETERMINISTIC(
        sz = 0 ; for (j = 0 ; j < n ; j++) sz += x[j]*x[j] ; ,
        FE_UNDERFLOW, FE_OVERFLOW, FE_INVALID)
    == 0) {
    /* normal case */
    return sqrt(sz) ;
} else {
    /* exceptional case - compute more carefully */
    ...
}
or perhaps somewhat equivalently in a pragma:
{
#pragma STDC FENV_EXCEPT FE_UNDERFLOW,FE_OVERFLOW,FE_INVALID break
    sz = 0 ; for (j = 0 ; j < n ; j++) sz += x[j]*x[j] ;
    goto pastbreak;
}
/* exceptional case */
.... return(...);
pastbreak: return sqrt(sz);
The exceptional-case handling deals with all underflows, overflows,
and invalids (due to signaling NaNs) with no loop slowdown relative
to the simplest possible code - though there might be a cost for
installing a trap or for testing flags at the end, at the compiler's
option. In contrast, Kahan's first loop looks like:
for j = 1:n
    abx = abs(x(j)) ;
    if isinf(abx), sz = abx ; return
    elseif isnan(abx), sz = abx ;
    else ahat = max(ahat, abx) ; end
    if (ahat < big)
        %... perform Compensated Summation:
        rsz = abx*abx + rsz ;
        t = sz ; sz = t + rsz ;
        rsz = (t - sz) + rsz ; %... compensation
    end, end %... of first pass
It's in Matlab. It assumes that a certain threshold big has been
precomputed before the loop begins as big = huge/sqrt(n), where huge
has been computed once in the program from fundamental constants of the
floating-point type. This loop does compensated summation, which
is a potential addition to any dnrm2 algorithm but not essential to the
exception handling. It also checks for infs and NaNs, which might
otherwise cause problems for subsequent logic, but perhaps those checks
can be avoided or deferred.
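For reference, the compensated-summation kernel of Kahan's loop above,
transcribed into C with the scaling and inf/NaN checks stripped out (a
sketch for illustration; the function name is mine):

```c
#include <math.h>

/* Sketch: compensated (Kahan) summation of squares, mirroring the
   rsz/sz/t updates in the Matlab loop above. */
double compensated_sum_of_squares(int n, const double *x)
{
    double sz = 0.0;      /* running sum */
    double rsz = 0.0;     /* running compensation term */
    for (int j = 0; j < n; j++) {
        double abx = fabs(x[j]);
        rsz = abx * abx + rsz;    /* fold the new term into the correction */
        double t = sz;
        sz = t + rsz;             /* update the sum */
        rsz = (t - sz) + rsz;     /* recover the roundoff just lost */
    }
    return sz;
}
```

Note that an optimizing compiler must not reassociate these operations, or
the compensation term degenerates to zero.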
The part I want to draw attention to simplifies to
for j = 1:n
    abx = abs(x(j)) ;
    ahat = max(ahat, abx) ;
    if (ahat < big)
        sz = abx*abx + sz ;
    end, end %... of first pass
There are two conditionals, max(ahat, abx) and ahat < big, that are needed
to know whether a second scaling pass will be required. As proposed, these
must be executed on every iteration in the normal case.
Their relative impact is reduced if compensated summation is in effect,
but if it's not, I would expect this loop to be quite a bit slower than
the minimal loop without the tests. On the other hand, there is no trap
or flag overhead, which might be important when n is small.
One could imagine a very smart compiler that implements FP_EXCEPTIONAL
with a run-time test on n to determine which of two fast loops to
execute: one based on final flags for small n, and one based on traps for
large n. Either would have to break to the slow loop if an exception
arose.
However this example might turn out, Kahan also gave me a much harder
LAPACK example - more of a research problem -
that the ambitious can read about at
http://www.validlab.com/cfp/wk-larfp.pdf
Here the question is how to get accurate results while avoiding overflow
and underflow. What exception-handling facilities would make the problem
easier? How might the problem be recast to make the most of available
exception-handling facilities?
We do need good examples to justify whatever facilities we might
ultimately propose. I'm still studying this one.
One thing that's certain from the examples above is that there's not much
beauty to admire in the proposed macro and pragma syntaxes.