[Cfp-interest] dnrm2 exception handling
David Hough CFP
pcfp at oakapple.net
Wed May 28 15:41:36 PDT 2014
Kahan directed me to his Matlab algorithm for BLAS dnrm2, copied here:
http://www.validlab.com/cfp/wk-scaling.pdf
and commented that the existence of such an algorithm avoids the need
for any special exception-handling apparatus - and hence that we should
seek a better example. I'm not fully convinced; the first loop I have
in mind would look something like
if (FP_EXCEPTIONAL_NONDETERMINISTIC(
        sz = 0 ; for (j = 0 ; j < n ; j++) sz += x[j]*x[j] ; ,
        FE_UNDERFLOW, FE_OVERFLOW, FE_INVALID)
    == 0) {
    /* normal case */
    return sqrt(sz) ;
} else {
    /* exceptional case - compute more carefully */
    ...
}
or perhaps somewhat equivalently in a pragma:
{
#pragma STDC FENV_EXCEPT FE_UNDERFLOW,FE_OVERFLOW,FE_INVALID break
    sz = 0 ; for (j = 0 ; j < n ; j++) sz += x[j]*x[j] ;
    goto pastbreak;
}
/* exceptional case */
.... return(...);
pastbreak: return sqrt(sz);
The exceptional-case handling deals with all underflows, overflows,
and invalids (due to signaling NaNs) with no loop slowdown relative
to the simplest possible code - though there might be a cost for
installing a trap or for testing flags at the end, at the compiler's
option. In contrast, Kahan's first loop looks like:
for j = 1:n
    abx = abs(x(j)) ;
    if isinf(abx), sz = abx ; return
    elseif isnan(abx), sz = abx ;
    else ahat = max(ahat, abx) ; end
    if (ahat < big)
        %... perform Compensated Summation:
        rsz = abx*abx + rsz ;
        t = sz ; sz = t + rsz ;
        rsz = (t - sz) + rsz ; %... compensation
    end, end %... of first pass
It's in Matlab. It assumes that a certain threshold big has been
precomputed before the loop begins as big = huge/sqrt(n), where huge
has been computed once in the program from fundamental constants of the
floating-point type. This loop does compensated summation, which
is a potential addition to any dnrm2 algorithm but not essential to the
exception handling. It also checks for infs and NaNs, which might
otherwise cause problems for subsequent logic, but perhaps those checks
can be avoided or deferred.
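For reference, the compensated-summation kernel of Kahan's loop above,
transcribed into C with the scaling and inf/NaN checks stripped out (a
sketch for illustration; the function name is mine):

```c
#include <math.h>

/* Sketch: compensated (Kahan) summation of squares, mirroring the
   rsz/sz/t updates in the Matlab loop above. */
double compensated_sum_of_squares(int n, const double *x)
{
    double sz = 0.0;      /* running sum */
    double rsz = 0.0;     /* running compensation term */
    for (int j = 0; j < n; j++) {
        double abx = fabs(x[j]);
        rsz = abx * abx + rsz;    /* fold the new term into the correction */
        double t = sz;
        sz = t + rsz;             /* update the sum */
        rsz = (t - sz) + rsz;     /* recover the roundoff just lost */
    }
    return sz;
}
```

Note that an optimizing compiler must not reassociate these operations, or
the compensation term degenerates to zero.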
The part I want to draw attention to simplifies to
for j = 1:n
    abx = abs(x(j)) ;
    ahat = max(ahat, abx) ;
    if (ahat < big)
        sz = abx*abx + sz ;
    end, end %... of first pass
There are two conditionals, max(ahat, abx) and ahat < big, that are needed
to know whether a second scaling pass will be required. As proposed, these
must be executed on every iteration in the normal case.
Their relative impact is reduced if compensated summation is in effect,
but if it's not, I would expect this loop to be quite a bit slower than
the minimal loop without the tests. On the other hand, there is no trap
or flag overhead, which might be important when n is small.
One could imagine a very smart compiler that implements FP_EXCEPTIONAL
with a run-time test on n to determine which of two fast loops to
execute: one based on final flags for small n, and one based on traps for
large n. Either would have to break to the slow loop if an exception
arose.
However this example might turn out, Kahan also gave me a much harder
LAPACK example - more of a research problem -
that the ambitious can read about at
http://www.validlab.com/cfp/wk-larfp.pdf
Here the question is how to get accurate results while avoiding overflow
and underflow. What exception-handling facilities would make the problem
easier? How might the problem be recast to make the most of available
exception-handling facilities?
We do need good examples to justify whatever facilities we might
ultimately propose. I'm still studying this one.
One thing that's certain from the examples above is that there's not much
beauty to admire in the proposed macro and pragma syntaxes.