pi and LIA

David G. Hough at validgh dgh
Sun Jan 15 23:06:09 PST 1995


Brian Wichmann asks us to
  
> Consider the result of computing sin(pi*(2*n+1/4)). Since pi can only be
> represented approximately, the argument is an approximation of the required
> result. LIA-2 would *require* that for large n that the result diverges from
> 1/sqrt(2). I think this is counter-intuitive.

That sin(PI*(2*n+1/4)) should diverge from sin(pi*(2*n+1/4)) as n increases,
when PI is an inexact approximation to pi, is no more surprising than that
     sin( X*n ) should diverge from        sin( x*n )        as n increases,
for any other X almost but not quite equal to x.    It's not unlike the
common experience of beginning computer users who attempt to increment
loop counters by 0.1, and thereby begin to discover the difference between binary
and decimal floating-point arithmetic; except that in the trigonometric case,
pi is no more representable in decimal than in binary.    Languages like APL
have attempted to fuzz such things in order to defer the shocking discovery
of roundoff from cases that are relatively simple to explain to cases that
are far more complex to explain; I don't regard that as progress.   I find
it easier to expect that by default the elementary arithmetic operations 
and the 
elementary algebraic and transcendental functions in the math library treat
their input arguments as carefully as if they were exact, and return results
that are correctly rounded if possible, or nearly so otherwise.   I don't
object if there are non-default alternatives that are faster and less
accurate, that may be selected by those who determine that the potential
performance payoff merits the additional verification of the program,
but I see no point in standardizing these alternatives; they will necessarily
vary among computer systems and languages.    In this light, it is entirely
appropriate for any arithmetic standard to specify "infinitely-precise"
argument reduction for trigonometric functions; and if anybody has a technical
problem implementing that, a sample implementation
can be obtained from fdlibm in netlib.   More on trig argument reduction later.
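
For concreteness, the drift Wichmann describes is easy to exhibit.  The
following is a minimal sketch in C (nothing assumed beyond an IEEE 754
double, M_PI from <math.h>, and a sin that treats its argument as exact);
the discrepancy grows with n because the computed argument PI*(2*n+1/4),
not the sine routine, is inexact:

	#include <stdio.h>
	#include <math.h>

	int main(void)
	{
	    double target = 1.0 / sqrt(2.0);
	    double n, x;
	    for (n = 1.0; n <= 1.0e12; n *= 100.0) {
	        /* M_PI is inexact, and the product is rounded besides,
	           so the argument itself is only an approximation. */
	        x = M_PI * (2.0 * n + 0.25);
	        printf("n = %.0e   sin(PI*(2n+1/4)) - 1/sqrt(2) = % .3e\n",
	               n, sin(x) - target);
	    }
	    return 0;
	}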

It is also appropriate for a new arithmetic standard to offer the end user
an easy way around the quandary expressed by Wichmann:  namely by defining,
in addition to the classical trig functions 

	trig(x)			where x is interpreted in radians

also

	trigd(x)		where x is interpreted in degrees

and

	trigpi(x)		where x is interpreted in semicircles so
				trigpi(x) = trig( pi * x) for exact pi

The "trigd" functions are a common Fortran extension, and the trigpi functions
less so, although they have been provided in Sun's libraries for C and
Fortran since 1988.   They are just the TRIGFF functions prescribed in 
LIA-2, with u fixed at 360 and 2 respectively; 
I think they cover at least 99% of
the need represented by TRIGFF without introducing the coding and testing
burden required to support TRIGFF functions with two arbitrary arguments -
two functions of one argument are far easier to test than one function of
two arguments.
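
To make the trigpi idea concrete, here is a rough sketch in C of a sinpi -
not Sun's actual implementation, just an illustration - showing why such a
function escapes the large-argument quandary: the argument can be reduced
modulo the exactly representable period 2, so no approximation to pi enters
the reduction at all.

	#include <math.h>

	double sinpi_sketch(double x)
	{
	    double r = fmod(x, 2.0);           /* exact: no rounding error at all */
	    if (r > 1.0)        r -= 2.0;      /* fold into (-1, 1] */
	    else if (r <= -1.0) r += 2.0;
	    /* A production version would exploit symmetry to keep |r| <= 1/4 and
	       would treat integers and half-integers exactly; this much already
	       shows that sinpi(2*n + 0.25) stays put as n grows. */
	    return sin(M_PI * r);
	}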

============================================================================

I perceive a larger issue with LIA, however.   It seems to me to be an attack
on a problem that peaked about twenty years ago, and has been in extremely
steep decline since about ten years ago, when workstations and PC's implementing
IEEE arithmetic started to take over the bulk of scientific computation -
so that non-IEEE supercomputers remain cost-effective on only
a decreasing fraction of large-scale problems.   That problem was to find a
useful way of describing disparate arithmetic systems that lacked almost any 
unifying principle.    In order to be useful, the description had to permit
robust mathematical software to be written that was portable and reasonably
efficient on the whole range of systems in widespread use at any particular 
time.     Very little such portable software could be written for problems that 
could effectively exploit predictable rounding or robust exception handling;
by definition the portable software couldn't exploit those useful properties
and so was far from optimal on those systems.

A different problem confronts us now.    Probably the bulk of floating-point
operations are performed in PC spreadsheet programs, for although each such PC
performs relatively few operations, such PC's are very numerous.   Most technical floating-point
operations are probably performed on workstations and workstation-derived
servers, which are slower but far more numerous than supercomputers.   But the
PC's, workstations, and most new supercomputer designs all use variations of
IEEE 754 binary floating-point arithmetic.    So the job of porting mathematical
software can be considered done for all time.

Well, not quite.   Different choices permitted within IEEE 754, different
expression evaluation paradigms, and different libraries - not to mention
gross and subtle bugs in optimizing compilers and in the underlying
hardware - mean that identical numerical
results on different systems are still the exception rather than the rule.
To the justifiable complaint of the typical floating-point user - 
now more likely an accountant or technician than a numerical analyst -
technical support people often 
respond that that's just the way floating-point arithmetic is: an inherently
unpredictable process, like the weather.    All these differences don't
help most floating-point users, whose performance is most often limited by
other factors to a far lower level than that at which 
aggressive hardware and software optimizations can help; yet the differences
induced by those optimizations affect all users, whether they obtain a
performance benefit or not.    These differences also confuse and delay the
recognition of real and significant hardware and software bugs, which can 
masquerade for a time as normal roundoff variations.

A useful arithmetic standard to address these current problems in computer
arithmetic would prescribe the DEFAULT:

1)	expression evaluation: what happens when the language does not
	specify the order of evaluation of (a op b op c), or how mixed-mode
	arithmetic is to be evaluated, or the precision of variables in
	registers or storage, and related issues.

2)	correctly-rounded conversion between binary and decimal floating point:
	public domain code is available (see the sketch after this list).

3)	correctly-rounded elementary algebraic and transcendental functions:
	error bound 0.5, not 0.5+, in the terms of LIA-2.
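
As a small illustration of item 2, correct rounding is what makes textual
output reproducible: 17 significant decimal digits suffice to round-trip any
IEEE 754 double exactly.  A minimal sketch, assuming the C library's printf
and strtod are correctly rounded - which is just what the default would
mandate:

	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
	    double x = 1.0 / 3.0, y;
	    char buf[64];
	    sprintf(buf, "%.17g", x);              /* binary -> decimal, 17 digits */
	    y = strtod(buf, (char **)0);           /* decimal -> binary */
	    printf("%s round-trips %s\n", buf, x == y ? "exactly" : "INEXACTLY");
	    return 0;
	}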

Thus conforming implementations built upon
IEEE 754 arithmetic would provide identical
results for identical types; likewise, conforming implementations upon VAX,
IBM 370, or Cray arithmetic would provide identical results among their
brethren, and any discrepancies would be immediate evidence of some kind
of hardware or software problem instead of "normal roundoff variation".

These are the basics, although one could imagine going further to specify
the mappings between language types and IEEE 754 types, and details of 
exception handling.

There is no intent here to discourage other methods of expression evaluation,
base conversion, and library function evaluation, that an implementation
might provide beyond the mandated default; merely that the costs and benefits
of those other methods not be inflicted except upon request.    Most 
spreadsheet-type programs and other mass-market software - even much technical
software - can probably be compiled by default with insignificant loss of
performance in normal usage.    In contrast, grand challenge problems will
continue to be run on maximally optimized hardware and software, because
anybody who can afford to work on such problems can afford expert assistance
tuning the software.

In particular, the requirement for correctly-rounded transcendental functions
is in part an open research problem: getting average correctly-rounded
performance not too much worse than average almost-correctly-rounded 
performance is an interesting challenge, especially for functions of
two arguments such as hypot, pow, and atan2.   The general approach is to
apply a fast algorithm in explicit or implicit slightly higher precision,
that produces error bounds tight enough to determine whether the tentative
result will round the same as the exact result, and when that test fails,
apply a much slower but much more accurate algorithm in much higher precision.
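
Schematically the two-step strategy looks like the sketch below.  The helpers
fast_f and slow_f are hypothetical placeholders, not any particular library's
interface; the hard part in practice is making the error bound rigorous yet
small enough - usually by carrying the fast result to a few bits beyond
working precision - that the fast test almost always succeeds.

	/* Hypothetical placeholders:
	   fast_f evaluates f(x) quickly, carrying a few extra bits, and reports a
	   rigorous bound err on its absolute error (with a margin covering the
	   two roundings in the test below); slow_f evaluates f(x) very accurately. */
	long double fast_f(double x, long double *err);
	double      slow_f(double x);

	double rounded_f(double x)
	{
	    long double err;
	    long double y = fast_f(x, &err);
	    /* If both endpoints of [y-err, y+err] round to the same double, then
	       by monotonicity of rounding the exact f(x) rounds to it as well. */
	    double lo = (double)(y - err);
	    double hi = (double)(y + err);
	    if (lo == hi)
	        return lo;          /* common case: the fast result is certified */
	    return slow_f(x);       /* rare case: too close to a rounding boundary */
	}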

To the objection that some current hardware or software
might find it difficult to provide a conforming default implementation as 
outlined above, consider that average PC product lifetimes are now less than one
year and workstation manufacturers consider themselves blessed to be able to
sell a product for two years; this rate of technological change has its
unsettling aspects, but it does mean that, when it chooses,
the computer industry as a whole has the 
capacity to adapt to new standards faster than the conventional standards
process can produce them.

============================================================================

As mentioned above, more on trig functions:   
raising an exception is not the right answer either.

Date: Tue, 25 Oct 94 10:42:33 PDT
To: numeric-interest
Subject: no substitute for getting the right answer

Elementary transcendental functions are used in many unpredictable ways,
which is why the implementer of the standard library should do as good a
job as possible.    Persons creating specific applications for which they
can prove that they can use faster, less accurate transcendental functions
can code their own far more easily than persons creating specific applications
that require full precision and correct exception handling can code their
own when the library is defective.    fdlibm incorporates "infinite-precision"
trigonometric argument reduction.
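
The idea behind the easy cases of such argument reduction is the familiar
Cody-Waite splitting, sketched below in C.  This is not fdlibm's code; in
particular, for huge arguments fdlibm switches to a Payne-Hanek style
reduction driven by many bits of 2/pi, which is where the "infinite
precision" comes in.

	#include <math.h>

	/* pi/2 split into pieces whose leading parts have trailing zero bits, so
	   the products n*PIO2_HI and n*PIO2_MED are exact for modest n and the
	   subtractions below lose no information. */
	static const double PIO2_HI  = 1.57079632673412561417e+00;  /* first 33 bits of pi/2 */
	static const double PIO2_MED = 6.07710050630396597660e-11;  /* next 33 bits */
	static const double PIO2_LO  = 2.02226624879595063154e-21;  /* remainder, approximately */

	/* Reduce x toward r = x - n*(pi/2) with |r| about <= pi/4, reporting n mod 4
	   so the caller can select the quadrant.  Sketch only; no error analysis,
	   and only adequate for moderately large |x|. */
	double reduce_pio2_sketch(double x, int *quadrant)
	{
	    double n = floor(x * (2.0 / M_PI) + 0.5);   /* nearest multiple of pi/2 */
	    double r = ((x - n * PIO2_HI) - n * PIO2_MED) - n * PIO2_LO;
	    *quadrant = (int)fmod(n, 4.0);
	    return r;
	}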

The whole issue of large arguments to trig functions boils down to just this:
should there be an exception raised on loss of significance, or cancellation,
which is the situation when an EXACT subtraction magnifies any relative error
that may exist in the arguments.   Whether this is a problem or not depends
on whether there is any relative error in the arguments, and what is done 
with the result, neither of which is known to the floating-point subtraction
hardware.   Note that a conscientious implementation of significance-loss
detection in software would have to flag trig arguments of rather modest size,
as discussed in my article in the March 1981 issue of IEEE Computer.

In fact skillful exploitation of cancellation is part of the
mechanism for building higher-precision floating point out of lower precision;
dealing with an exception in that case would get in the way, which is probably
why significance exceptions have seldom been implemented in hardware,
and when they have been implemented, as on the IBM 360 I think, 
they have invariably been disabled in software.    The "damage" of cancellation is
always done earlier, 
when error contaminated the operands; hence IEEE 754 specifies
an exception on inexact rather than loss of significance.   That almost
all floating-point programs generate inexact exceptions reinforces the 
observation that if you don't know the error (or its bound) then you don't
know the answer, regardless of what the computer prints; that argues for
manual or automatic error bounding (interval arithmetic).    
Kulisch's papers usually incorporate
drastic examples of cancellation of inexact intermediate results
to demonstrate how even increasing precision may fail to yield any correct
significant figures in conventional floating-point arithmetic.    The lack
of significance in the results is immediately demonstrated by any kind of
error analysis.
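
To see why an exception would only get in the way, consider the classic
Fast2Sum step (Dekker), the basic building block of double-double
arithmetic: it deliberately provokes a cancelling subtraction in order to
recover, exactly, the rounding error of an addition.  A minimal sketch:

	/* For |a| >= |b|: returns s = fl(a+b) and sets *t to the exact rounding
	   error, so that a + b == s + *t with no error at all.  The subtraction
	   s - a is exact, and the "cancellation" in b - z is precisely what
	   recovers the lost low-order bits. */
	double fast_two_sum(double a, double b, double *t)
	{
	    double s = a + b;
	    double z = s - a;      /* exact when |a| >= |b| */
	    *t = b - z;            /* exact: the rounding error of a + b */
	    return s;
	}

For example, fast_two_sum(1.0, 1e-17, &t) returns 1.0 with t holding (the
double nearest) 1e-17: nothing is lost, and a significance exception at the
final subtraction would be a nuisance rather than a warning.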

The following recent USENET posting illustrates these points; one release
of HP libraries evidently didn't even enforce the identity sin*sin+cos*cos = 1,
with unfortunate results little alleviated by an error message:

=============================================================================

Article: 5643 of comp.lang.fortran
From: oliveria@down.engin.umich.edu (Roque Donizete de Oliveira)
Newsgroups: comp.lang.fortran,comp.sys.hp.hpux
Subject: Re: how to detect TOTAL LOSS OF PRECISION with HP's f77 compiler
Date: 24 Oct 1994 07:51:12 GMT
Organization: University of Michigan Engineering, Ann Arbor

To followup on my original question, after 2 days of painful debugging.

I have a complex nonlinear integral equation I want to solve for C

  0 = f(C) + integral_{Pi/4}^{Pi/2} { g(Z) dtheta }

where the argument of g(Z) looks like

          Z  = Z(C,theta) = C / DCOS(theta)

At the place where I got the exception
  C = (-.2721096508319146D+01 , -.2400237696120182D-05 )

This value of C is pretty close to exact solution, so the nonlinear
solver is doing ok.

To avoid the singularity at theta=Pi/2 we have used an upper limit
a tiny bit (10^-13) smaller than Pi/2. This is ok (we noticed the
integral converges, and the main contribution comes from near
theta=Pi/2, and the integrand, as a whole, is bound at theta=Pi/2).

The problem arises because near theta=Pi/2 (these points are picked
by the numerical integrator) Z becomes very large, like

 Z = ( -.2721605324800001D+14 , -.2400686515371547D+08 )

However, the g(Z) function, looks like (in this range of Z):

  g(Z)  = some_other_terms + CDEXP(- Z**2 * 0.5)

and therefore the argument of the complex exponential became very
large (like (-3.703567771986978E+26 ,-6.533721203410762E+20)).

Since a exponential of a complex argument can be written as

CDEXP(x + i*y) = DEXP(x) * ( DCOS(y) + i*DSIN(y) )

we are left with trigonometric functions of very large arguments.

Unfortunately, HP's trigonometric functions break for arguments
larger than about 10**8. They return 0 for the function value
and the "warning" "TOTAL LOSS OF PRECISION".

Incidentally, IBM's and Sun's fortran libraries happily accept
arguments to trigonometric functions as large as the 
largest double precision number (about 10**308), even though
the result is correct only for |y| < 2**50 * Pi = 3.53*10**15
and x <= 174.673, as stated in the IBM documentation.


