interval X**Y and exceptions

Arnold Neumaier neum at cma.univie.ac.at
Wed Apr 1 22:53:10 PST 1998


Due to the change of mailing list, I also want to repeat some of
the issues discussed regarding X**Y.


David Hough wrote,

>>It is a significant further step to assert that interval X**Y, 
for X < 0, should be based on that measure-zero subset of the values 
of Y for which X**Y happens to have a definable real value.<<

This is precisely what is needed in practice for interval applications.
Fractional powers of negative numbers are very useful (though one could
get by without them, at the expense of programming effort). 

The main use
in our global optimization context is to draw conclusions about ranges
from constraints such as sum(x_i^p) in S, given x_i in X_i.
In the simplest case the sum consists of a single term only, for
example
x^3 in S,
and I want to be able to conclude that
x in S^(1/3)
without having to split S into its positive and negative parts.
And of course I do not want to lose solutions of x^3=-1 just 
because (-1)^(1/3) results in NaN or NaI or (even worse) empty!
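
To make this concrete, here is a minimal sketch in Python (illustrative
only: the name icbrt is hypothetical, an interval is just a (lo,hi)
pair, and outward rounding is omitted):

    # Real cube root of an interval S = [lo, hi]. The odd root is
    # monotone on all of R, so the endpoints suffice and no case split
    # into positive and negative parts is needed.
    def icbrt(S):
        lo, hi = S
        cbrt = lambda t: t**(1.0/3) if t >= 0 else -((-t)**(1.0/3))
        return (cbrt(lo), cbrt(hi))

    print(icbrt((-1.0, -1.0)))   # (-1.0, -1.0): the solution of
                                 # x^3 = -1 is not lost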

There are also applications in integer programming via constraint
satisfaction techniques. Consider the equation

(-1)^x*y = z

with x in [1:50] integral, y in [0,1], z in [1,2].

Here x even, y=z=1 are the only solutions. I think this is
representative of the more complicated integer constraints that users
are likely to employ in their models once nonlinear mixed integer
programming software becomes more widely used.

The correct interval evaluation is (-1)^[1,50]=[-1,1], and this
gives enough information to deduce from the equation that y=z=1,
as is done in constraint propagation techniques.
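
The deduction takes only a few lines; a minimal Python sketch with
plain endpoint pairs and no rounding control (the helper names are
mine, not a proposed API):

    # (-1)**[1,50] must enclose both parities of the exponent: [-1,1].
    POW = (-1.0, 1.0)

    def imul(a, b):                    # interval multiplication
        p = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
        return (min(p), max(p))

    def intersect(a, b):               # interval intersection
        return (max(a[0], b[0]), min(a[1], b[1]))

    Y, Z = (0.0, 1.0), (1.0, 2.0)
    print(intersect(Z, imul(POW, Y)))  # (1.0, 1.0): hence z = 1, and
                                       # since |(-1)**x * y| = y, y = 1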

Defining (-1)^[1,50]=empty gives a containment failure. And
defining (-1)^[1,50]=NaI gives useless information; one is
forced to split the integer variable into 50 pieces until a numerical
enclosure is found. This is inefficient, and it becomes prohibitive 
if there are a number of such constraints for different integer 
variables, since the work then grows exponentially. 

Failure to define x**y properly when x<0 makes the power considerably
more difficult to use in such cases, while a proper definition does not
harm users who only need x**y for x>=0.

>>Many programmers might prefer the result of x**(1.0/3) to be the same as the
result of x**(2.0/6), and so defining x**(n/m) to be the m'th root of (x**n)
would be problematic for negative x.<<

The right definition is x**q = (-1)**n * |x|**q if x is negative and q
is a rational number that can be written as a quotient n/d with odd d.
This respects what >>Many programmers might prefer<<, and it meets the
needs of users of interval arithmetic.
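
A point version of this definition might look as follows; a minimal
Python sketch with the hypothetical name rpow, using Fraction to
recover n/d (any exact rational representation would do):

    from fractions import Fraction

    # Proposed point power for negative base (sketch): for x < 0 and
    # q = n/d in lowest terms with odd d, x**q = (-1)**n * |x|**q;
    # for even d no real value exists.
    def rpow(x, q):
        q = Fraction(q)                  # Fraction reduces n/d automatically
        if x >= 0:
            return x ** float(q)
        if q.denominator % 2 == 1:       # odd d: a real root exists
            return (-1) ** q.numerator * abs(x) ** float(q)
        raise ValueError("no real value: even denominator")

    print(rpow(-1.0, Fraction(1, 3)))    # -1.0
    print(rpow(-1.0, Fraction(2, 6)))    # -1.0 as well: 2/6 reduces to 1/3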


>>I think the underlying issue is that we are thinking of different 
functions.  I think of x**y as corresponding to the multiple-valued 
complex function, analytic in places if we choose a principal branch 
of log, defined by exp(y*log(x)). ... In contrast Prof. Neumaier's 
definition is based upon a different choice of log branch to yield 1 
... I think it's a mistake (in language design) to use one notation 
** to represent two rather different functions.<<

He then goes on to show that Fortran77 implementations give the
results he expected.

He is quite right that we are thinking of different functions.
The point I want to make is that it is quite useful to deviate from the
past in order to gain more useful properties. For me, the measure of an
optimal future is not tradition, but the way new features will affect
applications.

In a context where no full complex interval arithmetic is provided,
there is no point at all in thinking of functions as restrictions of
complex-valued functions. In most applications of intervals, the context
is completely real, and everything could be understood even if complex
numbers had never been invented. So one should look at the functions as
real functions.  (Actually, Dr. Hough gets the complex results only when
explicitly demanding complex arithmetic. The Fortran77 implementations
likewise assume that real arithmetic admits only real interpretations,
and return NaN.)

Now people used to the old tradition will avoid fractional powers of
negative numbers anyway, since these produce NaNs; so they would not be
harmed by an interval power that gives them more sensible values.
Interval arithmetic, however, is distinguished by the fact that all
operations produce sensible sets, and there is no reason why the power
should be treated differently.

So the only question is which set to return when the base is negative.
The empty set is forbidden, since it gives a containment failure. The
result [-inf,inf] would be a valid possibility, but is unnecessarily
wide. It is enough that all meaningful real outcomes are enclosed; and
to guard against containment failures in case of misinterpretation,
the widest set of meaningful answers should be provided. That care in
this matter has useful applications was already pointed out in
my previous mail.
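
The resulting hull is easy to compute. A minimal Python sketch, under
the simplifying assumptions of a purely negative base interval X, a
positive exponent interval Y, and no outward rounding (the name
ipow_negbase is hypothetical): exponents n/d with odd d and both
parities of n are dense in Y, so the meaningful values fill a set
symmetric about zero, and only the largest base magnitude matters.

    # Hull of the meaningful real values of x**y, for X < 0 and Y > 0.
    def ipow_negbase(X, Y):
        m = max(abs(X[0]), abs(X[1]))    # largest magnitude of the base
        hi = max(m ** Y[0], m ** Y[1])   # |x|**y is maximal at a corner
        return (-hi, hi)

    print(ipow_negbase((-1.0, -1.0), (1.0, 50.0)))   # (-1.0, 1.0)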

Since the base is usually >0, the necessary case analysis will not slow
down the main case at all (the sign test must be made even when the
result is set to NaN or NaI), so there is not even a performance
penalty. And I would be surprised if the penalty for coding the
exceptions in hardware were excessive.

>>it seems very odd to me that a source code segment
in intervals such as

        X=-1.5
        Y=0.62
        Z=X**Y

should yield a result Z slightly wider than [-(1.5**0.62),+(1.5**0.62)]
on the grounds that 0.62, not representable in binary floating point,
must be widened to an interval, which necessarily contains rational numbers
of the form odd/odd and even/odd, while

        X=-1.5
        Y=0.625
        Z=X**Y

should yield a result Z that is empty<<

I find nothing bad about this. Continuity is not relevant for floating
point operations; we also do not mind that x/x=NaN when a positive x
underflows to 0. We also do not signal an error when (exp(x)-1)/x
vanishes for small x, after seemingly converging to 1 as x approaches 0.
Everyone who works with finite precision arithmetic becomes familiar
with such continuity failures.
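
The second of these failures is easy to reproduce in double precision;
a small Python check:

    import math

    # (exp(x)-1)/x stays near 1 while exp(x) > 1 is still representable,
    # then drops to exactly 0 once exp(x) rounds to 1.0.
    for x in (1e-8, 1e-12, 1e-20):
        print(x, (math.exp(x) - 1.0) / x)
    # values close to 1.0 for the first two, then exactly 0.0 for 1e-20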

And for interval methodology the essential property is enclosure, not
continuity. Near discontinuities, intervals must still produce
reasonable results, even though their width changes discontinuously.

For example, the derivative of sqrt(eps+x^2) at x=0 changes from 0
for eps>0 to [-1,1] for eps==0, and becomes undefined for eps<0.
This is precisely what is needed in a context where the absolute 
value function needs to be smoothed.
So one has to expect that singular behavior of the real function results
in discontinuous behavior of the result width of interval operations.
This may be a little unfamiliar to people less acquainted with interval
arithmetic, but it is no stranger than working with empty products
having the value 1, and similar exceptional mathematical constructs
that are not very intuitive at first but are very useful.

Arnold Neumaier