Comments on draft 2 of LIA

Mon Sep 14 13:31:40 PDT 1992

Comments on the Language Independent Arithmetic Standard 
Draft dated August 31, 1992  

James Demmel 
Computer Science Division and Mathematics Department 
U. C. Berkeley 
September 12, 1992

I read the second draft of the LIA asking whether it would help me as a 
developer of portable numerical software, which is its stated goal. 
I considered the issues of writing reliable code, writing high performance 
code, and software development costs. There are clearly tradeoffs among these 
goals, so I asked whether the LIA exposed these tradeoffs in a convenient way. 
I decided it did not. In fact, the apparent inconsistency of some of the LIA's 
decisions made me decide its underlying philosophy is wrong. I return to this 
at the end with an alternative proposal.

The inconsistency lies in explicitly permitting some very bad arithmetic designs
no one would build, excluding other less bad designs of commercial importance
(Cray), and not supporting some good designs widely available (IEEE). This is
the worst of three worlds. Writing software for the very bad designs would be
prohibitively expensive, and no one would bother since such machines would
not be built. A great deal of good software runs on Crays, and reliably so,
but it is not within the LIA model. Access to several features of IEEE
arithmetic, arguably the best and most widely available floating point, are
not explicitly supported, and in some cases explicitly discouraged.

After describing these inconsistencies in more detail, I return to the question
of what we should do instead.

For the reader's convenience, I repeat some LIA notation:

   r = radix 
   p = number of significant digits 
   emin = minimum exponent (for normalized numbers) 
   emax = maximum exponent 
   succ(x) = next floating point number larger than x 
   epsilon = r**(1-p) = succ(1) - 1 
   rnd_error = maximum error in rounding, as a multiple of epsilon 
   rnd_style = indicator of rounding style

The LIA explicitly permits an absurdly narrow exponent range, requiring only
emin <= 2-p and emax >= 1 (page 9). This is so narrow that the radix r 
itself can overflow, and the maximum round off error on a machine that rounds to
nearest can underflow (since rnd_error*epsilon = .5*epsilon < r**(1-p) =
smallest normalized number). So the programmer cannot even discover basic
machine parameters without exceptions. These are silly inconsistencies which
could be eliminated by insisting emax >= 2 and emin <= 1-p instead.
But writing portable software with such a narrow exponent range would still
be very hard, so hard that no one would bother.

The authors' footnote on page 9 indicates that to be useful, emin has to be
much smaller than 2-p and emax much larger than 1, but they say that placing
limits is beyond the scope of the LIA (but they are happy to make detailed
requirements for addition -- see below). They do make recommendations on 
page 33, but weak ones, just that epsilon**2 should not underflow, and
that emin+emax should be close to zero. I discussed the impact of this
assumption on portability in an earlier  message about the first LIA draft, 
where I pointed out that neither LINPACK nor LAPACK worked as reliably as one 
might hope in VAX D-format arithmetic. In VAX D-format epsilon is about 
10**(-16) and underflow is about 10**(-38). After much effort, both projects 
decided the cost of dealing with VAX D was too high, both because of software
development costs and slowing down the code on other machines. LINPACK and
LAPACK are projects noted for the effort they put into attaining portability.
If LINPACK and LAPACK did not deal completely successfully with an 
exponent range much wider than required by LIA, and even wider than LIA
recommends, why do the authors think LIA will aid construction of 
portable software?

In contrast to dealing with an exponent range as narrow as that permitted
by LIA, it would be a pleasure to deal with a Cray, which rounds poorly.
The existence of a great deal of useful (and even reliable) software for Crays
shows that its arithmetic is tractable in a great many cases if not all.
But this design is excluded from the LIA, because they insist on a guard 
digit in addition, which Crays do not have (page 13). I certainly hope 
arithmetics without guard digits die out, but in the meantime it is quite 
easy to model for many codes; one simply uses

   float(a+b) = a*(1+eps1) + b*(1+eps2) 

instead of

   float(a+b) = (a+b)*(1+eps) 

where eps, eps1 and eps2 all must be tiny. Indeed, most (but not all) linear
algebra algorithms work on the weaker model satisfied by Cray. Why
exclude this useful model from the LIA? Why include the uselessly
narrow exponent range?

The LIA discusses exception handling in a similar way to IEEE, but
with important omissions. The IEEE flags and modes are arguably the
most widely supported arithmetic exception handling and rounding
control mechanisms available, yet LIA does not support them all:

   Exception Flags:
      IEEE's INVALID and DIVIDE-BY-ZERO flags are mapped to LIA's single 
         UNDEFINED flag.
      IEEE's INEXACT flag has no LIA counterpart.

   Rounding Modes:
      No mechanism for setting the rounding mode is given.
      Rounding up and rounding down are not supported (via rnd_style).

   Nonnumerical Values:
      Infinity and NAN are undefined, as are operations on them.

   Trapping:
      The ability to set or reset trapping mode for IEEE exceptions
      is not supported.

LIA says that more exceptions may be defined, so as to include all IEEE
exceptions, but how does this aid portability, if the interface is not
specified? Having everyone make up his own interface is the situation
we are in today. On page 50, it is stated that not all IEEE exceptions
are supported, because they are not universally available. But neither
is an underflow flag generally available at this time, which they
do demand. They also explicitly discourage run-time access to trap
flags in section A.5.3, even though this is explicitly required by IEEE.

The utility of these IEEE features is widely understood and will not be
discussed in detail here. I provide just one example, from LAPACK 2, which
shows how we would have to sacrifice a factor of 3 or more in performance
because of LIA. The paradigm we wish to use is the following:

   1) Run a fast algorithm which is occasionally unreliable. 
   2) Quickly confirm or deny the accuracy of the answer just computed. 
   3) In the rare case the answer is inaccurate, recompute it using a
      slower but more reliable algorithm.

This paradigm is getting to be more attractive on parallel machines, where
the fastest parallel algorithm is often less reliable than its serial counterpart.
For example, consider computing eigenvectors. This involves solving triangular
systems of linear equations. This operation is available in the BLAS, which are
highly optimized codes. These optimized BLAS take no precautions against
over/underflow, since it would slow them down. But the triangular systems
arising in eigenvector calculations are deliberately close to singular, so that
overflow is not completely unlikely. Since we want to scale the solution to have
unit norm anyway, intermediate overflows are not error conditions but just mean
we have to scale. LAPACK 1 scales inside the inner loop because it is designed
to avoid overflow if at all possible; thus it cannot use the optimized BLAS.
This slows down the code considerably. In LAPACK 2, we intend to use the
above paradigm by running the optimized BLAS first, check the overflow flag,
and only it it is set, rerun with scaling. The code could be completely
portable if at run time we could inquire about exception handling, and use the
above paradigm if IEEE is available, and use the slower code otherwise. LIA does
not support this.

In light of these apparently inconsistent decisions, which include some bad
features, exclude other less bad ones, and omit some widely available good
ones, what should we do? I strongly recommend that the LIA simply provide
run-time access to as much information about the arithmetic as possible,
good or bad, insisting only on enough resources to make these inquiries
possible (so the radix does not overflow, for example). Then it is up to
the software developer to make the inevitable tradeoffs among reliability,
performance and development costs, and document them. Such documentation
could be as simple as ``works on any IEEE machine'' or 
``works on any machine where epsilon**3 does not underflow and epsilon**(-3) 
does not overflow''.  It is not possible for the LIA to make these tradeoffs 
correctly in advance for all applications.  There should absolutely be 
run-time access to all widely available good features, including all IEEE 
features. When some features are not available, distinguished values meaning 
``unimplemented'' should be returned by the corresponding functions.

I personally do not intend to stop writing software that can go 3 or more
times faster by exploiting IEEE arithmetic. With the changes I propose above,
I could make such code completely portable. Why should the LIA want to stop 
me from writing such code?