Most people don't notice when arithmetic is right....

Fri Nov 5 12:54:14 PST 1993

....but when you blow it badly enough, users eventually figure it out.;>

In my experience compiler writers nearly always want to do just one
more optimization .... and common test suites often don't check for
the fine points of arithmetic :>

The following postings showed up in *.hp newsgroups I happened to be
wandering through. I have not personally validated any of the material.

--text follows this line--

   Newsgroups: comp.sys.hp.hpux
   From: carlsonamath.purdue.edu (Neil N. Carlson)
   Organization: Purdue University
   Date: Tue, 2 Nov 1993 21:13:34 GMT

   In article <bekker.752254172aciv.utwente.nl>, Chera Bekker <bekkeratn.utwente.nl> writes:
   >
   > [stuff deleted]
   > 
   > Are these routines adapted for HPUX? In the LAPACK sources from 
   > iwork.ecn.uiowa.edu it is mentioned that HP does some special things with
							   ^^^^^^^^^^^^^^
   > complex double precision arithmetics. There some BLAS routines have been
   > adapted.
   > 

   This is an extremely diplomatic way of putting it.  I won't be so kind.
   HP's implementation (HP-UX 9.01 700 series) of the complex reciprocal
   is naive and *INCORRECT* for about half of the valid exponent range.
   No other machine I know of is so foolish.  I'd even bet that Fortran
   compilers for PCs would do it correctly (anybody care to give it a try).
   I've appended a short code that illustrates the problem.  This defect 
   is responsible for the complete failure of the complex LAPACK test suite
   as others have recently noted.  I acknowledge that this is completely
   irrelevant for a great many HP users, but for those doing scientific
   computing it is a serious issue.  If you're one of these people, I'd
   urge you to submit a formal complaint to HP, as I have done.  Perhaps
   we can shame them into fixing the problem.

   Neil Carlson
   carlsonamath.purdue.edu

   ************************************************************************
   *
   *  An example which demonstrates the naive computation of the
   *  complex reciprocal on the HP 7xx.
   *
   *     (1) f77 example.f
   *
   *     (2) f77 +FPD example.f
   *
   *     (3) f77 { +FPO | +FPU | +FPZ } example.f
   *
   *  The executable generated by (1) and (2) produce incorrect results
   *  (no correct digits) in some cases, and the executable generated by
   *  (3) will produce a spurious floating point exception.  Comparing
   *  cases (1) and (2) shows the results of disabling gradual underflow;
   *  the example with inexact arithmetic shows the loss of precision
   *  with gradual underflow.
   *
   ************************************************************************
	 integer     n(9), m(9), j
	 complex*16  x(9), y(9), z(9), xx, rx
	 real*8      base

	 data n / 1021, 512, 511, 0, -511, -512, -537, -538, -1022 /
	 data m /  308, 154, 153, 0, -153, -154, -161, -162,  -307 /

	 xx = (1.d0,  1.d0)
	 rx = (.5d0, -.5d0)

	 if (.true.) then

   *       Exact binary arithmetic.
	   base = 2.d0

	 else

   *       Inexact arithmetic (but nicer looking results).
	   base = 10.d0
	   do 10 j=1,9
      10   n(j) = m(j)

	 endif

	 do 20 j=1,9
	  x(j) = xx * base**n(j)
	  y(j) = rx / base**n(j)
      20 continue

	 do 30 j=1,9
      30 z(j) = 1.d0 / x(j)

	 write(6,810) (base, n(j), y(j), z(j), j=1,9)

     810 format(/, 'z = (1,1) * ', f4.1, ' **', i5, /
	&          '-------------------------', /
	&          '   Exact 1/z = (', e21.16, ',', e23.16, ')', /,
	&          'Computed 1/z = (', e21.16, ',', e23.16, ')', /)

	 stop
	 end

   -- 

   +--------------------------------------------------------------------+
   |  Neil N. Carlson                          carlsonamath.purdue.edu  |
   |  Department of Mathematics                317-494-1920             |
   |  Purdue University                        317-494-0548 (FAX)       |
   +--------------------------------------------------------------------+
Newsgroups: comp.sys.hp.hpux
From: Chera Bekker <bekkeratn.utwente.nl>
Subject: Re: BLAS Routines for HP 9000 735
Reply-To: bekkeratn.utwente.nl
Organization: University of Twente, Enschede, The Netherlands
Date: Wed, 3 Nov 1993 09:44:20 GMT

carlsonamath.purdue.edu (Neil N. Carlson) writes:

>This is an extremely diplomatic way of putting it.  I won't be so kind.
>HP's implementation (HP-UX 9.01 700 series) of the complex reciprocal
>is naive and *INCORRECT* for about half of the valid exponent range.
>No other machine I know of is so foolish.  I'd even bet that Fortran
>compilers for PCs would do it correctly (anybody care to give it a try).
>I've appended a short code that illustrates the problem.  This defect 
>is responsible for the complete failure of the complex LAPACK test suite
>as others have recently noted.  I acknowledge that this is completely
>irrelevant for a great many HP users, but for those doing scientific
>computing it is a serious issue.  If you're one of these people, I'd
>urge you to submit a formal complaint to HP, as I have done.  Perhaps
>we can shame them into fixing the problem.
>
>Neil Carlson
>carlsonamath.purdue.edu
>
---- example program deleted -----

I could just say the comlex arithmetic on HPUX 9.01 sucks, but here are the 
notes about porting LAPACK to HPUX. Draw your own conclusions.

N.B. it is a bit lengthy but well worth reading for everyone who does 
scientific computing.

------------------ begin notes -------------------------------------------

        Notes for Porting LAPACK to the HP/Apollo
            PA-RISC 700 Series Workstations

Although the LAPACK Fortran source code will compile as-is from the
"netlib" archives, the complex and complex*16 test programs in the
./LAPACK/TESTING directory will fail to pass the test threshold values 
for various subroutines.

The cause of this problem is the method in which the HP-UX compiler
implements the complex division *AND* the complex absolute value
operations (note that *only* complex division is bad on the Apollo
Domain/OS machines). No hardware error messages will be given on the
HP-UX series 700 workstations, unlike the Domain/OS machines.

The HP-UX machines do *NOT* trap and/or report IEEE errors by default.
The routines and the test programs must be compiled with the +T option
in order to enable IEEE floating point traps. Unfortunately, this
enables *ALL* traps, including floating point underflow, even through
this conditition is usually benign. Since the "slamach" and "dlamach"
routines in the LAPACK library routinely cause floating point underflows
when they are called to get an estimate of the machine's range and
precision, all of the main programs in the LAPACK/TESTING and
LAPACK/TIMING directories have been modified with the HP-UX Fortran
extension:

      ON REAL*4 UNDERFLOW IGNORE
      ON REAL*8 UNDERFLOW IGNORE

as the first executable statement in the program. The original,
unmodified main programs have been preserved by in files with ".orig"
tacked onto the end of the filename. The modified programs are:

    LAPACK/TESTING/EIG/cchkee.f.hpux
    LAPACK/TESTING/EIG/dchkee.f.hpux
    LAPACK/TESTING/EIG/schkee.f.hpux
    LAPACK/TESTING/EIG/zchkee.f.hpux
    LAPACK/TESTING/LIN/cchkaa.f.hpux
    LAPACK/TESTING/LIN/dchkaa.f.hpux
    LAPACK/TESTING/LIN/schkaa.f.hpux
    LAPACK/TESTING/LIN/zchkaa.f.hpux
    LAPACK/TIMING/EIG/ctimee.f.hpux
    LAPACK/TIMING/EIG/dtimee.f.hpux
    LAPACK/TIMING/EIG/stimee.f.hpux
    LAPACK/TIMING/EIG/ztimee.f.hpux
    LAPACK/TIMING/LIN/ctimaa.f.hpux
    LAPACK/TIMING/LIN/dtimaa.f.hpux
    LAPACK/TIMING/LIN/stimaa.f.hpux
    LAPACK/TIMING/LIN/ztimaa.f.hpux

With the +T compiler option enabling IEEE floating point traps, and the
ON REAL*WHATEVER IGNORE statements enabling us to execute the test
programs without dying from benign conditions, we see the same kind of
floating point errors with the HP-UX series 700 compilers
as with the Domain/OS compilers. However, unlike the Domain/OS machines,
the errors are generated both by division-by-zero conditions in complex
division statement *AND* in illegal operand conditions (taking the
square-root of zero) in complex absolute-value functions.

There are two possible fixes for the HP-UX version of LAPACK. The first
is to edit the routines "slabad.f" and "dlabad.f" to force them to
adjust the smallest and largest usable number to be the square-root of
the numbers calculated above. While this avoids both the complex
arithmetic divide-by-zero error and the complex absolute-value
square-root of zero error, it reduces the numerical accurary of the
package, and some of the single and double precision (non-complex, real
numbers) test programs begin complaining about round-off errors
exceeding the expected thresholds. The "slabad" and "dlabad" routines
were orginally meant to detect whether you were running on a Cray (which
has an exponent range of up to 1.0E+2500, but which apparently has
inaccurate numerical results at the extreme ends of the range).

The other choice is to find all of the places where either the complex
divide-by-zero or square-root-of-zero errors occur, and to replace the
arthimetic operations with an explicitly ordered set of operations
designed to avoid floating point underflow or overflow. This is the
method I've chosen.

Fortunatley, the HP-UX 8.07 compiler implements both its complex
division and double-precision complex division operations as subroutine
calls to the routines in the HP-UX F77 library. By simply adding complex
and complex*16 routines with the same names to the BLAS library, we can
intercept all of the complex division operations in both the main
programs, the LAPACK library routines, and the BLAS routines. These
routines have been place in the files "cmath.f.hpux" and "zmath.f.hpux"
in the LAPACK/BLAS/SRC directory, and have been included into the BLAS
library so that they are accessable by any routine in either the LAPACK
or the BLAS libraries.

Unfortunately, at HP-UX version 9.0, the compiler will start performing
complex division as inline code, and this fix will no longer work. Since
many of the routines which have problems with complex division have
already been found for the Apollo DN10000 version, perhaps a merged
Apollo/HP-UX version can be accomplished. However, there is likely to be
a new version of LAPACK released by then, and it would be easier to fix
the bloody HP Fortran compiler than to re-edit and re-test several
hundred thousand lines of code.

The complex absolute value function, unfortunately, is a different
story. Unlike the Domain/OS compiler, which does complex division as
inline code (which must be replaced by hand)  but which does the complex
absolute value function as a subroutine (which works correctly and
doesn't need to be tampered with), the HP-UX 8.07 compiler does the
complex absolute-value function as inline code (which must be replaced
by hand) and does complex division as a subroutine (which has simply
been replaced with our own routine). The problem, is that there are a
*LOT* of references to the ABS function in routines that cause
error-traps in the testing programs.

Several dozen complex and complex*16 routines in both the ./LAPACK/SRC,
the ./LAPACK/BLAS/SRC, and in the ./LAPACK/TESTING/EIG and
./LAPACK/TESTING/LIN directories another couple of routines in the
./LAPACK/BLAS/SRC directories have been edited to replace references to
the complex ABS function with calls to the external routines CABS or
ZABS, which perform complex/double-precision complex absolute-value
using the technique suggested by Knuth in his Semi- Numerical Algorithms
books. These routines have been place in the files "cmath.f.hpux" and
"zmath.f.hpux", along with the routines which replace the HP-UX F77
libraries implmentation of complex division.

The original code of all the routines which have been modified has been
copied to a ".orig" file in each case. (there are also a couple of
routines, "dspevx" and "sspevx"  that  had bug-fixes released after the
LAPACK 1.0a release that also have ".orig.bad" files). These
modifications seem to allow the HP750 used for this port to execute all
of the test and timing programs without error and without loss of
accurracy.

Note that *ONLY* routines which caused divide-by-zero error traps during
the execution of the test programs in LAPACK/TESTING have been modified.
Other routines which *may* use the  complex ABS function have not been
touched. It is *NOT* feasible to examine every line of code in the
LAPACK and BLAS library to find every, single occurrance of the "ABS"
symbol and to check the data-types of the operands to see if the
operation must be replaced!

The division of two complex numbers, Z1 = (A + Bi) and Z2 = (C + Di) is
defined as:

                Z1/Z2 == (A + Bi)/(C + Di)

The HP-UX 8.07 F77 library implements this as:

                         Z1 * conjugate(Z2)
                Z1/Z2 == -------------------
                         Z2 * conjugate(Z2)   

                         (A + Bi) * (C - Di)
                Z1/Z2 == -------------------
                         (C + Di) * (C - Di)

                         (A*C + B*D) + (B*C - A*D)i
                Z1/Z2 == --------------------------
                               (C**2 + D**2)

While this is technically the correct definition, taking the square of C
and D is where the floating point underflow/overflow results. Most other
compilers (Sun Sparcstation, Cray, Alliant, Convex, et. al.) have
implemented complex division as:

                         (A*C + B*D) + (B*C - A*D)i
                Z1/Z2 == --------------------------
                               (C**2 + D**2)

                For: |C| <= |D| and, therefore |C/D| <= 1

                         (A*C + B*D) + (B*C - A*D)i
                Z1/Z2 == --------------------------
                            D * ( C*(C/D) + D )

                         (A*(C/D) + B) + (B*(C/D) - A)i
                Z1/Z2 == ------------------------------
                              ( C*(C/D) + D )

                Otherwise, for: |C| > |D| and, therefore |D/C| <= 1

                         (A*C + B*D) + (B*C - A*D)i
                Z1/Z2 == --------------------------
                            C * ( C + D*(D/C) )

                         (A + B*(D/C)) + (B - A*(D/C))i
                Z1/Z2 == ------------------------------
                             ( C + D*(D/C) )

Since C/D (or D/C, depending on the case) is less than or equal to 1,
this algorithm is not a nearly sensitive to floating-point underflow and
overflow errors as the textbook implementation used by the Domain/OS
compilers. In addition, since Knuth's algorithm trades off only one
additional division for 3 multiplications, it's not really any slower
than the textbook case even though floating point division is usually
slower than floating point multiplication.

In a similar fashion, the absolute value of a complex number,
Z1 = (A + Bi) is defined as:

                |Z1| == |(A + Bi)| == SQRT( A**2 + B**2 )

While this is technically the correct definition, taking the square of A
and B can cause a floating point underflow/overflow error. Most other
compilers (Sun Sparcstation, Cray, Alliant, Convex, and even the Apollo
Domain/OS compiler) have implemented complex division as:

                |Z1| == |(A + Bi)| == SQRT( A**2 + B**2 )

                For: |A| <= |B| and, therefore |A/B| <= 1

                |Z1| == |(A + Bi)| == |B| * SQRT( (A/B)**2 + 1 )

                or (if the SQRT function is reasonably fast)

                |Z1| == |(A + Bi)| == SQRT( |B| ) * SQRT( A*(A/|B|) + |B| )

                Otherwise, for: |A| > |B| and, therefore |B/A| <= 1

                |Z1| == |(A + Bi)| == |A| * SQRT( 1 + (B/A)**2 )

                or (if the SQRT function is reasonably fast)

                |Z1| == |(A + Bi)| == SQRT( |A| ) * SQRT( |A| + B*(B/|A|) )

Many thanks are owed to Mike Peterson of the University of Toronto's
chemistry department for the weeks and weeks of CPU time and hundreds of
Mbytes of disk space consumed in doing this port. A debt of gratitude is
also due to Bob Montgomery and Mike Vermeulen of HP for their many
useful hints about the HP-UX 8.07 Fortran-77 compiler and libraries.

== David M. Krowitz
   MIT Earth, Atmospheric, and Planetary Sciences Dept.
   Room 54-527
   Cambridge, MA 02139
   617-253-6180
   (krowitzaquake.mit.edu)

--
H.G. Bekker                  (bekkeratn.utwente.nl)           Voice: +3153893107
ELTN 5214 (Applied Physics)                                   Fax:   +3153354003
University of Twente        'This magic day when super-science mingles       
The Netherlands                           with the bright stuff of dreams' -RUSH

Newsgroups: comp.sys.hp.hpux
From: mevach.apollo.hp.com (Mike Vermeulen)
Subject: Re: BLAS Routines for HP 9000 735
Date: Wed, 3 Nov 1993 20:49:16 GMT
Nntp-Posting-Host: hpcomet.ch.apollo.hp.com
Organization: MLL, Hewlett-Packard, Chelmsford, MA, USA
X-Newsreader: TIN [version 1.2 PL1.4]

Problems reported against the HP-UX 9.0 compiler concerning complex
division and complex absolute value have been fixed in the compiler
source.  The compiler no longer uses inline algorithms for these
operations.  The algorithms used by the called library routines have
also been updated.  

Future releases of the compiler and libraries will thus contain these
fixes.

Mike Vermeulen
mevach.hp.com

From: matthewsaac.wfu.edu (Rick Matthews)
Newsgroups: comp.sys.hp.hpux
Subject: Re: BLAS Routines for HP 9000 735
Date: 3 Nov 1993 21:11:28 GMT
Organization: Wake Forest University
NNTP-Posting-Host: ac.wfunet.wfu.edu
X-Newsreader: TIN [version 1.2 PL2]

Neil N. Carlson (carlsonamath.purdue.edu) wrote:
: In article <bekker.752254172aciv.utwente.nl>, Chera Bekker <bekkeratn.utwente.nl> writes:
: >
: > [stuff deleted]
: > 
: > Are these routines adapted for HPUX? In the LAPACK sources from 
: > iwork.ecn.uiowa.edu it is mentioned that HP does some special things with
:                                                         ^^^^^^^^^^^^^^
: > complex double precision arithmetics. There some BLAS routines have been
: > adapted.
: > 

: This is an extremely diplomatic way of putting it.  I won't be so kind.
: HP's implementation (HP-UX 9.01 700 series) of the complex reciprocal
: is naive and *INCORRECT* for about half of the valid exponent range.
: No other machine I know of is so foolish.  I'd even bet that Fortran

Indeed.  We have a HP9000/852 that we can't run any of our scientific
programs on because we can get a clean Lapack installation.  Perhaps
the HP's are OK if you are only doing work with reals, but certainly
not for programs involving complex arithmetic.

Convex has a commercial product (MLIB) for the 700 series that is well
worth considering -- I use it on a Convex.  It contains all of Lapack
and more, optimized for HP 700.  However, I don't think it is
compatible with the 800 series.

If anyone has found a patch to make Lapack work with an 800, PLEASE
let me know.

--
Rick Matthews                     matthewsawfunet.wfu.edu     Ham radio:
Wake Forest University            919-759-5340   (Voice)      WA4GSP
Winston-Salem, NC 27109-7507      919-759-6142   (FAX)

Newsgroups: comp.sys.hp.hpux
From: rvenableahelix.nih.gov (Rick Venable)
Subject: Re: BLAS Routines for HP 9000 735
Organization: FDA Biophysics Lab
Date: Thu, 4 Nov 1993 01:45:41 GMT

In article <CFxoI5.5A0aapollo.hp.com> mevach.apollo.hp.com (Mike Vermeulen) writes:
>Problems reported against the HP-UX 9.0 compiler concerning complex
>division and complex absolute value have been fixed in the compiler
>source.  The compiler no longer uses inline algorithms for these
>operations.  The algorithms used by the called library routines have
>also been updated.  
>
>Future releases of the compiler and libraries will thus contain these
>fixes.
>
>Mike Vermeulen
>mevach.hp.com

What about current releases?  Do I need a software support contract to
get a fix for a broken compiler?  Is there a warranty period for software?
Or must I buy a new compiler to fix an error by HP's compiler writers?
Does 'updated' mean that the library routines have been re-written to allow
the full range of complex values, unlike the previous naive algorithm?

I agree with earlier remarks that this is a SERIOUS flaw as far as scientific
work is concerned; I think HP should distribute fixed compilers/libraries
to anyone who has already purchased Fortran 9000 (like me) free of charge.

--
Rick Venable      (301) 496-1905      |=|       "Eschew
FDA/CBER Biophysics Lab  HFM-419      |=|      Obfuscation"  
Bethesda, MD  20892          USA      |=|
rvenableahelix.nih.gov                |=|      -- the Phantom Nerd   

--end included matter--

Reminder: FORWARDED FROM THE NET means that I think this material
          may be of interest, because of one or more of the
	  several values:

	     a) entertainment  b) technical background 
	     c) "words from the street" d) other

At no point should it be assumed that I endorse, agree, condone, or
otherwise have any truck with the author, the author's comments, the
author's friends, etc.