numeric IA-64 floating-point issues

Mon Jun 4 15:47:43 PDT 2001

David Hough suggested that I repost part of some private
correspondence between us last week about Hewlett-Packard/Intel IA-64
(code names Merced, Itanium, McKinley, ...)  floating-point data.
Here it is, with some minor editing and additions:

A couple of days ago, I stumbled upon the Intel Technology Journal,
which is an electronic journal that was published internally at Intel
for a decade, before being put on the Web in 1997.  Despite that, it
appears in no library catalog in the U.S. or Europe, nor in any database,
and I'd never seen a reference to it until this week.

I've now prepared a complete bibliography of the journal, and have
printed out and read a couple of dozen of the articles in it.  There
is lots of good stuff on Pentium 4 and IA-64.

The new bibliography is at

        ftp://ftp.math.utah.edu/pub/tex/bib/intel-tech-j.*
        http://www.math.utah.edu/pub/tex/bib/index-table-i.html#intel-tech-j
        http://www.math.utah.edu/pub/tex/bib/idx/intel-tech-j/index.html
        http://www.math.utah.edu/pub/tex/bib/toc/intel-tech-j.html

There is a separate bibliography at

        ftp://ftp.math.utah.edu/pub/tex/bib/intel-ia-64.*
        http://www.math.utah.edu/pub/tex/bib/index-table-i.html#intel-ia-64

of everything I've been able to find published about the IA-64
architecture.  [All bibliographies mentioned in this letter will be
updated as new data comes to light. Contributions/corrections/comments
are welcome: please send them to me privately, so as not to encumber
this list.]

In this paper

@Article{Cornea-Hasegan:1999:IFP,
  author =       "Marius Cornea-Hasegan and Bob Norin",
  title =        "{IA-64} Floating-Point Operations and the {IEEE}
                 Standard for Binary Floating-Point Arithmetic",
  journal =      j-INTEL-TECH-J,
  number =       "Q4",
  pages =        "16",
  day =          "22",
  month =        nov,
  year =         "1999",
  bibdate =      "Fri Jun 01 06:02:08 2001",
  URL =          "http://developer.intel.com/technology/itj/q41999/articles/art_6.htm;
                 http://developer.intel.com/technology/itj/q41999/pdf/ia64fpbf.pdf",
  acknowledgement = ack-nhfb,
}

I found some answers to questions that we discussed a couple of weeks
ago:

(1) 82-bit fp registers can be spilled to memory in `register-file
    format', as 128-bit values, with the data in the lower 82 bits.

(2) The 82-bit format is IEEE 754 double-extended precision with 2
    extra exponent bits.  The reasons given for the extra bits are:

      (i) divide and square root are implemented in software; the extra
          exponent range prevents underflow and overflow in their
          intermediate steps;

     (ii) x^2 + y^2 can be computed without underflow or overflow;

    (iii) products of all double-extended denormalized numbers can be
          represented.
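
Points (ii) and (iii) are easy to see in ordinary IEEE 754 double
arithmetic, where the squares of large or tiny operands leave the
exponent range even though the final result does not.  The Python
sketch below is only an illustration, not IA-64 code: Python floats
are 64-bit doubles, and the helper names naive_norm and scaled_norm
are made up for this example.  An explicit power-of-two scale factor
stands in for the headroom that extra exponent bits would provide
without any scaling.

```python
import math

def naive_norm(x: float, y: float) -> float:
    """sqrt(x^2 + y^2) computed directly; the squares may overflow or underflow."""
    return math.sqrt(x * x + y * y)

def scaled_norm(x: float, y: float) -> float:
    """Same quantity, rescaled so the squares stay inside the double range.

    May raise OverflowError in the final ldexp if the true result itself
    does not fit in a double.
    """
    big = max(abs(x), abs(y))
    if big == 0.0 or math.isinf(big):
        return big
    _, e = math.frexp(big)        # big = m * 2**e with 0.5 <= m < 1
    xs = math.ldexp(x, -e)        # scale both operands by 2**-e
    ys = math.ldexp(y, -e)
    return math.ldexp(math.sqrt(xs * xs + ys * ys), e)
```

With x = 3e200 and y = 4e200, naive_norm returns infinity because x*x
overflows, while scaled_norm returns a value close to 5e200.  With
x = 3e-200 and y = 4e-200, naive_norm returns zero because both squares
underflow, while scaled_norm returns a value close to 5e-200.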

There are two other articles in the journal that may be of interest:

@Article{Cornea-Hasegan:1998:PIC,
  author =       "Marius Cornea-Hasegan",
  title =        "Proving the {IEEE} Correctness of Iterative
                 Floating-Point Square Root, Divide, and Remainder
                 Algorithms",
  journal =      j-INTEL-TECH-J,
  number =       "Q2",
  pages =        "11",
  year =         "1998",
  bibdate =      "Fri Jun 01 06:02:08 2001",
  URL =          "http://developer.intel.com/technology/itj/q21998/articles/art_3.htm;
                 http://developer.intel.com/technology/itj/q21998/pdf/ieee.pdf",
  acknowledgement = ack-nhfb,
}

@Article{OLeary:1999:FVI,
  author =       "John O'Leary and Xudong Zhao and Rob Gerth and
                 Carl-Johan H. Seger",
  title =        "Formally Verifying {IEEE} Compliance of Floating-Point
                 Hardware",
  journal =      j-INTEL-TECH-J,
  number =       "Q1",
  pages =        "10",
  day =          "17",
  month =        feb,
  year =         "1999",
  bibdate =      "Fri Jun 01 06:02:08 2001",
  URL =          "http://developer.intel.com/technology/itj/q11999/articles/art_5.htm;
                 http://developer.intel.com/technology/itj/q11999/pdf/floating_point.pdf",
  acknowledgement = ack-nhfb,
}

The Cornea-Hasegan:1998:PIC article discusses how to obtain IEEE 754
good-to-the-last-half-bit (that is, correctly rounded) results from the
iterative approximations used in software.
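
To make the idea of an iterative approximation concrete, here is a
minimal Python sketch of Newton-Raphson reciprocal iteration,
y <- y*(2 - m*y), whose residual 1 - m*y is squared at every step.
It is not the algorithm analyzed in the article: it uses plain double
arithmetic with no final rounding correction, so its result is close
to, but not guaranteed to be, the correctly rounded 1/b.  The names
newton_reciprocal and relative_error, and the linear starting guess
48/17 - (32/17)*m, are choices made for this illustration only.

```python
import math
from fractions import Fraction

def newton_reciprocal(b: float, steps: int = 4) -> float:
    """Approximate 1/b for finite b > 0 by Newton-Raphson iteration.

    May raise OverflowError if 1/b does not fit in a double.
    """
    if not (b > 0.0 and math.isfinite(b)):
        raise ValueError("b must be positive and finite")
    m, e = math.frexp(b)                  # b = m * 2**e with 0.5 <= m < 1
    y = 48.0 / 17.0 - (32.0 / 17.0) * m   # starting guess, |1 - m*y| about 1/17 at worst
    for _ in range(steps):
        y = y * (2.0 - m * y)             # residual 1 - m*y is squared each step
    return math.ldexp(y, -e)              # undo the scaling: 1/b = (1/m) * 2**-e

def relative_error(b: float, r: float) -> Fraction:
    """Exact value of |b*r - 1|, computed in rational arithmetic."""
    return abs(Fraction(b) * Fraction(r) - 1)
```

Calling relative_error(3.0, newton_reciprocal(3.0)) shows how close
the iterate is to 1/3 without any rounding in the check itself.
Closing the remaining gap to a provably correctly rounded result is
the hard part that the article addresses.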

Despite this progress, all is not entirely rosy: I recently cited this
quote

>> ...
>> While correctly rounded implementations are ideal, they are
>> unattainable at present within practical speed and resource
>> limits. ... Therefore, a worst-case error below 0.6
>> ulps is an improvement when compared to that of 1 ulp in the
>> Pentium(TM) generation.
>> ...

about actual IA-64 and x86 chip implementations from

@InProceedings{Story:1999:NAI,
  author =       "S. Story and P. T. P. Tang",
  title =        "New Algorithms for Improved Transcendental Functions
                 on {IA-64}",
  crossref =     "Koren:1999:ISC",
  pages =        "4--11",
  year =         "1999",
  bibdate =      "Mon Feb 7 07:28:26 MST 2000",
  URL =          "http://euler.ecs.umass.edu/paper/final/paper-118.ps;
                 http://euler.ecs.umass.edu/paper/final/paper-118.pdf",
  acknowledgement = ack-nhfb,
  keywords =     "computer arithmetic; ARITH; IEEE",
}

@Proceedings{Koren:1999:ISC,
  editor =       "Israel Koren and Peter Kornerup",
  booktitle =    "14th IEEE Symposium on Computer Arithmetic:
                 proceedings: April 14--16, 1999, Adelaide, Australia",
  title =        "14th {IEEE} Symposium on Computer Arithmetic:
                 proceedings: April 14--16, 1999, Adelaide, Australia",
  publisher =    pub-IEEE,
  address =      pub-IEEE:adr,
  pages =        "xi + 274",
  year =         "1999",
  ISBN =         "0-7803-5609-8, 0-7695-0116-8, 0-7695-0118-4",
  ISSN =         "1063-6889",
  LCCN =         "QA76.6 .S887 1999",
  bibdate =      "Mon Feb 7 07:28:26 MST 2000",
  note =         "IEEE Computer Society Order Number PR00116. IEEE Order
                 Plan Catalog Number 99CB36336.",
  URL =          "http://computer.org/conferen/home/arith/;
                 http://www.ecs.umass.edu/ece/arith14/program.html",
  acknowledgement = ack-nhfb,
  annote =       "Also known as ARITH-14.",
  source =       "Computer arithmetic",
  sponsor =      "IEEE.",
}
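
The error bounds in the quotation above are stated in ulps (units in
the last place).  As a rough illustration of what such a figure
means, the Python sketch below measures the error of a computed
double against an exact rational reference.  The function name
ulp_error is hypothetical, and math.ulp requires Python 3.9 or later.
A correctly rounded result has an error of at most 0.5 ulp under this
measure.

```python
import math
from fractions import Fraction

def ulp_error(computed: float, exact: Fraction) -> Fraction:
    """Distance from `computed` to `exact`, in units of ulp(computed)."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

# The double nearest to 3/10 is correctly rounded: 0.2 ulp from the exact value.
print(float(ulp_error(0.3, Fraction(3, 10))))        # 0.2
# The computed sum 0.1 + 0.2 is the next double up: 0.8 ulp from 3/10.
print(float(ulp_error(0.1 + 0.2, Fraction(3, 10))))  # 0.8
```

For a transcendental function there is no exact rational reference,
so in practice one would substitute a value computed in much higher
precision.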

In addition, updating of fparith.bib this morning turned up more
papers about the very difficult problem of verifying floating-point
algorithms and the correctness of their mapping to chip circuits:

@InProceedings{Clarke:1996:WLS,
  author =       "E. M. Clarke and M. Khaira and X. Zhao",
  editor =       "IEEE",
  booktitle =    "33rd Design Automation Conference: proceedings 1996,
                 Las Vegas Convention Center, Las Vegas, NV, June 3--7,
                 1996",
  title =        "Word-level symbolic model checking: avoiding the
                 {Pentium FDIV} error",
  publisher =    pub-IEEE,
  address =      pub-IEEE:adr,
  pages =        "645--648",
  year =         "1996",
  ISBN =         "0-7803-3294-6 (casebound), 0-7803-3364-0 (softbound),
                 0-7803-3295-4 (microfiche), 0-89791-779-0 (ACM)",
  LCCN =         "TA174 .D46 1996",
  bibdate =      "Sat Jun 02 07:44:08 2001",
  note =         "ACM order number 47796. IEEE catalog number
                 96CH35932.",
  URL =          "http://www.acm.org/pubs/contents/proceedings/dac/240518/;
                 http://www.acm.org/pubs/citations/proceedings/dac/240518/p645-clarke/;
                 http://www.acm.org/pubs/articles/proceedings/dac/240518/p645-clarke/p645-clarke.pdf",
  acknowledgement = ack-nhfb,
  bookpages =    "xxx + 839",
}

@Article{Moore:1998:MCP,
  author =       "J. S. Moore and T. W. Lynch and M. Kaufmann",
  title =        "A mechanically checked proof of the {AMDK586}
                 floating-point division program",
  journal =      j-IEEE-TRANS-COMPUT,
  volume =       "47",
  number =       "9",
  pages =        "913--926",
  month =        sep,
  year =         "1998",
  CODEN =        "ITCOB4",
  ISSN =         "0018-9340",
  bibdate =      "Sat Jun 02 07:48:02 2001",
  acknowledgement = ack-nhfb,
}

@Article{Russinoff:1999:MCP,
  author =       "David M. Russinoff",
  title =        "A mechanically checked proof of correctness of the
                 {AMD K5} floating point square root microcode",
  journal =      j-FORM-METHODS-SYST-DES,
  volume =       "14",
  number =       "1",
  month =        jan,
  year =         "1999",
  CODEN =        "FMSDE6",
  ISSN =         "0925-9856",
  bibdate =      "Sat Jun 02 07:51:51 2001",
  note =         "Special issue on arithmetic circuits.",
  URL =          "http://www.wkap.nl/jrnltoc.htm/0925-9856;
                 http://www.wkap.nl/oasis.htm/194808",
  acknowledgement = ack-nhfb,
}

@Article{Even:2000:CTR,
  author =       "G. Even and P.-M. Seidel",
  title =        "A Comparison of Three Rounding Algorithms for {IEEE}
                 Floating-Point Multiplication",
  journal =      j-IEEE-TRANS-COMPUT,
  volume =       "49",
  number =       "7",
  pages =        "638--650",
  year =         "2000",
  CODEN =        "ITCOB4",
  ISSN =         "0018-9340",
  bibdate =      "Sat Jun 02 08:17:12 2001",
  acknowledgement = ack-nhfb,
}

Finally, here are a couple of recent papers on the floating-point
vector sum problem:

@InProceedings{Tsuji:1995:ASF,
  author =       "K. Tsuji",
  editor =       "D. (Dimitur) Bainov and Valery Covachev",
  booktitle =    "Proceedings of the Third International Colloquium on
                 Numerical Analysis: Plovdiv, Bulgaria, 13--17 August
                 1994",
  title =        "An Algorithm for Sum of Floating-Point Numbers without
                 Round-Off Error",
  publisher =    "VSP",
  address =      "Utrecht, The Netherlands",
  pages =        "181--190",
  year =         "1995",
  ISBN =         "90-6764-193-6",
  LCCN =         "QA297.I45 1994",
  bibdate =      "Sat Jun 02 08:14:13 2001",
  acknowledgement = ack-nhfb,
  bookpages =    "vi + 229",
}

@Article{Kao:1999:LTA,
  author =       "Ming-Yang Kao and Jie Wang",
  title =        "Linear-Time Approximation Algorithms for Computing
                 Numerical Summation with Provably Small Errors",
  journal =      j-SIAM-J-COMPUT,
  volume =       "29",
  number =       "5",
  pages =        "1568--1576",
  day =          "12",
  month =        jul,
  year =         "1999",
  CODEN =        "SMJCAT",
  ISSN =         "0097-5397 (print), 1095-7111 (electronic)",
  bibdate =      "Fri May 25 06:06:40 2001",
  bibsource =    "http://epubs.siam.org/sam-bin/dbq/toclist/SICOMP/29/5",
  URL =          "http://epubs.siam.org/sam-bin/dbq/article/34159",
  acknowledgement = ack-nhfb,
}

and on the elementary functions:

@Article{Harrison:1999:CTF,
  author =       "John Harrison and Ted Kubaska and Shane Story and Ping
                 Tak Peter Tang",
  title =        "The Computation of Transcendental Functions on the
                 {IA-64} Architecture",
  number =       "Q4",
  pages =        "7",
  day =          "22",
  month =        nov,
  year =         "1999",
  bibdate =      "Fri Jun 01 06:02:08 2001",
  URL =          "http://developer.intel.com/technology/itj/q41999/articles/art_5.htm;
                 http://developer.intel.com/technology/itj/q41999/pdf/transendental.pdf",
  acknowledgement = ack-nhfb,
}

@TechReport{Harrison:2000:HOM,
  author =       "John Harrison and Ted Kubaska and Bob Norin and Shane
                 Story and Ping Tak Peter Tang",
  title =        "Highly Optimized Mathematical Functions for the
                 {IA-64} Architectures",
  type =         "Technical report",
  number =       "245410-002",
  institution =  "Intel Corporation",
  month =        apr,
  year =         "2000",
  bibdate =      "Sat Jun 02 11:04:28 2001",
  URL =          "ftp://download.intel.com/design/IA-64/Downloads/libm.pdf",
  acknowledgement = ack-nhfb,
}

These last two are of interest because the authors address the problem
of mapping polynomial approximations onto microparallel floating-point
hardware, and show that it is better and faster to compute two short
polynomials at once than one long one.  I suspect their observations
apply to other current RISC architectures as well, but I don't recall
ever seeing this point raised in any publication.  
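
One common way to turn a long polynomial into two short ones is the
even/odd split p(x) = E(x^2) + x*O(x^2).  The Python sketch below
shows that decomposition; it is not necessarily the scheme used in
the two papers, and the names horner and split_eval are made up for
this example.  Python evaluates the two halves one after the other,
so the code shows only the dependency structure: the two Horner
chains share nothing but x^2, and each is about half as long as the
original chain, which is what lets pipelined or multiple
floating-point units work on both at once.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) by Horner's rule (low-order coefficient first)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c     # each step depends on the previous one
    return acc

def split_eval(coeffs, x):
    """Evaluate the same polynomial as two independent half-length Horner chains."""
    x2 = x * x
    even = horner(coeffs[0::2], x2)   # c0 + c2*x^2 + c4*x^4 + ...
    odd = horner(coeffs[1::2], x2)    # c1 + c3*x^2 + c5*x^4 + ...
    return even + x * odd

coeffs = [1.0, 2.0, 3.0, 4.0, 5.0]    # 1 + 2x + 3x^2 + 4x^3 + 5x^4
print(horner(coeffs, 2.0), split_eval(coeffs, 2.0))   # 129.0 129.0
```

The two forms are algebraically identical, but they round differently
in general, so an error analysis done for one does not carry over to
the other automatically.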

A publication history of elementary and transcendental function
evaluation is recorded here:

	ftp://ftp.math.utah.edu/pub/tex/bib/elefunt.*
	http://www.math.utah.edu/pub/tex/bib/index-table-e.html#elefunt

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- Center for Scientific Computing       FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah                    Internet e-mail: beebe@math.utah.edu  -
- Department of Mathematics, 322 INSCC      beebe@acm.org  beebe@computer.org -
- 155 S 1400 E RM 233                       beebe@ieee.org                    -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
-------------------------------------------------------------------------------