numeric IA-64 floating-point issues
Nelson H. F. Beebe
beebe@math.utah.edu
Mon Jun 4 15:47:43 PDT 2001
David Hough suggested that I repost part of some private
correspondence between us last week about Hewlett-Packard/Intel IA-64
(code names Merced, Itanium, McKinley, ...) floating-point data.
Here it is, with some minor editing and additions:
A couple of days ago, I stumbled upon the Intel Technology Journal,
an electronic journal that was published internally at Intel for a
decade before being put on the Web in 1997. Despite being on the Web,
it appears in no library catalog in the U.S. or Europe, nor in any
database, and I had never seen a reference to it until this week.
I've now prepared a complete bibliography of the journal, and have
printed out and read a couple of dozen of the articles in it. There
is lots of good stuff on Pentium 4 and IA-64.
The new bibliography is at
ftp://ftp.math.utah.edu/pub/tex/bib/intel-tech-j.*
http://www.math.utah.edu/pub/tex/bib/index-table-i.html#intel-tech-j
http://www.math.utah.edu/pub/tex/bib/idx/intel-tech-j/index.html
http://www.math.utah.edu/pub/tex/bib/toc/intel-tech-j.html
There is a separate bibliography at
ftp://ftp.math.utah.edu/pub/tex/bib/intel-ia-64.*
http://www.math.utah.edu/pub/tex/bib/index-table-i.html#intel-ia-64
of everything I've been able to find published about the IA-64
architecture. [All bibliographies mentioned in this letter will be
updated as new data comes to light. Contributions/corrections/comments
are welcome: please send them to me privately, so as not to encumber
this list.]
In this paper
@Article{Cornea-Hasegan:1999:IFP,
author = "Marius Cornea-Hasegan and Bob Norin",
title = "{IA-64} Floating-Point Operations and the {IEEE}
Standard for Binary Floating-Point Arithmetic",
journal = j-INTEL-TECH-J,
number = "Q4",
pages = "16",
day = "22",
month = nov,
year = "1999",
bibdate = "Fri Jun 01 06:02:08 2001",
URL = "http://developer.intel.com/technology/itj/q41999/articles/art_6.htm;
http://developer.intel.com/technology/itj/q41999/pdf/ia64fpbf.pdf",
acknowledgement = ack-nhfb,
}
I found some answers to questions that we discussed a couple of weeks
ago:
(1) The 82-bit fp registers can be spilled to memory in `register-file
format', as 128-bit values, with the data in the lower 82 bits.
(2) The 82-bit format is IEEE 754 double-extended precision with 2
extra exponent bits (17 instead of 15). The reasons given for the
extra exponent range are:
(i) divide and square root are implemented in software, and the
extra exponent range prevents underflow and overflow in their
intermediate steps;
(ii) x^2 + y^2 can be computed without underflow or overflow;
(iii) products of all double-extended denormalized numbers can be
represented.
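To make point (ii) concrete, here is a small Python sketch of what
goes wrong when the intermediate squares have no more exponent range
than the operands. It uses ordinary IEEE 754 doubles rather than the
82-bit format, and the function names naive_norm and scaled_norm are
my own, chosen only for illustration. The scaled version is the usual
software workaround that the extra exponent bits make unnecessary.

```python
import math

def naive_norm(x, y):
    # The squares are formed in the same format as the operands, so
    # they can overflow or underflow even when the result is in range.
    return math.sqrt(x * x + y * y)

def scaled_norm(x, y):
    # Workaround without extra exponent range: divide out the larger
    # magnitude so that the squared quantity is at most 1.
    ax, ay = abs(x), abs(y)
    big, small = max(ax, ay), min(ax, ay)
    if big == 0.0:
        return 0.0
    ratio = small / big
    return big * math.sqrt(1.0 + ratio * ratio)

print(naive_norm(3e200, 4e200))    # inf: both squares overflow
print(scaled_norm(3e200, 4e200))   # finite, near 5e200
print(naive_norm(3e-200, 4e-200))  # 0.0: both squares underflow
print(scaled_norm(3e-200, 4e-200)) # nonzero, near 5e-200
```

The scaling costs an extra division and some branches, which is the
kind of overhead a wider intermediate exponent range avoids.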
There are two other articles in the journal that may be of interest:
@Article{Cornea-Hasegan:1998:PIC,
author = "Marius Cornea-Hasegan",
title = "Proving the {IEEE} Correctness of Iterative
Floating-Point Square Root, Divide, and Remainder
Algorithms",
journal = j-INTEL-TECH-J,
number = "Q2",
pages = "11",
year = "1998",
bibdate = "Fri Jun 01 06:02:08 2001",
URL = "http://developer.intel.com/technology/itj/q21998/articles/art_3.htm;
http://developer.intel.com/technology/itj/q21998/pdf/ieee.pdf",
acknowledgement = ack-nhfb,
}
@Article{OLeary:1999:FVI,
author = "John O'Leary and Xudong Zhao and Rob Gerth and
Carl-Johan H. Seger",
title = "Formally Verifying {IEEE} Compliance of Floating-Point
Hardware",
journal = j-INTEL-TECH-J,
number = "Q1",
pages = "10",
day = "17",
month = feb,
year = "1999",
bibdate = "Fri Jun 01 06:02:08 2001",
URL = "http://developer.intel.com/technology/itj/q11999/articles/art_5.htm;
http://developer.intel.com/technology/itj/q11999/pdf/floating_point.pdf",
acknowledgement = ack-nhfb,
}
The Cornea-Hasegan:1998:PIC paper discusses how to obtain IEEE 754
good-to-the-last-half-bit (that is, correctly rounded) results from
the iterative approximations used in software.
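For readers who have not seen such iterations, the following Python
sketch shows the general shape of a software divide: a crude
reciprocal seed, a few Newton-Raphson refinements, and a final
residual correction. It is NOT the algorithm from the paper, and it
proves nothing about correct rounding. The names fma and newton_divide
and the linear seed are my assumptions, and the fused multiply-add is
emulated with exact rational arithmetic because the sketch does not
rely on a hardware fma.

```python
import math
from fractions import Fraction

def fma(a, b, c):
    # Emulated fused multiply-add: a*b + c is formed exactly as a
    # rational number and then rounded once to the nearest double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def newton_divide(a, b, steps=4):
    # Sketch only: assumes finite, nonzero, normal a and b, with the
    # quotient well inside the double range.
    m, e = math.frexp(abs(b))                # abs(b) = m * 2**e, 0.5 <= m < 1
    seed = 48.0 / 17.0 - (32.0 / 17.0) * m   # linear guess for 1/m
    y = math.copysign(math.ldexp(seed, -e), b)
    for _ in range(steps):
        err = fma(-b, y, 1.0)                # err = 1 - b*y, rounded once
        y = fma(y, err, y)                   # y <- y + y*err
    q = a * y                                # first quotient estimate
    r = fma(-b, q, a)                        # residual a - b*q
    return fma(r, y, q)                      # corrected quotient

print(newton_divide(1.0, 3.0), 1.0 / 3.0)
```

The hard part, and the subject of the paper, is showing that the last
step always delivers the correctly rounded quotient; the sketch only
shows where the fused multiply-add enters.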
Despite this progress, all is not entirely rosy: I recently cited this
quote
>> ...
>> While correctly rounded implementations are ideal, they are
>> unattainable at present within practical speed and resource
>> limits. \ldots{} Therefore, a worst-case error below 0.6
>> ulps is an improvement when compared to that of 1 ulp in the
>> Pentium$^{\textsc{tm}}$ generation.
>> ...
about actual IA-64 and x86 chip implementations from
@InProceedings{Story:1999:NAI,
author = "S. Story and P. T. P. Tang",
title = "New Algorithms for Improved Transcendental Functions
on {IA-64}",
crossref = "Koren:1999:ISC",
pages = "4--11",
year = "1999",
bibdate = "Mon Feb 7 07:28:26 MST 2000",
URL = "http://euler.ecs.umass.edu/paper/final/paper-118.ps;
http://euler.ecs.umass.edu/paper/final/paper-118.pdf",
acknowledgement = ack-nhfb,
keywords = "computer arithmetic; ARITH; IEEE",
}
@Proceedings{Koren:1999:ISC,
editor = "Israel Koren and Peter Kornerup",
booktitle = "14th IEEE Symposium on Computer Arithmetic:
proceedings: April 14--16, 1999, Adelaide, Australia",
title = "14th {IEEE} Symposium on Computer Arithmetic:
proceedings: April 14--16, 1999, Adelaide, Australia",
publisher = pub-IEEE,
address = pub-IEEE:adr,
pages = "xi + 274",
year = "1999",
ISBN = "0-7803-5609-8, 0-7695-0116-8, 0-7695-0118-4",
ISSN = "1063-6889",
LCCN = "QA76.6 .S887 1999",
bibdate = "Mon Feb 7 07:28:26 MST 2000",
note = "IEEE Computer Society Order Number PR00116. IEEE Order
Plan Catalog Number 99CB36336.",
URL = "http://computer.org/conferen/home/arith/;
http://www.ecs.umass.edu/ece/arith14/program.html",
acknowledgement = ack-nhfb,
annote = "Also known as ARITH-14.",
source = "Computer arithmetic",
sponsor = "IEEE.",
}
In addition, updating of fparith.bib this morning turned up more
papers about the very difficult problem of verifying floating-point
algorithms and the correctness of their mapping to chip circuits:
@InProceedings{Clarke:1996:WLS,
author = "E. M. Clarke and M. Khaira and X. Zhao",
editor = "IEEE",
booktitle = "33rd Design Automation Conference: proceedings 1996,
Las Vegas Convention Center, Las Vegas, NV, June 3--7,
1996",
title = "Word-level symbolic model checking: avoiding the
{Pentium FDIV} error",
publisher = pub-IEEE,
address = pub-IEEE:adr,
pages = "645--648",
year = "1996",
ISBN = "0-7803-3294-6 (casebound), 0-7803-3364-0 (softbound),
0-7803-3295-4 (microfiche), 0-89791-779-0 (ACM)",
LCCN = "TA174 .D46 1996",
bibdate = "Sat Jun 02 07:44:08 2001",
note = "ACM order number 47796. IEEE catalog number
96CH35932.",
URL = "http://www.acm.org/pubs/contents/proceedings/dac/240518/;
http://www.acm.org/pubs/citations/proceedings/dac/240518/p645-clarke/;
http://www.acm.org/pubs/articles/proceedings/dac/240518/p645-clarke/p645-clarke.pdf",
acknowledgement = ack-nhfb,
bookpages = "xxx + 839",
}
@Article{Moore:1998:MCP,
author = "J. S. Moore and T. W. Lynch and M. Kaufmann",
title = "A mechanically checked proof of the {AMD5K86}
floating-point division program",
journal = j-IEEE-TRANS-COMPUT,
volume = "47",
number = "9",
pages = "913--926",
month = sep,
year = "1998",
CODEN = "ITCOB4",
ISSN = "0018-9340",
bibdate = "Sat Jun 02 07:48:02 2001",
acknowledgement = ack-nhfb,
}
@Article{Russinoff:1999:MCP,
author = "David M. Russinoff",
title = "A mechanically checked proof of correctness of the
{AMD K5} floating point square root microcode",
journal = j-FORM-METHODS-SYST-DES,
volume = "14",
number = "1",
month = jan,
year = "1999",
CODEN = "FMSDE6",
ISSN = "0925-9856",
bibdate = "Sat Jun 02 07:51:51 2001",
note = "Special issue on arithmetic circuits.",
URL = "http://www.wkap.nl/jrnltoc.htm/0925-9856;
http://www.wkap.nl/oasis.htm/194808",
acknowledgement = ack-nhfb,
}
@Article{Even:2000:CTR,
author = "G. Even and P.-M. Seidel",
title = "A Comparison of Three Rounding Algorithms for {IEEE}
Floating-Point Multiplication",
journal = j-IEEE-TRANS-COMPUT,
volume = "49",
number = "7",
pages = "638--650",
year = "2000",
CODEN = "ITCOB4",
ISSN = "0018-9340",
bibdate = "Sat Jun 02 08:17:12 2001",
acknowledgement = ack-nhfb,
}
Finally, here are a couple of recent papers on the floating-point
vector sum problem:
@InProceedings{Tsuji:1995:ASF,
author = "K. Tsuji",
editor = "D. (Dimitur) Bainov and Valery Covachev",
booktitle = "Proceedings of the Third International Colloquium on
Numerical Analysis: Plovdiv, Bulgaria, 13--17 August
1994",
title = "An Algorithm for Sum of Floating-Point Numbers without
Round-Off Error",
publisher = "VSP",
address = "Utrecht, The Netherlands",
pages = "181--190",
year = "1995",
ISBN = "90-6764-193-6",
LCCN = "QA297.I45 1994",
bibdate = "Sat Jun 02 08:14:13 2001",
acknowledgement = ack-nhfb,
bookpages = "vi + 229",
}
@Article{Kao:1999:LTA,
author = "Ming-Yang Kao and Jie Wang",
title = "Linear-Time Approximation Algorithms for Computing
Numerical Summation with Provably Small Errors",
journal = j-SIAM-J-COMPUT,
volume = "29",
number = "5",
pages = "1568--1576",
day = "12",
month = jul,
year = "1999",
CODEN = "SMJCAT",
ISSN = "0097-5397 (print), 1095-7111 (electronic)",
bibdate = "Fri May 25 06:06:40 2001",
bibsource = "http://epubs.siam.org/sam-bin/dbq/toclist/SICOMP/29/5",
URL = "http://epubs.siam.org/sam-bin/dbq/article/34159",
acknowledgement = ack-nhfb,
}
and on the elementary functions:
@Article{Harrison:1999:CTF,
author = "John Harrison and Ted Kubaska and Shane Story and Ping
Tak Peter Tang",
title = "The Computation of Transcendental Functions on the
{IA-64} Architecture",
journal = j-INTEL-TECH-J,
number = "Q4",
pages = "7",
day = "22",
month = nov,
year = "1999",
bibdate = "Fri Jun 01 06:02:08 2001",
URL = "http://developer.intel.com/technology/itj/q41999/articles/art_5.htm;
http://developer.intel.com/technology/itj/q41999/pdf/transendental.pdf",
acknowledgement = ack-nhfb,
}
@TechReport{Harrison:2000:HOM,
author = "John Harrison and Ted Kubaska and Bob Norin and Shane
Story and Ping Tak Peter Tang",
title = "Highly Optimized Mathematical Functions for the
{IA-64} Architectures",
type = "Technical report",
number = "245410-002",
institution = "Intel Corporation",
month = apr,
year = "2000",
bibdate = "Sat Jun 02 11:04:28 2001",
URL = "ftp://download.intel.com/design/IA-64/Downloads/libm.pdf",
acknowledgement = ack-nhfb,
}
These last two are of interest because the authors address the problem
of mapping polynomial approximations onto microparallel floating-point
hardware, and show that it is better and faster to evaluate two short
polynomials at once than one long one. I suspect their observations
apply to other current RISC architectures as well, but I don't recall
ever seeing this point raised in any other publication.
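One simple way to get two short polynomials out of one long one is an
even/odd split, p(x) = E(x^2) + x*O(x^2). I am assuming that split
here purely for illustration; the papers may partition the work
differently. In the Python sketch below (horner and split_eval are
names of my own invention), the two Horner recurrences share no data,
so pipelined multiply-add units could work on both at once, whereas a
single Horner chain must wait for each step to finish before the next
can start.

```python
def horner(coeffs, x):
    # coeffs are ordered from lowest to highest degree. Each step
    # depends on the previous one, so this is one serial chain.
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def split_eval(coeffs, x):
    # Even/odd split: two independent half-length Horner chains in
    # x*x, joined by one final multiply-add.
    x2 = x * x
    even = horner(coeffs[0::2], x2)   # c0 + c2*x^2 + c4*x^4 + ...
    odd = horner(coeffs[1::2], x2)    # c1 + c3*x^2 + ...
    return even + x * odd

coeffs = [1.0, 2.0, 3.0, 4.0, 5.0]    # 1 + 2x + 3x^2 + 4x^3 + 5x^4
print(horner(coeffs, 2.0))            # 129.0
print(split_eval(coeffs, 2.0))        # 129.0
```

Python itself gains nothing from the split, of course. The point is
only the dependency structure, and the two forms can round differently
in general even though they agree exactly in this small example.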
A publication history of elementary and transcendental function
evaluation is recorded here:
ftp://ftp.math.utah.edu/pub/tex/bib/elefunt.*
http://www.math.utah.edu/pub/tex/bib/index-table-e.html#elefunt
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- Center for Scientific Computing FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah Internet e-mail: beebe@math.utah.edu -
- Department of Mathematics, 322 INSCC beebe@acm.org beebe@computer.org -
- 155 S 1400 E RM 233 beebe@ieee.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe -
-------------------------------------------------------------------------------