Correctly-rounded base conversion effects on code size and speed
David Hough
sun!Eng!dgh
Tue Jun 12 09:57:52 PDT 1990
To quantify the results of my base conversion work,
I prepared the following report for internal Sun distribution.
Since the code in question will probably be publicly available
after I put it into a form that is mostly
portable among IEEE 754 implementations and less so among others,
the report seems suitable for distribution to this mailing list.
Note that the results for SunOS 4.1, Fortran 1.[34], and C 1.[01]
were obtained with early internal releases that may differ from
what was finally shipped.
---------------------------------------------------------------------------
This report describes recent improvements in floating-point base conversion
performance in Sun software. While base conversion does not dominate most
programs, nearly all application codes are affected by accuracy and performance
of base conversion. Increasing the accuracy of base conversion reduces concerns
about getting "different" answers on different Sun platforms (thus reducing
support problems) and moves closer to the vision of distributed computing.
Increasing the performance of base conversion reduces complaints about poor
performance in general. Users do not often think about base conversion, but
they do notice when it is wrong, or slow.
Who Cares About Base Conversion?
When you want to read in a number in conventional
decimal notation and store it in internal binary format, or
when conversely you want to print an internal binary value
as an ASCII string of decimal digits, you need to do a
conversion between number representations in bases two and
ten. Base conversion is not too difficult for integer for-
mats but gets more complicated when floating point is
involved; as with other floating-point operations, rounding
errors occur. For instance, on conversion from internal
floating point to an external decimal representation, the
mathematical problem to be solved in default rounding mode
is
Given integers i and e, find integer(s)
(for printf %f or Fortran F format with n digits after point)
I such that
| i * 2**e * 10**n - I | <= 0.5
(for printf %e or Fortran E format with n significant digits)
I and E such that
| i * 2**e - I * 10**E | <= 0.5 * 10**E
AND
10**(n-1) <= I <= (10**n) - 1
always distinguishing exact conversions
always rounding inexact halfway cases to nearest even
Of course the integers I may not fit in a conventional C int
or long.
Since multiple operations occur, so do multiple round-
ing errors if these problems are solved in a straightforward
way using floating-point arithmetic. These rounding errors
obscure the floating-point value you're trying to display.
Worse, different common algorithms have different rounding
errors so the same program with the same data produces dif-
ferent results on different systems, even for different
operating systems on the same hardware. This is a problem
for network computing:
Persons doing scientific and engineering calculations
aren't comfortable with network computing that gets dif-
ferent results on different systems, especially if all sys-
tems nominally conform to IEEE 754. IEEE 754 requires
correct rounding for normal numbers, whose magnitudes range
-27 +27
from 10 to 10 , but permits slightly incorrect rounding
for larger exponents. Even so most implementations (like
SunOS 3.5) get it almost right, but not quite, and so get
different results, even for normal exponents. The work
reported below is intended to eliminate this particular
source of gratuitous differences. Although some customers
complained when upgrading from 3.5 to 4.0 caused "different"
answers to appear for the same computation, I think in the
long term they will be happier knowing that the answers
won't change again because now they are correct.
The only reasonable uniformity goal is correct round-
ing, which is to say that each conversion operation suffers
at most one rounding error which acts just like rounding
errors for correctly-implemented IEEE multiplication or
division. Although the basic ideas of correctly-rounded
base conversion and of efficient base conversion are well
documented in Coonen's thesis and Sterbenz' book, fast
correctly-rounded implementations don't seem to be common.
To be fully effective, therefore, the machine-independent
parts of the resulting source code eventually should be
available to other vendors in the same way that NFS was pro-
pagated. Thus Sun will draw a target around our bullet hole
and invite others to hit it.
Other kinds of discrepancies that inhibit network com-
puting include extended precision expression evaluation and
elementary transcendental functions. So there are plenty of
interesting projects here for the future.
Who Cares About Base Conversion Performance?
Nobody, I thought - surely the work of computing the
answer and printing it greatly exceeds the work of format-
ting it for printing. But spreadsheet users and students in
elementary Fortran classes often spend more time in conver-
sion from internal binary floating-point form to ASCII
strings for output than anywhere else. Even some customers
with presumably more sophisticated applications noticed a
performance degradation in SunOS 4.0; the inner loop of one
customer application consists of formatted Fortran write
statements to intermediate files that they process on other
systems. They couldn't use unformatted files because those
other systems were not IEEE systems.
SunOS Implementations
Conversion between floating-point binary formats and
ASCII strings has been extensively revised in recent SunOS
releases, first to obtain "correctly rounded" conversions
fully observing IEEE rounding modes and detecting IEEE
exceptions, and then to improve performance.
Prior to SunOS 3.0, conversion between ASCII strings
and IEEE single or double precision was accomplished by code
written in portable C which used floating-point arithmetic
and very slow, inaccurate algorithms. Since the available
floating-point hardware was the Sky FFP and the default code
generation was -fswitch, the results of your conversions
depended on whether a Sky board was installed.
In SunOS 3.x, Richard Tuck vastly improved speed and
accuracy by using extended precision multiplication and
tables of positive and negative powers of ten, all coded
using mc68010 assembler. The Sun-4 project forced me to
develop a much more portable approach to base conversion.
* The mc68010 software extended-precision multiplication
method failed to meet IEEE accuracy requirements
because tables of negative powers of ten were used
rather than an extended precision division subroutine.
* A major rewrite would have been required anyway for
such an approach to convert to and from 68881 extended
format or quadruple precision format.
* Rounding modes and exceptions were ignored.
* Different input scanners for floating-point numbers
were used in C and Fortran; neither was able to read
infinities or NaNs output by printf or write state-
ments.
* The native floating-point arithmetic hardware was not
used in floating-point base conversion because of
unreliability in the Sky FFP and A79J 68881.
In sys4-3.2 and SunOS 4.0, I implemented correctly-
rounded conversion - subject to a number of bugs discovered
later and subsequently fixed - by computing powers of ten
exactly, using multiple precision integer arithmetic, as
needed at run time. I added conversions to and from 68881
extended-precision floating-point format. Rounding modes
and exceptions were handled correctly. Performance was very
slow on numbers with large exponents, however, due to the
time required to compute large powers of ten. I developed a
unified scanner for Fortran and C that understands "Infin-
ity" and "NaN", so that all bugs would be shared (and there-
fore discovered sooner). Some people observed that signifi-
cant stack space was consumed because the largest buffers
that might be needed were always allocated on the stack -
+5000
for extended exponents as large as 10- .
In SunOS 4.1 beta, tables of selected small and large
powers of ten, expressed as variable-precision integers,
speed up conversion for large exponents. This led to
remarkable performance improvements in those cases, and
remarkable increases in size of staticly-linked executables.
In anticipation of an SVR4 ANSI-C compiler with a long dou-
ble type, quadruple-precision base conversion was added.
Large buffers were malloc'd and free'd as needed at run time
rather than allocated on the stack with a fixed size.
Unfortunately 4.1 froze before I had time to add performance
tuning enhancements like internal inline expansion tem-
plates.
For SPARCompilers 0.0 beta (Fortran 1.3 and C 1.0), I
focussed on performance of base conversion for single- and
double-precision floating-point numbers with normal
exponents. For instance, inline expansion templates were
used internally to speed up the inner loops in the
multiple-precision integer arithmetic.
For SPARCompilers 1.0 developmental (Fortran 1.4 and C
1.1), I decided to exploit floating-point hardware when
available to handle the easy cases that didn't require
multiple-precision integer arithmetic to round correctly.
This means that base-conversion speed now is a function of
code generation option. I reorganized the main tables of
powers of ten to consume about half as much space, at the
cost of about twice the performance on large exponents.
Other new tables also increased executable size and improved
performance by lesser amounts.
Test Programs
In the following list of test programs, "normal"
floating-point numbers have magnitudes ranging roughly from
-27 +27
10 - to 10 , while "large" floating-point numbers have
magnitudes uniformly selected from all those representable
in a floating-point format.
SP.timefio - /valid/timefio -DSP
100000 Fortran list-directed output normal floating-point single-precision
100000 Fortran list-directed input normal floating-point single-precision
100000 Fortran E-formatted output large floating-point single-precision
100000 Fortran E-formatted input large floating-point single-precision
100000 Fortran integer I-formatted output
100000 Fortran integer I-formatted input
DP.timefio - /valid/timefio -DDP
100000 Fortran list-directed output normal floating-point double-precision
100000 Fortran list-directed input normal floating-point double-precision
100000 Fortran E-formatted output large floating-point double-precision
100000 Fortran E-formatted input large floating-point double-precision
100000 Fortran integer list-directed output
100000 Fortran integer list-directed input
timestdio - /valid/timestdio
100000 stdio sprintf normal floating-point double-precision
100000 stdio sscanf normal floating-point double-precision
100000 stdio sprintf large floating-point double-precision
100000 stdio sscan large floating-point double-precision
100000 stdio sprintf integer
100000 stdio sscan integer
timextended - /valid/timextended
10000 decimal to extended (base.4.0 and later)
- -
10000 extended to decimal (base.4.0 and later)
- -
timequad - /valid/timequad
10000 stdio sscanf long double (quadruple-precision) (base.4.1 and later)
10000 stdio sprintf long double (quadruple-precision) (base.4.1 and later)
In addition, two short programs were linked staticly and the
code+data sizes measured in pages of 8192 bytes. hello.c
and filter.f:
main(){printf(" hello \n");}
read *,x
print *,x
end
Test Environments
Test programs were compiled and executed under the fol-
lowing environments:
base.1.4
SunOS 1.4 bundled C and Fortran compilers, executables
run on Sirius with SunOS 4.0 Thanks to Mitch Bradley
who compiled the test programs on his home machine,
Sun-2/50 #7.
base.3.5
SunOS 3.5, bundled Fortran, Sirius
base.4.0
SunOS 4.0, Fortran 1.1, Sirius or Sunrise
base.4.1
SunOS 4.0, SunOS 4.1 beta /lib/libc.a, Fortran 1.2,
Sirius or Sunrise
base.0.0
SunOS 4.0, SPARCompilers 0.0 beta, Sirius or Sunrise,
-u doprnt -u doscan
-- --
base.1.0
SunOS 4.0, SPARCompilers 1.0 developmental, Sirius
-ffpa, Sirius -f68881, or Sunrise FPU2 -cg89 -u
doprnt -u doscan
-- --
The following tables of Sun-3/4 results show execu-
tion times in seconds for these test programs.
Sun-3/260 Results
The following results are from a sirius with an
FPA. Times in seconds, code+data sizes in pages.
-fsoft -f68881 -ffpa
Test base.1.0 base.1.0 base.1.0 base.0.0 base.4.1 base.4.0 base.3.5 base.1.4
hello.c
code+data 6+1 7+1 8+1 8+1 4+6 2+1 2+1 2+1
filter.f
code+data 13+2 13+2 15+2 14+2 9+6 8+1 9+1 5+1
SP.timefio
write normal 63 57 49 68 96 66 41 253
read normal 67 50 42 77 96 70 40 255
write big 72 65 55 76 114 74 38 375
read big 73 54 59 85 105 85 30 290
write int 35 35 35 35 37 32 29 61
read int 29 29 29 29 30 26 25 46
DP.timefio
write normal 76 69 59 86 127 91 56 346
read normal 93 70 69 113 148 102 47 377
write big 107 113 110 94 154 451 51 2000
read big 113 119 124 109 141 453 37 1200
write int 31 30 30 30 32 32 29 62
read int 48 39 36 53 58 62 33 136
timestdio
sprintf normal 53 50 38 60 95 69 31 175
sscanf normal 70 36 33 83 112 68 32 108
sprintf big 83 92 86 69 121 388 34 2000
sscanf big 96 97 94 86 118 346 33 112
sprintf int 11.6 12.2 12.1 12.6 13.3 13.4 12.4 20.3
sscanf int 13.0 13.2 13.0 12.7 19.7 13.7 12.3 13.7
timextended
dec to bin 11.1-23.0 11.4-23.3 11.3-23.2 5.5-10.6 7.8-19.7 1700-3400 X X
bin to dec 13.7-25.6 13.7-25.6 13.6-25.6 9.8-12.1 20.7-28.5 2400-3200 X X
timequad
sscanf 11.9-25.2 12.1-25.5 12.3-25.4 6.3-12.6 9.9-23.9 X X X
sprintf 16.2-30.9 16.4-31.0 16.2-31.0 10.6-16.6 23.1-41.1 X X X
Sun-4/260 Results
The following results are from a sunrise with
FPU2.
-cg89
Test base.1.0 base.0.0 base.4.1 base.4.0
hello.c
code+data 9+1 10+1 6+6 3+2
filter.f
code+data 17+2 18+2 13+6 10+1
SP.timefio
write normal 23 44 59 43
read normal 17 40 46 50
write big 25 51 67 59
read big 25 45 49 81
write int 25 24 27 25
read int 14 14 15 13
DP.timefio
write normal 30 60 83 61
read normal 35 61 71 88
write big 72 71 96 1077
read big 74 65 67 1720
write int 23 25 23 23
read int 15 27 27 26
timestdio
sprintf normal 19 39 52 38
sscanf normal 14 44 53 39
sprintf big 56 51 73 1070
sscanf big 58 47 57 1191
sprintf int 7.4 9.3 9.5 9.3
sscanf int 6.1 5.9 9.2 5.9
timextended
dec to bin 8.4-27.3 3.7-8.4 4.0-11.3 2800-21000
bin to dec 10.6-26.0 9.9-12.3 11.4-17.5 3300-20000
timequad
sscanf 8.2-23.1 4.3-9.9 4.8-13.4 X
sprintf 12.6-31.8 12.6-27.3 10.6-16.6 X
More information about the Numeric-interest
mailing list