Towards finding "the worst" input cases
Tim Peters
uunet!ksr.com!tim
Wed Mar 13 23:09:41 PST 1991
If the topic doesn't interest you, or if you're looking for a neat
finished package <grin>, please ignore this; enough people seemed
curious enough to justify broadcasting it.
I found an analytical approach that looks very promising, but finishing
it off one way or another would take a bout of intense concentration
I just can't make the time for these days. If you can, I think you'll
find you can make this approach work.
Consider the 2^52 (^ = exponentiation; r#num = a number in base r)
binary numbers of the form
2#1.s1 * 2^-1022
where s is a string of 52 bits. I.e., these are the IEEE double
round-to-nearest end cases with (unbiased) exponent -1022 (in order to
get *the* worst IEEE double case, the obvious variants of the argument
below would need to be made for each possible binary exponent, or
(probably better) reversed to start with base 10 strings -- starting
with base 2 here instead is an historical accident I haven't had time to
recover from).
I want to find a number of that form that "comes closest" to being
representable in decimal scientific notation with a 17-or-less digit
signficand.
Let x = 2#1.s1 * 2^-1022 = (2#1s1 * 2^-53) * 2^-1022 =
2#1s1 * 2^-1075 = [multiplying by 1 = 10^324/10^324]
(2#1s1 * 2^-1075 * 10^324) * 10^-324 = [calling (...) "y"]
y * 10^-324
So the decimal representation of x is the decimal representation of y
with an "e-324" tacked on the end, and we'll ignore that scale factor
from here on, concentrating on the
y = 2#1s1 * 2^-1075 * 10^324 = [expressing it as a rational]
= (2#1s1 * 10^324) / 2^1075
part. Now the decimal expansion of y contains exactly 17 digits before
the decimal point (10^324/2^1075 ~= 2.47, and 9.01e15 ~= 2^53 < 2#1s1 <
2^54 ~= 1.80e16, so 2.23e16 < y < 4.45e16, and the bounds on each end
have 17 digits before the decimal point). Hence the 17 digits of the
decimal significand are accounted for by the integer portion of y, so
the value of s that leads to a number "closest to" being representable
in a 17-digit decimal scientific format is a value of s that makes y
closest to an exact integer.
Let's simplify a little more:
y = (2#1s1 * 10^324) / 2^1075 =
(2#1s1 * 5^324 * 2^324) / 2^1075 =
(2#1s1 * 5^324) / 2^751 =
((2^53 + 2*s + 1) * 5^324) / 2^751 =
[ ((2^53+1) + 2*s) * 5^324 ] / 2^751 = [just naming stuff]
top / c
Clearly, the value of y closest to an integer corresponds to the value
of s that makes (top mod c) closest to 0 (where "closest" is to be taken
in the modular sense -- e.g., if we're working mod 17, 15 and 2 are
equally close to 0, and 16 is closer to 0 than either).
Then
top mod c =
[ ((2^53+1) + 2*s) * 5^324 ] mod c =
[ ((2^53+1)*5^324) mod c + (2*5^324*s) mod c ] mod c =
[ ((2^53+1)*5^324) mod c + ((2*5^324) mod c)*s ] mod c =
(a + b*s) mod c
where
a = ((2^53+1)*5^325) mod c
and
b = (2*5^324) mod c
are fixed.
So all we need to do to get a "worst case" is to find a value of s in
[0,2^52-1] that makes (a+b*s) mod c closest to 0.
Exercise 42 in section 4.5.3 of Knuth Vol 2 (2nd edition) details how to
find the integers i that make x*i closest to 0, mod 1 (where x is an
*irrational* number), based on analyzing the continued fraction
representation of x. By standing on your head for a while it's easy to
see how to twist that towards finding the s that make (b*s) mod c
closest to 0, based on analyzing the continued fraction representation
of b/c. This is efficient for even huge b & c, since the work is
proportional to the number of partial quotients in the (finite)
continued fraction representation of b/c, and that number is never
larger than about 5*log10(c). Accounting for the offset by "a" in the
formula we actually want to minimize seems surprisingly troublesome, and
dealing with the limits on s is no picnic either. It could be that the
deeper theory developed in the solution to section 6.4's (Knuth, Vol 3,
1st ed) exercise 8 is needed to nail it (this is the proof of the
"three-distance theorem", 6.4S). Note that some difficulties also arise
when (as in the specific case we're looking at) gcd(b,c) > 1, but they
seem comparatively minor.
In any case, I played around in odd moments writing a "hill-climbing"
algorithm that used the continued fraction represenation of b/c (in the
manner alluded to above) to try to move f(s) = (a+b*s) mod c closer to
0, starting with s=0 and making the smallest increments to s that lead
to new "record-breaking" (Knuth's terminology) values. I'm sure my
program is doing some things wrong (& don't have time to out-think it),
but it's also doing some things right: in a fraction of a second it
found an s leading to a test case that's dramatically worse than
anything various educated "random" test programs found after days of
run-time:
For c = 2^751, b = 2*5^324 mod c, a = (2^53+1)*5^325 mod c, trying to
find an s that makes (a+b*s) mod c closest to 0, and with s constrained
to be in [0,2^52-1], it cranked out s=2161889134695901 as its 20th (and
final) "record breaker". Embedding that in the binary format described
above yields the round-to-nearest end case
2#1.01111010111000111010000010011110101011010101110111011 *2^-1022
which is, to 35 decimal digits,
10#3.2931890100779055999999999999999990e-308 (i.e., a+b*s mod c was
"close to 0" here by virtue of being very large (close to c)).
Rounding that to a 17-digit decimal input case yields
3.2931890100779056e-308
whose binary representation is (rounded to 120 bits, first line gives
the binary scale factor, second line the leading 53 bits, third line the
excess):
2*^-1022 *
2#1.0111101011100011101000001001111010101101010111011101
1000000000000000000000000000000000000000000000000000000000100100001
Among other things, this shows that even a careful "doubled precision"
approach isn't good enough to do correct 17-digit decimal -> IEEE double
input, since the crucial "which side of a half ulp is it on?" isn't
resolved here until hitting the 59th excess bit.
If anyone carries this approach to a satisfactory <grin> conclusion, I
would of course be delighted to hear about it.
no-you-don't-want-the-program-it's-undocumented-and-needs-the-Icon-
version-8-with-optional-bignums-interpreter-ly y'rs - tim
Tim Peters Kendall Square Research Corp
timaksr.com, ksr!timaharvard.harvard.edu
More information about the Numeric-interest
mailing list