[Cfp-interest 2108] supernormal numbers (was: WG14 IEEE 754-C binding meeting minutes) 2021/08/17
Vincent Lefevre
vincent at vinc17.net
Wed Aug 18 08:25:43 PDT 2021
On 2021-08-17 14:02:43 -0500, Rajan Bhakta wrote:
> Number classification and normal numbers (See CFP2091-3, CFP2096).
[...]
> Fred: There is also supernormal (double double has it). Do you know
> if DBL_MAX + DBL_MAX is a finite number instead of an infinity.
Because the absolute value of the second component of a double-double
number must be less than or equal to 1/2 ulp of the first component,
DBL_MAX + DBL_MAX is an invalid representation (trap representation
in the C terminology).
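
As a rough illustration of the constraint stated above, here is a minimal
C sketch that checks whether a (hi, lo) pair of doubles satisfies
|lo| <= ulp(hi)/2. The names ulp_of and dd_pair_ok are hypothetical, the
code assumes IEEE 754 binary64 doubles with FLT_RADIX == 2, and it tests
only the condition as worded here, not necessarily the exact rule applied
by any particular implementation.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* One ulp of a finite, nonzero, normal double x (hypothetical helper).
   Assumes FLT_RADIX == 2. */
double ulp_of(double x)
{
    int e;
    assert(isfinite(x) && x != 0.0);
    (void)frexp(x, &e);                  /* |x| = m * 2^e, 0.5 <= m < 1 */
    return ldexp(1.0, e - DBL_MANT_DIG); /* weight of the last bit of x */
}

/* Nonzero if |lo| <= ulp(hi)/2, i.e. the condition quoted above
   (hypothetical helper). */
int dd_pair_ok(double hi, double lo)
{
    return fabs(lo) <= 0.5 * ulp_of(hi);
}
```

Under these assumptions, dd_pair_ok(DBL_MAX, DBL_MAX) is 0, while
dd_pair_ok(DBL_MAX, DBL_MAX * DBL_EPSILON / 4) is nonzero.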
However, due to a representation issue with the maximum exponent,
the maximum representable finite floating-point number LDBL_MAX is
strictly larger than the maximum normalized floating-point number
(which is close to DBL_MAX/2).
With GCC on PowerPC (double-double), where the precision is 106,
LDBL_MAX = 0x1.fffffffffffff7ffffffffffff8p+1023
which is
    0x1.fffffffffffffp+1023
  + 0x0.00000000000007ffffffffffff8p+1023
though I would rather expect DBL_MAX + DBL_MAX * DBL_EPSILON / 4
= 0x1.fffffffffffff7ffffffffffffcp+1023, i.e. with an additional
trailing 1. I don't see why this would be a trap representation.
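
To make the "additional trailing 1" concrete, the following C sketch counts
how many bit positions a same-sign (hi, lo) pair spans, from the leading bit
of hi down to the last set bit of lo. The helper names are hypothetical and
the code assumes IEEE 754 binary64 doubles with FLT_RADIX == 2. It only does
arithmetic on the two double components, so it does not need a double-double
long double to run.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Exponent of the leading bit of a finite, nonzero, normal x. */
int top_bit_exp(double x)
{
    int e;
    (void)frexp(fabs(x), &e);   /* |x| = m * 2^e, 0.5 <= m < 1 */
    return e - 1;
}

/* Exponent of the lowest set bit of a finite, nonzero, normal x. */
int lowest_bit_exp(double x)
{
    int e, shift = 0;
    assert(x != 0.0);           /* the loop below needs a set bit */
    double m = frexp(fabs(x), &e);
    /* m * 2^DBL_MANT_DIG is an integer below 2^53, so the cast is exact. */
    unsigned long long n = (unsigned long long)ldexp(m, DBL_MANT_DIG);
    while ((n & 1) == 0) {
        n >>= 1;
        shift++;
    }
    return e - DBL_MANT_DIG + shift;
}

/* Bit positions spanned by hi + lo, for hi and lo of the same sign
   with |lo| < ulp(hi). */
int dd_span_bits(double hi, double lo)
{
    return top_bit_exp(hi) - lowest_bit_exp(lo) + 1;
}
```

With hi = DBL_MAX, the low part 0x1.ffffffffffffep+969 (the GCC value above)
spans 106 bits, while DBL_MAX * DBL_EPSILON / 4 = 0x1.fffffffffffffp+969
spans 107 bits; the two low parts differ by exactly 0x1p+917.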
However, the maximum normalized floating-point number is
0x1.ffffffffffffffffffffffffff8p+1022, or equivalently,
0x0.ffffffffffffffffffffffffffcp+1023, which is
    0x1.0000000000000p+1023
  - 0x0.000000000000000000000000004p+1023
There's another issue:
FLT_MAX_EXP
DBL_MAX_EXP
LDBL_MAX_EXP
are defined in the current C2x draft (N2596) as
maximum integer such that FLT_RADIX raised to one less than that
power is a representable finite floating-point number, e_max
while e_max was first introduced in the floating-point model, i.e.
for *normalized* numbers. A solution would be to introduce
FLT_NORM_MAX_EXP
DBL_NORM_MAX_EXP
LDBL_NORM_MAX_EXP
which would follow the floating-point model, and
FLT_MAX_EXP
DBL_MAX_EXP
LDBL_MAX_EXP
which would allow potentially larger values, possibly with a relaxed
definition (the current one is OK when there are no supernormal numbers,
but may be artificial otherwise).
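
For comparison, here is a small C sketch of the two notions for plain double,
where they coincide. max_exp_matches_definition tests the wording quoted
above (FLT_RADIX raised to e - 1 is finite, and e is the largest such
integer), and model_max computes the largest normalized number of the
floating-point model, (1 - b^-p) * b^e_max. The function names are
hypothetical, and the checks assume a double type without supernormal
numbers, such as IEEE 754 binary64.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Nonzero if e satisfies the quoted definition of DBL_MAX_EXP:
   FLT_RADIX^(e-1) is finite but FLT_RADIX^e is not. */
int max_exp_matches_definition(int e)
{
    return isfinite(scalbn(1.0, e - 1)) && !isfinite(scalbn(1.0, e));
}

/* Largest normalized number of the model for a given e_max:
   (1 - b^-p) * b^e_max, using DBL_EPSILON = b^(1-p). */
double model_max(int emax)
{
    return scalbn(1.0 - DBL_EPSILON / FLT_RADIX, emax);
}

/* Both checks are expected to pass for IEEE 754 binary64. */
void check_double_limits(void)
{
    assert(max_exp_matches_definition(DBL_MAX_EXP));
    assert(model_max(DBL_MAX_EXP) == DBL_MAX);
}
```

The sketch covers double only. The concern raised here is that, for a format
with supernormal numbers, the exponent given by the quoted definition and the
e_max of the model need not agree.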
FYI, my original GCC bug report (submitted before the defect report,
thus be careful with some obsolete discussion):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399
--
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)