[Cfp-interest 2159] Re: supernormal numbers (was: WG14 IEEE 754-C binding meeting minutes) 2021/08/17

Jim Thomas jaswthomas at sbcglobal.net
Tue Sep 28 15:20:57 PDT 2021


Vincent, thank you the detailed explanations.
- Jim Thomas

> On Sep 24, 2021, at 2:17 AM, Vincent Lefevre <vincent at vinc17.net> wrote:
> 
> On 2021-09-13 16:43:25 -0700, Jim Thomas wrote:
>>> On Aug 18, 2021, at 8:25 AM, Vincent Lefevre <vincent at vinc17.net> wrote:
>>> 
>>> On 2021-08-17 14:02:43 -0500, Rajan Bhakta wrote:
>>>>   Number classification and normal numbers (See CFP2091-3, CFP2096).
>>> [...]
>>>>     Fred: There is also supernormal (double double has it). Do you know
>>>> if DBL_MAX + DBL_MAX is a finite number instead of an infinity.
>>> 
>>> Because the absolute value of the second component of a double-double
>>> number must be less than or equal to 1/2 ulp of the first component,
>> 
>> Does this mean all arithmetic operations preserve the property?
> 
> The "IBM long double" spec in GCC is here:
> 
>  https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/rs6000/ibm-ldouble-format;hb=HEAD
> 
> In particular: "The most significant part is required to be
> the value of the long double rounded to the nearest double,
> as specified by IEEE."
> 
> As a consequence, the absolute value of the second component
> must be less than or equal to 1/2 ulp of the first component
> (the spec is even stricter).
> 
> The spec also says about this requirement:
> 
>  Many possible long double bit patterns are not valid long doubles.
>  These do not represent any value.
> 
>> Are the operations undefined for other inputs?
> 
> I think that this is a consequence of the above requirement,
> made explicit by "These do not represent any value."
> 
> In practice, for other inputs, they would be likely to be less
> accurate, but it can be worse.
> 
> The requirement is important for comparisons, as the spec says:
> "Two long doubles can be compared by comparing the high parts,
> and if those compare equal, comparing the low parts."
> 
> This means that concerning long double values representing
> midpoints of two consecutive double numbers, the rounding rule
> for ties must be fixed (basically, round ties to even).
> 
> --------
> 
> Moreover, note that the spec has a section "Classification" that
> defines "normal" and "subnormal" based on the FP model only, the
> other "long double" values being neither normal, nor subnormal.
> But this does *not* match the implementation. I recall that GCC
> assumes the following definition in gcc/builtins.c:
> 
>        /* isnormal(x) -> isgreaterequal(fabs(x),DBL_MIN) &
>           islessequal(fabs(x),DBL_MAX).  */
> 
> The following is a test I had posted to comp.std.c:
> 
> Here's a test program for long double, which shows the issue
> on PowerPC. Here, 1 + 2^(-120) is exactly representable as a
> double-double number (the exact sum of 1 and 2^(-120), which
> are both double-precision numbers). This is verified by the
> output of x - 1, which gives 2^(-120) instead of 0.
> 
> ------------------------------------------------------------------
> #include <stdio.h>
> #include <float.h>
> #include <math.h>
> 
> int main (void)
> {
>  volatile long double x = 1 + 0x1.p-120L;
>  int c;
> 
>  printf ("LDBL_MANT_DIG = %d\n", (int) LDBL_MANT_DIG);
>  printf ("x - 1 = %La\n", x - 1);
> 
>  c = fpclassify (x);
>  printf ("fpclassify(x) = %s\n",
>          c == FP_NAN ? "FP_NAN" :
>          c == FP_INFINITE ? "FP_INFINITE" :
>          c == FP_ZERO ? "FP_ZERO" :
>          c == FP_SUBNORMAL ? "FP_SUBNORMAL" :
>          c == FP_NORMAL ? "FP_NORMAL" : "unknown");
> 
>  printf ("isfinite/isnormal/isnan/isinf(x) = %d/%d/%d/%d\n",
>          isfinite (x), isnormal (x), isnan (x), isinf (x));
> 
>  return 0;
> }
> ------------------------------------------------------------------
> 
> I get the following result:
> 
> LDBL_MANT_DIG = 106
> x - 1 = 0x1p-120
> fpclassify(x) = FP_NORMAL
> isfinite/isnormal/isnan/isinf(x) = 1/1/0/0
> 
> So x is a number with more than the 106-bit precision of normal
> numbers, and both fpclassify(x) and isnormal(x) regard it as a
> "normal number", which should actually be interpreted as being
> in the range of the normal numbers.
> 
> I've just reported a bug:
> 
>  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102475
> 
> saying at the end that the current definition of "normal" in
> libgcc/config/rs6000/ibm-ldouble-format should be regarded as
> incorrect.
> 
>> The double-double described in 
>> 
>> http://mrob.com/pub/math/f161.html <http://mrob.com/pub/math/f161.html>
>> 
>> claims the precision is 107, with the extra bit coming from the way
>> the sign of the low component is used.
> 
> As I've said, this is not a spec. See above for the real spec.
> 
> -- 
> Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
> _______________________________________________
> Cfp-interest mailing list
> Cfp-interest at oakapple.net
> http://mailman.oakapple.net/mailman/listinfo/cfp-interest




More information about the Cfp-interest mailing list