[Cfp-interest 2150] Re: supernormal numbers (was: WG14 IEEE 754-C binding meeting minutes) 2021/08/17

Fri Sep 24 02:17:23 PDT 2021

On 2021-09-13 16:43:25 -0700, Jim Thomas wrote:
> > On Aug 18, 2021, at 8:25 AM, Vincent Lefevre <vincent at vinc17.net> wrote:
> > 
> > On 2021-08-17 14:02:43 -0500, Rajan Bhakta wrote:
> >>    Number classification and normal numbers (See CFP2091-3, CFP2096).
> > [...]
> >>      Fred: There is also supernormal (double double has it). Do you know
> >> if DBL_MAX + DBL_MAX is a finite number instead of an infinity.
> > 
> > Because the absolute value of the second component of a double-double
> > number must be less than or equal to 1/2 ulp of the first component,
> 
> Does this mean all arithmetic operations preserve the property?

The "IBM long double" spec in GCC is here:

  https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/rs6000/ibm-ldouble-format;hb=HEAD

In particular: "The most significant part is required to be
the value of the long double rounded to the nearest double,
as specified by IEEE."

As a consequence, the absolute value of the second component
must be less than or equal to 1/2 ulp of the first component
(the spec is even stricter).

The spec also says about this requirement:

  Many possible long double bit patterns are not valid long doubles.
  These do not represent any value.

> Are the operations undefined for other inputs?

I think that this is a consequence of the above requirement,
made explicit by "These do not represent any value."

In practice, for other inputs, they would be likely to be less
accurate, but it can be worse.

The requirement is important for comparisons, as the spec says:
"Two long doubles can be compared by comparing the high parts,
and if those compare equal, comparing the low parts."

This means that concerning long double values representing
midpoints of two consecutive double numbers, the rounding rule
for ties must be fixed (basically, round ties to even).

--------

Moreover, note that the spec has a section "Classification" that
defines "normal" and "subnormal" based on the FP model only, the
other "long double" values being neither normal, nor subnormal.
But this does *not* match the implementation. I recall that GCC
assumes the following definition in gcc/builtins.c:

        /* isnormal(x) -> isgreaterequal(fabs(x),DBL_MIN) &
           islessequal(fabs(x),DBL_MAX).  */

The following is a test I had posted to comp.std.c:

Here's a test program for long double, which shows the issue
on PowerPC. Here, 1 + 2^(-120) is exactly representable as a
double-double number (the exact sum of 1 and 2^(-120), which
are both double-precision numbers). This is verified by the
output of x - 1, which gives 2^(-120) instead of 0.

------------------------------------------------------------------
#include <stdio.h>
#include <float.h>
#include <math.h>

int main (void)
{
  volatile long double x = 1 + 0x1.p-120L;
  int c;

  printf ("LDBL_MANT_DIG = %d\n", (int) LDBL_MANT_DIG);
  printf ("x - 1 = %La\n", x - 1);

  c = fpclassify (x);
  printf ("fpclassify(x) = %s\n",
          c == FP_NAN ? "FP_NAN" :
          c == FP_INFINITE ? "FP_INFINITE" :
          c == FP_ZERO ? "FP_ZERO" :
          c == FP_SUBNORMAL ? "FP_SUBNORMAL" :
          c == FP_NORMAL ? "FP_NORMAL" : "unknown");

  printf ("isfinite/isnormal/isnan/isinf(x) = %d/%d/%d/%d\n",
          isfinite (x), isnormal (x), isnan (x), isinf (x));

  return 0;
}
------------------------------------------------------------------

I get the following result:

LDBL_MANT_DIG = 106
x - 1 = 0x1p-120
fpclassify(x) = FP_NORMAL
isfinite/isnormal/isnan/isinf(x) = 1/1/0/0

So x is a number with more than the 106-bit precision of normal
numbers, and both fpclassify(x) and isnormal(x) regard it as a
"normal number", which should actually be interpreted as being
in the range of the normal numbers.

I've just reported a bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102475

saying at the end that the current definition of "normal" in
libgcc/config/rs6000/ibm-ldouble-format should be regarded as
incorrect.

> The double-double described in 
> 
> http://mrob.com/pub/math/f161.html <http://mrob.com/pub/math/f161.html>
> 
> claims the precision is 107, with the extra bit coming from the way
> the sign of the low component is used.

As I've said, this is not a spec. See above for the real spec.

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)