[Cfp-interest 2722] Re: definition of "floating types"

Thu Mar 2 08:01:30 PST 2023

On 2023-03-02 16:17:27 +0100, Vincent Lefevre wrote:
> On 2023-03-01 20:58:57 -0800, Jim Thomas wrote:
> > On Mar 1, 2023, at 8:39 AM, Vincent Lefevre <vincent at vinc17.net> wrote:
> > > IMHO, the standard should require that the evaluation format
> > > associated with float be float_t, and similarly for double and
> > > long double.
> > > 
> > > Would this break actual implementations?
> > 
> > Maybe yes. In the past, there were implementations that evaluated
> > expressions in wider registers but (when they ran out of registers)
> > stored intermediate values into narrower storage formats.
> 
> However, a narrower storage format, e.g. float, could be regarded as
> the evaluation format (e.g. double), but with a reduced accuracy of
> the operation (as the accuracy is implementation-defined). With an
> x86 processor, this is similar to an operation performed on double
> but where the processor is configured to round in single precision
> (even though no float type is involved).

In practice, there is even an issue with FLT_EVAL_METHOD = 2:

#include <stdio.h>
#include <float.h>
#include <math.h>

int main (void)
{
  /* volatile forces run-time loads, so the calls below cannot be
     folded at compile time */
  volatile double x = 1.0, y = 0x1p-55;

#if __STDC__ == 1 && __STDC_VERSION__ >= 199901 && defined(__STDC_IEC_559__)
  printf ("__STDC_IEC_559__ defined, FLT_EVAL_METHOD = %d\n",
          (int) FLT_EVAL_METHOD);
#endif

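  /* x*x + y = 1 + 2^-55, which rounds to 1 in double but is exactly
     representable in a long double with a 64-bit significand */
  printf ("d = %.17g\n", (double) (fma(x,x,y) - fmal(x,x,y)));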

  return 0;
}

gives

__STDC_IEC_559__ defined, FLT_EVAL_METHOD = 2
d = -2.7755575615628914e-17

under Linux x86 with both GCC 12 and Clang 13, using the 32-bit ABI
(the optimization level does not seem to matter), whereas I would
expect d = 0.

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)