Fri Mar 22 12:41:18 PST 1991

Subject:  Representation of floating-point constants

The following request for interpretation has been submitted to the ANSI
C committee and may be of interest to the numerics group.  This question
came from the NCEG meeting held March 4-5, 1991.

May floating-point constants be represented with more precision than
implied by their types?  Consider the following code fragment:

     float  f ;
     double d ;
     long double ld ;
     ld = ld + 0.1 ;         /* add a long double and a double */
     ld = ld + 1.0 / 10.0 ;  /* expression with "same" value */
      d =  f + 0.1f ;        /* "+" is allowed to be double precision */

In the above example, the decimal number 0.1, when converted to binary,
is a non-terminating, repeating binary fraction; so the more bits used
to represent the number, the closer the stored value is to the true
value.  Hence, if doubles are 64 bits and long doubles are 80 bits, the
long double representation will be more accurate.  So in essence, may
0.1 (a double) be represented with more precision, e.g., as 0.1L (a
long double)?

Parts of the ANSI X3.159-1989 standard that may help answer the question
follow.

Section 2.1.2.3, Program Execution, page 8, line 36:  "In the abstract
machine, all expressions are evaluated as specified by the semantics."

I believe that this is the "as if" rule that applies to this case.

Section 2.1.2.3, Program Execution, page 9, lines 44-45:
"Alternatively, an operation involving only INTs or FLOATs may be
executed using double-precision operations if neither range nor
precision is lost thereby."

Clearly, d =  f + 0.1f ; may be done using a double-precision add.  But
may the constant 0.1f itself be represented as the double 0.1?

Section 3.1.3.1, Floating Constants, page 27, lines 32-35:  "If the
scaled value is in the range of representable values (for its type) the
result is either the nearest representable value, or the larger or
smaller representable value immediately adjacent to the nearest
representable value, chosen in an implementation-defined manner."

I believe that the above does not require that the result be the nearest
representable value (for its type).

Section 3.2.1.5, Usual Arithmetic Conversions, page 36, lines 38-39:
"The values of floating operands and of the results of floating
expressions may be represented in greater precision and range than that
required by the type; the types are not changed thereby."

I believe that a floating constant is a floating operand, and so is
allowed greater precision.  Clearly, the expression 1.0 / 10.0 is
allowed greater precision than just double, so it would make sense to
allow an equivalent constant (0.1) to have greater precision.

Section 3.4, Constant Expressions, page 56, lines 14-16:  "If a floating
expression is evaluated in the translation environment, the arithmetic
precision and range shall be at least as great as if the expression were
being evaluated in the execution environment."

From:     Fred Tydeman, IBM's NCEG representative
          Mail Stop 35A
          1510 Page Mill Road
          Palo Alto, California 94304
          Internet: tydeman@ibmpa.awdpa.ibm.com
          UUCP: uunet!ibmsupt!tydeman
          (415) 855-4430