Numeric-Related Requests For Interpretation to ANSI C

Fri Jan 4 07:45:54 PST 1991

The following numeric-related ANSI C issues have been submitted to
ANSI C for official interpretation; they may be of interest to this
mailing list.

----------------------------------------------------------------------
Subject:  Formal Request For Interpretation Number 2

What is the result  of:  strtod( "100ergs", &ptr);?  Is it 100.0 or is
it 0.0?

ANSI  C X3.159-1989,  4.10.1.4 The  strtod Function,  page 151,  lines
36-38:   'The  subject  sequence is  defined  as  the longest  initial
subsequence   of  the   input   string,   starting  with   the   first
non-white-space character,  that is  of the expected  form.'   In this
case, the longest  initial subsequence of the expected  form is "100",
so 100.0 should be the return value.  Also, since the entire string is
in memory,  strtod can scan it  as many times  as need be to  find the
longest valid initial subsequence.

ANSI  C X3.159-1989,  4.9.6.2  The fscanf  Function,  page 137,  lines
17-18:   'e,f,g Matches  an optionally  signed floating-point  number,
whose format  is the same  as expected for  the subject string  of the
strtod function.'  Later, page 139, lines 6, 16, and 25 show that
'100ergs' fails to match "%f".  Together, these two passages imply
that '100ergs' is invalid to fscanf and, therefore, invalid to strtod.
Then, page 152, lines 11-12, 'If no conversion could be performed,
zero is returned' indicates that, for an invalid input, 0.0 should
be returned.  The reason this
is  invalid  is  spelled  out in  the  Rationale, 4.9.6.2  The  fscanf
function,  page 95,  'One-character  pushback  is sufficient  for  the
implementation  of  fscanf.    Given  the  invalid  field  "-.x",  the
characters "-."  are not  pushed back.'   And later,  'The conversions
performed by fscanf are compatible with  those performed by strtod and
strtol.'

So, do strtod and fscanf act alike, accepting and failing on the same
inputs by the one-character pushback scanning strategy, or do they use
different scanning strategies, with strtod accepting more than fscanf?
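
One way to see what a particular implementation does with this input
is to measure how much of the string each function accepts.  The
sketch below is only a probe, not an answer to the question; the
helper names strtod_consumed and sscanf_matched are hypothetical, and
sscanf with "%lf" stands in for fscanf with "%f".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: number of leading characters of s that strtod
   accepts (leading white space included); the result goes to *value. */
size_t strtod_consumed(const char *s, double *value)
{
    char *end;

    assert(s != NULL && value != NULL);
    *value = strtod(s, &end);
    return (size_t)(end - s);
}

/* Hypothetical helper: number of items sscanf assigns for "%lf"
   (1 on a match, 0 on a matching failure). */
int sscanf_matched(const char *s, double *value)
{
    assert(s != NULL && value != NULL);
    return sscanf(s, "%lf", value);
}
```

Calling both helpers on "100ergs" shows whether the two functions
agree on a given implementation.  A count of 3 from strtod_consumed
corresponds to the first reading above, a count of 0 to the second.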

----------------------------------------------------------------------
Subject:  Formal Request For Interpretation Number 3

Assuming that 99999 is larger than  DBL_MAX_10_EXP, what is the result
of:  strtod( "0.0e99999", &ptr);?  Is it 0.0, HUGE_VAL, or undefined?

ANSI C X3.159-1989, 3.1.3.1 Floating  Constants, page 27, lines 30-32:
'The significand part is interpreted as a decimal rational number; the
digit  sequence in  the  exponent part  is  interpreted  as a  decimal
integer.    The exponent  indicates  the  power  of  10 by  which  the
significand part is to  be scaled.'  In this case  0.0e99999 means 0.0
times 10 to the power 99999, or 0.0  * 10 ** 99999, which has a scaled
value of 0.0; therefore, return 0.0.

ANSI  C X3.159-1989,  4.10.1.4 The  strtod Function,  page 152,  lines
12-14:   'If the correct value  is outside the range  of representable
values, plus or  minus HUGE_VAL is returned (according to  the sign of
the value),  and the value  of the macro  ERANGE is stored  in errno'.
Since the exponent (99999 in this case) is larger than DBL_MAX_10_EXP,
the value  is outside  the range  of representable  values (overflow).
Therefore, return HUGE_VAL.

ANSI  C  X3.159-1989,  2.2.4.2.2  Characteristics  of  Floating  Types
<float.h>,  pages 15  and  16, describe  the  model  that defines  the
floating-point types.   The number 0.0e99999, as written,  is not part
of that model  (it cannot be represented since the  exponent is larger
than  e-max).   From 3.2.1.4  Floating  Types, page  36, lines  11-13,
'...if the value  being converted is outside the range  of values that
can be represented, the behavior is undefined.'  Therefore, since this
number, as written, has no representation, the behavior is undefined.
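
The three readings can be told apart on a given implementation by
recording the return value and errno of the call.  The following is a
minimal sketch under that assumption; the struct and function names
are hypothetical and not part of the Standard.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical record of a single strtod call. */
struct strtod_probe {
    double value;     /* value returned by strtod */
    int    err;       /* errno after the call, 0 if it was not set */
    size_t consumed;  /* number of characters accepted */
};

struct strtod_probe probe_strtod(const char *s)
{
    struct strtod_probe r;
    char *end;

    assert(s != NULL);
    errno = 0;                    /* so a stale value is not misread */
    r.value = strtod(s, &end);
    r.err = errno;
    r.consumed = (size_t)(end - s);
    return r;
}
```

For probe_strtod("0.0e99999"), a value of 0.0 with err equal to 0
matches the first reading, and HUGE_VAL with err equal to ERANGE
matches the second.  The third reading (undefined behavior) cannot be
confirmed by any test.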

----------------------------------------------------------------------
Subject:  Formal Request For Interpretation Number 4

ANSI C  X3.159-1989, 4.10.1.4 The strtod  Function, page 152,  line 5:
What does '"C" locale' mean?
   a) setlocale(LC_ALL,NULL) == "C"
   b) setlocale(LC_NUMERIC,NULL) == "C"
   c) a) && b)
   d) a) || b)
   e) something else.

What does 'other than the "C" locale' mean?
   a) setlocale(LC_ALL,NULL) != "C"
   b) setlocale(LC_NUMERIC,NULL) != "C"
   c) a) && b)
   d) a) || b)
   e) something else.

Section 4.4.1 Locale Control, page 108 may help answer the questions.
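
As written, the comparisons in a) and b) compare pointers, not locale
names.  A program that wants to evaluate the options would compare
the strings instead.  A minimal sketch, with the hypothetical helper
name category_is_c:

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Returns 1 if the given category currently reports the "C" locale.
   The name is compared with strcmp, since == on the returned pointer
   would only compare addresses. */
int category_is_c(int category)
{
    const char *name = setlocale(category, NULL);   /* query only */

    assert(name != NULL);   /* a query for a valid category should succeed */
    return strcmp(name, "C") == 0;
}
```

Option a) is then category_is_c(LC_ALL) and option b) is
category_is_c(LC_NUMERIC).  The two can differ once a program has set
individual categories separately, because the string returned for
LC_ALL in a mixed locale is implementation-defined.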

----------------------------------------------------------------------
Subject:  Formal Request For Interpretation Number 5

What is meant by 'representable  floating-point value'?  Assume double
precision, unless stated otherwise.

First, some definitions based partially  upon the floating-point model
on pages 15-16 of ANSI C X3.159-1989:

1)  +Normal Numbers:  DBL_MIN to DBL_MAX, inclusive; normalized (first
    significand digit is non-zero), sign is +1.
2)  -Normal Numbers:  -DBL_MAX to -DBL_MIN, inclusive; normalized.
3)  +Zero:  All digits zero, sign is +1; (true zero).
4)  -Zero:  All digits zero, sign is -1.
5)  Zero:  Union of +zero and -zero.
6)  +Denormals:   Exponent  is "minimum"  (Biased  exponent is  zero);
    first significand digit is  zero; sign is +1.  These  are in range
    +DBL_DeN (inclusive) to +DBL_MIN (exclusive).   Let DBL_DeN be the
    symbol for the minimum denormal, so we can talk about it by name.
7)  -Denormals:    Same  as  +denormals, except  sign,  and  range  is
    -DBL_MIN (exclusive) to -DBL_DeN (inclusive).
8)  +Unnormals:  Biased exponent is  non-zero; first significand digit
    is zero;  sign is  +1.  These  overlap the  range of  +normals and
    +denormals.
9)  -Unnormals:    Same as  +unnormals,  except  sign; range  is  over
    -normals and -denormals.
10) +infinity:  From IEEE-754.
11) -infinity:  From IEEE-754.
12) Quiet NaN (Not a Number); sign does not matter; from IEEE-754.
13) Signaling NaN; sign does not matter; from IEEE-754.
14) NaN: Union of Quiet NaN and Signaling NaN.
15) Others:  Reserved (VAX?) and Indefinite (CDC/Cray?) act like NaN.
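
The Standard has no macro for the minimum denormal that item 6 calls
DBL_DeN.  On hardware with gradual underflow it can be found by
repeated halving, as in the sketch below.  The function name
find_dbl_den is hypothetical, and on hardware without denormals the
function simply returns DBL_MIN.

```c
#include <assert.h>
#include <float.h>

/* Halve DBL_MIN until the next halving would give zero or make no
   further progress.  The volatile stores force every step to be
   rounded to double. */
double find_dbl_den(void)
{
    volatile double cur = DBL_MIN;
    volatile double next = cur / 2.0;

    while (next > 0.0 && next < cur) {
        cur = next;
        next = cur / 2.0;
    }
    assert(cur > 0.0);
    return cur;
}
```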

On the real number line, these symbols order as:

  [  1  )[   2    ](   3   ]( 4 )[ 5 ]( 6 )[   7   )[   8    ](  9  ]
  +------+--------+--------+-----+---+-----+--------+--------+------+
-INF -DBL_MAX -DBL_MIN -DBL_DeN -0  +0 +DBL_DeN +DBL_MIN +DBL_MAX +INF

Non-real numbers are SNaN, QNaN, and NaN; call this region 10.

Regions 1 and 9 are overflow, 2 and  8 are normal numbers, 3 and 7 are
denormal numbers (pseudo underflow), 4 and 6 are true underflow, and 5
is zero.
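
For a value that is already stored in a double, the region can be
found with comparisons against the <float.h> limits.  The sketch
below assumes that a NaN compares unequal to itself, as in IEEE-754;
the function name region_of is hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Returns the region number (1-10) used in the text.  Regions 4 and 6
   are never returned: a real number in those gaps cannot be held in a
   double in the first place. */
int region_of(double x)
{
    double mag;

    if (x != x)                      /* assumed: only a NaN does this */
        return 10;
    mag = (x < 0.0) ? -x : x;
    assert(mag >= 0.0);
    if (mag > DBL_MAX)               /* infinity, where supported */
        return (x < 0.0) ? 1 : 9;
    if (mag >= DBL_MIN)              /* normalized */
        return (x < 0.0) ? 2 : 8;
    if (mag > 0.0)                   /* denormal, where supported */
        return (x < 0.0) ? 3 : 7;
    return 5;                        /* +0 or -0 */
}
```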

So,  the question  is:   What does  'representable (double  precision)
floating-point value' mean:

a) Regions 2, 5 and 8 (+/- normals and zero)
b) Regions 2, 3, 5, 7, and 8 (+/- normals, denormals, and zero)
c) Regions 2 through 8 [-DBL_MAX ... +DBL_MAX]
d) Regions 1 through 9 [-INF ... +INF]
e) Regions 1 through 10 (reals and non-reals)
f) What the hardware can represent
g) Something else?  What?

Some things to consider in  your answer  follow.   The questions  that
follow are rhetorical and do not need answers.

ANSI  C  X3.159-1989,  2.2.4.2.2  Characteristics  of  Floating  Types
<float.h>,  page 15,  lines 32-35,  'The  characteristics of  floating
types are defined in terms of  a model that describes a representation
of floating-point numbers and values that provide information about an
implementation's floating-point  arithmetic.'   Same section,  page 16
line 6,  'A normalized floating-point number  x ... is defined  by the
following model ...'.

    That model is just normalized numbers and zero (appears to include
    signed  zeros).    It  excludes  denormal  and  unnormal  numbers,
    infinities, and NaNs.  Are signed zeros required, or just allowed?

ANSI C X3.159-1989, 3.1.3.1 Floating  Constants, page 27, lines 32-35,
'If the scaled value is in the  range of representable values (for its
type) the  result is  either the nearest  representable value,  or the
larger  or smaller  representable value  immediately  adjacent to  the
nearest value, chosen in an implementation-defined manner.'

    ------+-----+------+--------+----...----+-----+---
          A     B y    C x      D           E  z  F
      -DBL_DeN 0.0 +DBL_DeN +DBL_MIN ... DBL_MAX INF

    The representable numbers are A,B,C,D,E and F. The number x can be
    converted to  B, C, or  D!  But  what if B  is zero, C  is DBL_DeN
    (denormal), and D is DBL_MIN (normalized)?  Is x representable?
    It is  not in the range  DBL_MIN...DBL_MAX and its  inverse causes
    overflow; so those say not valid.  On the other hand, it is in the
    range DBL_DeN...DBL_MAX and it does  not cause underflow; so those
    say it is valid.

    What if B is zero, A is -DBL_DeN (denormal), and C is +DBL_DeN
    (denormal)?  Is y representable?  If so, its nearest value is
    zero, and the immediately adjacent values include a positive and a
    negative number.  So a positive value written by the user is
    allowed to end up as a negative value!

    What if  E is DBL_MAX  and F is infinity  (on a machine  that uses
    infinities, IEEE-754)?   Does z have a representation?   If z came
    from 1.0/x,  then z caused  overflow which  says invalid.   But on
    IEEE-754  machines,  it  would  either   be  DBL_MAX  or  infinity
    depending upon  the rounding control,  so it has  a representation
    and is valid.

    What is nearest?  In a linear or a logarithmic sense?  If the
    number is between 0 and DBL_DeN, 1e-9999999 is linearly nearest to
    zero, but logarithmically nearest to DBL_DeN.  If the number is
    between DBL_MAX and INF, 1e+999999 is both linearly and
    logarithmically nearest to DBL_MAX.  Or is everything bigger than
    DBL_MAX nearest to INF?

ANSI C X3.159-1989,  3.2.1.3 Floating and Integral,  page 36, footnote
29, 'Thus, the range of portable floating values is (-1,Utype_MAX+1).'

ANSI  C X3.159-1989,  3.2.1.4 Floating  Types, page  36, lines  11-15,
'When a  double is  demoted to  float or  a long  double to  double or
float, if  the value being  converted is  outside the range  of values
that can  be represented,  the behavior  is undefined.   If  the value
being converted is in the range of  values that can be represented but
cannot be represented exactly, the result is either the nearest higher
or nearest lower value, chosen in an implementation-defined manner.'

ANSI C  X3.159-1989, 3.3  Expressions, page  39, lines  15-17, 'If  an
exception occurs during  the evaluation of an expression  (that is, if
the  result is  not  mathematically defined  or not  in  the range  of
representable values for its type), the behavior is undefined.'

    w = 1.0 / 0.0 ;  /* infinity in IEEE-754 */
    x = 0.0 / 0.0 ;  /* NaN in IEEE-754 */
    y = +0.0 ;    /* plus zero */
    z = - y ;     /* minus zero: Must this be -0.0? May it be +0.0? */

    Are the above representable?

ANSI C  X3.159-1989, 4.5.1  Treatment of  Error Conditions,  page 112,
lines 11-12, 'The  behavior of each of these functions  is defined for
all representable values of its input arguments.'

    What  about  non-numbers?    Are  they  representable?    What  is
    sin(NaN);?  If you got a NaN as  input, then you can return NaN as
    output.  But,  is it a domain error?   Must errno be  set to EDOM?
    The NaN already indicates an error,  so setting errno adds no more
    information.   Assuming NaN is not  part of ANSI  C representable,
    but the hardware  supports it, then using NaNs is  an extension of
    ANSI C  and setting errno  need not  be required, but  is allowed.
    Correct?

ANSI C  X3.159-1989, 4.5.1  Treatment of  Error Conditions,  page 112,
lines 20-27,  'Similarly, a range  error occurs  if the result  of the
function  cannot be  represented as  a double  value.   If the  result
overflows (the magnitude of  the result is so large that  it cannot be
represented in an object of the  specified type), the function returns
the value of  the macro HUGE_VAL, with  the same sign (except  for the
tan function) as the  correct value of the function; the  value of the
macro  ERANGE is  stored  in errno.   If  the  result underflows  (the
magnitude of the result  is so small that it cannot  be represented in
an object of  the specified type), the function  returns zero; whether
the integer expression errno acquires the value of the macro ERANGE is
implementation-defined.'

    What about denormal numbers?  What  is sin(DBL_MIN / 3.0L);?  Must
    this be considered underflow and  therefore return zero, and maybe
    set errno to ERANGE?   Or may it return DBL_MIN  / 3.0, a denormal
    number?  Assuming denormals are not  part of ANSI C representable,
    but the hardware supports them, then using them is an extension of
    ANSI C  and setting errno  need not  be required, but  is allowed.
    Correct?

    What about  infinity?  What  is exp(INF);?  If  you got an  INF as
    input, then  you can  return INF as  output.  But,  is it  a range
    error?  The output value is  representable, so that says no error.
    The output value is bigger than DBL_MAX, so that says an error and
    set errno  to ERANGE.   Assuming infinity  is not  part of  ANSI C
    representable, but the hardware supports it, then using INFs is an
    extension of ANSI C and setting errno need not be required, but is
    allowed.  Correct?

    What about  signed zeros?  What  is sin(-0.0);?  Must  this return
    -0.0?   May it  return -0.0?   May it return  +0.0?   Signed zeros
    appear to be required in the model on page 16.

    What  is  sqrt(-0.0);?    IEEE-754  and  IEEE-854  (floating-point
    standards) say  this must  be -0.   Is -0.0 negative?   Is  this a
    domain error?

ANSI  C X3.159-1989,  4.9.6.1 The  fprintf Function,  page 133,  lines
32-33, '(It  will begin  with a  sign only  when a  negative value  is
converted if this flag is not specified.)'

    What is fprintf(stdout, "%+.1f", -0.0);?  Must it be -0.0?  May it
    be +0.0?  Is -0.0 a negative value?   The model on page 16 appears
    to require support for signed zeros.
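
What an implementation prints for the sign can be observed by
formatting into a buffer instead of writing to stdout.  A minimal
sketch, with the hypothetical helper name plus_flag_sign; the buffer
size assumes a double no wider than the IEEE-754 64-bit format.

```c
#include <assert.h>
#include <stdio.h>

/* Formats v with "%+.1f" and returns the first character of the
   result, which is the sign.  Intended for finite values only. */
char plus_flag_sign(double v)
{
    char buf[512];   /* assumed large enough for any finite double */
    int n = sprintf(buf, "%+.1f", v);

    assert(n > 0 && n < (int)sizeof buf);
    return buf[0];
}
```

Whether plus_flag_sign(-0.0) returns '-' or '+' is exactly the
question asked above; the sketch only reports what a given library
does.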

    What is fprintf(stdout, "%f %f", 1.0/0.0, 0.0/0.0);?  May it be
    the IEEE-854 strings of 'inf' or 'infinity' for the infinity and
    'NaN' for the quiet NaN?  Would 'NaNQ' also be allowed for a quiet
    NaN?  Would 'NaNS' be allowed for a signaling NaN?  Must the sign
    be printed?  Signs are optional in IEEE-754 and IEEE-854.  Or
    must it be some decimal notation as specified by page 134, line
    19?  Does the locale matter?

ANSI C X3.159-1989, 4.10.1.4 The strtod Function, page 152, lines 1-2,
'If the subject sequence begins with a minus sign, the value resulting
from the conversion is negated.'

    What is strtod("-0.0", &ptr);?  Must it  be -0.0?  May it be +0.0?
    The model on page 16 appears  to require support for signed zeros.
    All floating-point hardware I know about supports signed zeros at
    least at the load, store, and negate/complement instruction level.
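
Since -0.0 and +0.0 compare equal, a comparison cannot show whether
strtod kept the sign.  The sketch below compares the stored bytes
instead; it assumes that a minus zero, if the implementation has one,
has a different object representation from plus zero.  The function
name strtod_keeps_zero_sign is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if strtod("-0.0") and strtod("0.0") yield different
   object representations, that is, the minus sign survived. */
int strtod_keeps_zero_sign(void)
{
    double minus = strtod("-0.0", NULL);
    double plus  = strtod("0.0", NULL);

    assert(minus == 0.0 && plus == 0.0);
    return memcmp(&minus, &plus, sizeof minus) != 0;
}
```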

ANSI  C X3.159-1989,  4.10.1.4 The  strtod Function,  page 152,  lines
12-15, 'If  the correct  value is outside  the range  of representable
values, plus or  minus HUGE_VAL is returned (according to  the sign of
the value), and the value of the macro  ERANGE is stored in errno.  If
the correct  value would  cause underflow,  zero is  returned and  the
value of the macro ERANGE is stored in errno.'

    If HUGE_VAL is +infinity, then is strtod("1e999999",&ptr); outside
    the range of representable values and a range error?  Or is it the
    'nearest' of DBL_MAX and INF?
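
Whether HUGE_VAL lies beyond DBL_MAX, and what strtod does with an
oversized exponent, can both be observed directly.  A minimal sketch
with hypothetical helper names; it reports what one implementation
does and does not settle which reading the Standard intends.

```c
#include <assert.h>
#include <errno.h>
#include <float.h>
#include <math.h>
#include <stdlib.h>

/* Returns 1 if HUGE_VAL exceeds DBL_MAX, which is how an infinity
   would show up in a comparison. */
int huge_val_exceeds_dbl_max(void)
{
    return HUGE_VAL > DBL_MAX;
}

/* Converts s with strtod and stores in *err the errno value left by
   the call (0 if errno was not set). */
double strtod_with_errno(const char *s, int *err)
{
    double v;

    assert(s != NULL && err != NULL);
    errno = 0;
    v = strtod(s, NULL);
    *err = errno;
    return v;
}
```

If strtod_with_errno("1e999999", &err) returns HUGE_VAL with err equal
to ERANGE while huge_val_exceeds_dbl_max() is 1, the implementation
treats infinity as the overflow indicator, not as a representable
'nearest' value.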

From:     Fred Tydeman, IBM's NCEG representative
          Mail Stop 35A
          1510 Page Mill Road
          Palo Alto, California 94304
          Internet: tydeman@ibmpa.awdpa.ibm.com
          UUCP: uunet!ibmsupt!tydeman
          (415) 855-4430