Numeric-related Requests for Interpretation to ANSI C
Fred Tydeman
ibmsupt!ibmpa!tydeman@uunet.UU.NET
Fri Jan 4 07:45:54 PST 1991
The following numeric-related ANSI C issues have been submitted to
ANSI C for official interpretation and may be of interest to this
mailing list.
----------------------------------------------------------------------
Subject: Formal Request For Interpretation Number 2
What is the result of: strtod( "100ergs", &ptr);? Is it 100.0 or is
it 0.0?
ANSI C X3.159-1989, 4.10.1.4 The strtod Function, page 151, lines
36-38: 'The subject sequence is defined as the longest initial
subsequence of the input string, starting with the first
non-white-space character, that is of the expected form.' In this
case, the longest initial subsequence of the expected form is "100",
so 100.0 should be the return value. Also, since the entire string is
in memory, strtod can scan it as many times as need be to find the
longest valid initial subsequence.
ANSI C X3.159-1989, 4.9.6.2 The fscanf Function, page 137, lines
17-18: 'e,f,g Matches an optionally signed floating-point number,
whose format is the same as expected for the subject string of the
strtod function.' Later, page 139, lines 6, 16, and 25 show that
'100ergs' fails to match "%f". Those two show that '100ergs' is
invalid to fscanf and therefore, invalid to strtod. Then, page 152,
lines 11-12 'If no conversion could be performed, zero is returned'
indicates for an error input, 0.0 should be returned. The reason this
is invalid is spelled out in the Rationale, 4.9.6.2 The fscanf
function, page 95, 'One-character pushback is sufficient for the
implementation of fscanf. Given the invalid field "-.x", the
characters "-." are not pushed back.' And later, 'The conversions
performed by fscanf are compatible with those performed by strtod and
strtol.'
So, do strtod and fscanf act alike, accepting and failing on the same
inputs under the one-character pushback scanning strategy, or do they
use different scanning strategies, with strtod accepting more than
fscanf?
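The two readings can be checked against a particular implementation
with a small probe. This is only a sketch: the names strtod_probe,
probe_strtod, and probe_sscanf are made up for illustration, sscanf
stands in for fscanf, and "%lf" is used because the target is a
double. Whatever it reports describes that one library, not what the
standard requires.

```c
#include <stdio.h>
#include <stdlib.h>

/* Result of probing one input string with strtod. */
struct strtod_probe {
    double value;     /* value returned by strtod */
    long   consumed;  /* number of characters strtod accepted */
};

/* Hypothetical helper: run strtod on s and record what it accepted. */
struct strtod_probe probe_strtod(const char *s)
{
    struct strtod_probe r;
    char *end;

    r.value = strtod(s, &end);
    r.consumed = (long)(end - s);
    return r;
}

/* Hypothetical helper: return the number of items sscanf assigned
   for "%lf"; 0 means a matching failure. */
int probe_sscanf(const char *s, double *out)
{
    return sscanf(s, "%lf", out);
}
```

Calling probe_strtod("100ergs") and probe_sscanf("100ergs", &d) side
by side shows whether the library at hand treats the two scanners
alike.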
----------------------------------------------------------------------
Subject: Formal Request For Interpretation Number 3
Assuming that 99999 is larger than DBL_MAX_10_EXP, what is the result
of: strtod( "0.0e99999", &ptr);? Is it 0.0, HUGE_VAL, or undefined?
ANSI C X3.159-1989, 3.1.3.1 Floating Constants, page 27, lines 30-32:
'The significand part is interpreted as a decimal rational number; the
digit sequence in the exponent part is interpreted as a decimal
integer. The exponent indicates the power of 10 by which the
significand part is to be scaled.' In this case 0.0e99999 means 0.0
times 10 to the power 99999, or 0.0 * 10 ** 99999, which has a scaled
value of 0.0. Therefore, return 0.0.
ANSI C X3.159-1989, 4.10.1.4 The strtod Function, page 152, lines
12-14: 'If the correct value is outside the range of representable
values, plus or minus HUGE_VAL is returned (according to the sign of
the value), and the value of the macro ERANGE is stored in errno'.
Since the exponent (99999 in this case) is larger than DBL_MAX_10_EXP,
the value is outside the range of representable values (overflow).
Therefore, return HUGE_VAL.
ANSI C X3.159-1989, 2.2.4.2.2 Characteristics of Floating Types
<float.h>, pages 15 and 16, describe the model that defines the
floating-point types. The number 0.0e99999, as written, is not part
of that model (it cannot be represented since the exponent is larger
than e-max). From 3.2.1.4 Floating Types, page 36, lines 11-13,
'...if the value being converted is outside the range of values that
can be represented, the behavior is undefined.' Therefore, since this
number, as written, has no representation, the behavior is undefined.
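Which of the three readings an implementation follows can be observed
by clearing errno before the call and inspecting both the return value
and errno afterwards. A minimal sketch; the helper names
convert_checked and is_huge are hypothetical, and the result only
describes the library being tested.

```c
#include <errno.h>
#include <math.h>     /* HUGE_VAL */
#include <stdlib.h>

/* Hypothetical helper: convert s with strtod and report whether the
   call set errno to ERANGE. */
double convert_checked(const char *s, int *range_error)
{
    double v;

    errno = 0;                 /* strtod never clears errno itself */
    v = strtod(s, NULL);
    *range_error = (errno == ERANGE);
    return v;
}

/* Return 1 if v is plus or minus HUGE_VAL. */
int is_huge(double v)
{
    return v == HUGE_VAL || v == -HUGE_VAL;
}
```

For example, convert_checked("0.0e99999", &err) followed by is_huge on
the result distinguishes the "0.0" reading from the "HUGE_VAL" reading;
the "undefined" reading cannot be detected by a test.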
----------------------------------------------------------------------
Subject: Formal Request For Interpretation Number 4
ANSI C X3.159-1989, 4.10.1.4 The strtod Function, page 152, line 5:
What does '"C" locale' mean?
a) setlocale(LC_ALL,NULL) == "C"
b) setlocale(LC_NUMERIC,NULL) == "C"
c) a) && b)
d) a) || b)
e) something else.
What does 'other than the "C" locale' mean?
a) setlocale(LC_ALL,NULL) != "C"
b) setlocale(LC_NUMERIC,NULL) != "C"
c) a) && b)
d) a) || b)
e) something else.
Section 4.4.1 Locale Control, page 108 may help answer the questions.
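Reading the == and != above as shorthand for comparing locale names
(an assumption; applied literally to the pointer that setlocale
returns, == would compare addresses), choices a) and b) can be written
as a query with a null locale argument followed by strcmp. The helper
name category_is_c is hypothetical.

```c
#include <locale.h>
#include <string.h>

/* Return 1 if the current setting of the given category is named "C".
   A null second argument makes setlocale a pure query. */
int category_is_c(int category)
{
    const char *name = setlocale(category, NULL);

    return name != NULL && strcmp(name, "C") == 0;
}
```

With this, choice c) is category_is_c(LC_ALL) &&
category_is_c(LC_NUMERIC) and choice d) is the same with ||. Note that
the string returned for LC_ALL when categories differ is not specified
in detail, so a) may be false even when b) is true.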
----------------------------------------------------------------------
Subject: Formal Request For Interpretation Number 5
What is meant by 'representable floating-point value'? Assume double
precision, unless stated otherwise.
First, some definitions based partially upon the floating-point model
on pages 15-16 of ANSI C X3.159-1989:
1) +Normal Numbers: DBL_MIN to DBL_MAX, inclusive; normalized (first
significand digit is non-zero), sign is +1.
2) -Normal Numbers: -DBL_MAX to -DBL_MIN, inclusive; normalized.
3) +Zero: All digits zero, sign is +1; (true zero).
4) -Zero: All digits zero, sign is -1.
5) Zero: Union of +zero and -zero.
6) +Denormals: Exponent is "minimum" (Biased exponent is zero);
first significand digit is zero; sign is +1. These are in range
+DBL_DeN (inclusive) to +DBL_MIN (exclusive). Let DBL_DeN be the
symbol for the minimum denormal, so we can talk about it by name.
7) -Denormals: Same as +denormals, except sign, and range is
-DBL_MIN (exclusive) to -DBL_DeN (inclusive).
8) +Unnormals: Biased exponent is non-zero; first significand digit
is zero; sign is +1. These overlap the range of +normals and
+denormals.
9) -Unnormals: Same as +unnormals, except sign; range is over
-normals and -denormals.
10) +infinity: From IEEE-754.
11) -infinity: From IEEE-754.
12) Quiet NaN (Not a Number); sign does not matter; from IEEE-754.
13) Signaling NaN; sign does not matter; from IEEE-754.
14) NaN: Union of Quiet NaN and Signaling NaN.
15) Others: Reserved (VAX?) and Indefinite (CDC/Cray?) act like NaN.
On the real number line, these symbols order as:
 [  1  )[   2    ](   3   ]( 4 )[ 5 ]( 6 )[   7   )[   8    ](  9  ]
 +------+--------+--------+-----+---+-----+--------+--------+------+
-INF -DBL_MAX -DBL_MIN -DBL_DeN -0 +0 +DBL_DeN +DBL_MIN +DBL_MAX +INF
Non-real numbers are: SNaN, QNaN, and NaN, call this region 10.
Regions 1 and 9 are overflow, 2 and 8 are normal numbers, 3 and 7 are
denormal numbers (pseudo underflow), 4 and 6 are true underflow, and 5
is zero.
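For values that are already held in a double, the region numbers can
be computed mechanically. The sketch below assumes the classification
macros fpclassify and signbit, which come from the later C99 <math.h>
and are not part of X3.159-1989; the function name region_of is
hypothetical.

```c
#include <math.h>   /* fpclassify, signbit: C99, not in X3.159-1989 */

/* Hypothetical helper: map a double to the region number (1-10) used
   in the number line above. Regions 4 and 6 lie strictly between
   zero and the smallest denormal, so no double value falls in them
   and they are never returned. */
int region_of(double x)
{
    int neg = signbit(x) != 0;

    switch (fpclassify(x)) {
    case FP_NAN:       return 10;
    case FP_INFINITE:  return neg ? 1 : 9;  /* closed ends of overflow */
    case FP_NORMAL:    return neg ? 2 : 8;
    case FP_SUBNORMAL: return neg ? 3 : 7;
    case FP_ZERO:      return 5;
    default:           return 0;  /* implementation-defined class */
    }
}
```

The interiors of regions 1, 4, 6, and 9 can only be reached by decimal
text or exact mathematical results, never by a stored double, which is
why the conversion and library functions are where the question
matters.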
So, the question is: What does 'representable (double precision)
floating-point value' mean:
a) Regions 2, 5 and 8 (+/- normals and zero)
b) Regions 2, 3, 5, 7, and 8 (+/- normals, denormals, and zero)
c) Regions 2 through 8 [-DBL_MAX ... +DBL_MAX]
d) Regions 1 through 9 [-INF ... +INF]
e) Regions 1 through 10 (reals and non-reals)
f) What the hardware can represent
g) Something else? What?
Some things to consider in your answer follow. The questions that
follow are rhetorical and do not need answers.
ANSI C X3.159-1989, 2.2.4.2.2 Characteristics of Floating Types
<float.h>, page 15, lines 32-35, 'The characteristics of floating
types are defined in terms of a model that describes a representation
of floating-point numbers and values that provide information about an
implementation's floating-point arithmetic.' Same section, page 16
line 6, 'A normalized floating-point number x ... is defined by the
following model ...'.
That model is just normalized numbers and zero (appears to include
signed zeros). It excludes denormal and unnormal numbers,
infinities, and NaNs. Are signed zeros required, or just allowed?
ANSI C X3.159-1989, 3.1.3.1 Floating Constants, page 27, lines 32-35,
'If the scaled value is in the range of representable values (for its
type) the result is either the nearest representable value, or the
larger or smaller representable value immediately adjacent to the
nearest value, chosen in an implementation-defined manner.'
------+-----+------+--------+----...----+-----+---
      A     B  y   C   x    D           E  z  F
  -DBL_DeN 0.0 +DBL_DeN +DBL_MIN ... DBL_MAX INF
The representable numbers are A,B,C,D,E and F. The number x can be
converted to B, C, or D! But what if B is zero, C is DBL_DeN
(denormal), and D is DBL_MIN (normalized)? Is x representable?
It is not in the range DBL_MIN...DBL_MAX and its inverse causes
overflow; so those say not valid. On the other hand, it is in the
range DBL_DeN...DBL_MAX and it does not cause underflow; so those
say it is valid.
What if B is zero, A is -DBL_DeN (denormal), and C is +DBL_DeN
(denormal). Is y representable? If so, its nearest value is
zero, and the immediately adjacent values include a positive and a
negative number. So a positive number written by the user is allowed
to end up with a negative value!
What if E is DBL_MAX and F is infinity (on a machine that uses
infinities, IEEE-754)? Does z have a representation? If z came
from 1.0/x, then z caused overflow which says invalid. But on
IEEE-754 machines, it would either be DBL_MAX or infinity
depending upon the rounding control, so it has a representation
and is valid.
What is nearest? In linear or logarithmic sense? If the number
is between 0 and DBL_DeN, 1e-9999999 is linear nearest to zero,
but log nearest to DBL_DeN. If the number is between DBL_MAX and INF,
1e+999999 is linear and log nearest to DBL_MAX. Or is everything
bigger than DBL_MAX nearest to INF?
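The linear midpoint between zero and DBL_DeN is a convenient test
value for the "nearest" question. The sketch below assumes an
IEEE-754 style binary double with denormals, where the smallest
denormal equals DBL_MIN * DBL_EPSILON; on other formats that product
means something else. The function names are hypothetical.

```c
#include <float.h>

/* Assumption: IEEE-754 style binary double, where the smallest
   denormal (DBL_DeN in the text) is DBL_MIN * DBL_EPSILON. */
double dbl_den(void)
{
    return DBL_MIN * DBL_EPSILON;
}

/* Half of the smallest denormal lies exactly midway (in the linear
   sense) between zero and DBL_DeN; the returned value shows which
   neighbor the arithmetic picks. */
double half_dbl_den(void)
{
    volatile double d = dbl_den();  /* force a run-time division */

    return d / 2.0;
}
```

Comparing half_dbl_den() against 0.0 and dbl_den() shows which of the
two neighbors the hardware rounding chose for that midpoint.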
ANSI C X3.159-1989, 3.2.1.3 Floating and Integral, page 36, footnote
29, 'Thus, the range of portable floating values is (-1,Utype_MAX+1).'
ANSI C X3.159-1989, 3.2.1.4 Floating Types, page 36, lines 11-15,
'When a double is demoted to float or a long double to double or
float, if the value being converted is outside the range of values
that can be represented, the behavior is undefined. If the value
being converted is in the range of values that can be represented but
cannot be represented exactly, the result is either the nearest higher
or nearest lower value, chosen in an implementation-defined manner.'
ANSI C X3.159-1989, 3.3 Expressions, page 39, lines 15-17, 'If an
exception occurs during the evaluation of an expression (that is, if
the result is not mathematically defined or not in the range of
representable values for its type), the behavior is undefined.'
w = 1.0 / 0.0 ; /* infinity in IEEE-754 */
x = 0.0 / 0.0 ; /* NaN in IEEE-754 */
y = +0.0 ; /* plus zero */
z = - y ; /* minus zero: Must this be -0.0? May it be +0.0? */
Are the above representable?
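On hardware that does not trap, the four values can be produced at
run time and inspected. This sketch assumes IEEE-754 style arithmetic
and the C99 macros signbit, isinf, and isnan; by the text of 3.3
quoted above, the two divisions are undefined behavior in ANSI C, so
an implementation is free to do something else entirely. The function
names are hypothetical.

```c
#include <math.h>   /* signbit, isinf, isnan: C99, assumed available */

/* Evaluate the four expressions at run time. The volatile operands
   keep the compiler from folding the divisions at translation time. */
void make_values(double *w, double *x, double *y, double *z)
{
    volatile double one = 1.0, zero = 0.0;

    *w = one / zero;    /* infinity on IEEE-754 hardware */
    *x = zero / zero;   /* NaN on IEEE-754 hardware */
    *y = +zero;         /* plus zero */
    *z = -*y;           /* minus zero, if negation keeps the sign */
}

/* Return 1 if v is a zero whose sign bit is set. */
int is_minus_zero(double v)
{
    return v == 0.0 && signbit(v) != 0;
}
```

After make_values(&w, &x, &y, &z), the calls isinf(w), isnan(x), and
is_minus_zero(z) report what the implementation actually produced.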
ANSI C X3.159-1989, 4.5.1 Treatment of Error Conditions, page 112,
lines 11-12, 'The behavior of each of these functions is defined for
all representable values of its input arguments.'
What about non-numbers? Are they representable? What is
sin(NaN);? If you got a NaN as input, then you can return NaN as
output. But, is it a domain error? Must errno be set to EDOM?
The NaN already indicates an error, so setting errno adds no more
information. Assuming NaN is not part of ANSI C representable,
but the hardware supports it, then using NaNs is an extension of
ANSI C and setting errno need not be required, but is allowed.
Correct?
ANSI C X3.159-1989, 4.5.1 Treatment of Error Conditions, page 112,
lines 20-27, 'Similarly, a range error occurs if the result of the
function cannot be represented as a double value. If the result
overflows (the magnitude of the result is so large that it cannot be
represented in an object of the specified type), the function returns
the value of the macro HUGE_VAL, with the same sign (except for the
tan function) as the correct value of the function; the value of the
macro ERANGE is stored in errno. If the result underflows (the
magnitude of the result is so small that it cannot be represented in
an object of the specified type), the function returns zero; whether
the integer expression errno acquires the value of the macro ERANGE is
implementation-defined.'
What about denormal numbers? What is sin(DBL_MIN / 3.0L);? Must
this be considered underflow and therefore return zero, and maybe
set errno to ERANGE? Or may it return DBL_MIN / 3.0, a denormal
number? Assuming denormals are not part of ANSI C representable,
but the hardware supports it, then using them is an extension of
ANSI C and setting errno need not be required, but is allowed.
Correct?
What about infinity? What is exp(INF);? If you got an INF as
input, then you can return INF as output. But, is it a range
error? The output value is representable, so that says no error.
The output value is bigger than DBL_MAX, so that says an error and
set errno to ERANGE. Assuming infinity is not part of ANSI C
representable, but the hardware supports it, then using INFs is an
extension of ANSI C and setting errno need not be required, but is
allowed. Correct?
What about signed zeros? What is sin(-0.0);? Must this return
-0.0? May it return -0.0? May it return +0.0? Signed zeros
appear to be required in the model on page 16.
What is sqrt(-0.0);? IEEE-754 and IEEE-854 (floating-point
standards) say this must be -0. Is -0.0 negative? Is this a
domain error?
ANSI C X3.159-1989, 4.9.6.1 The fprintf Function, page 133, lines
32-33, '(It will begin with a sign only when a negative value is
converted if this flag is not specified.)'
What is fprintf(stdout, "%+.1f", -0.0);? Must it be -0.0? May it
be +0.0? Is -0.0 a negative value? The model on page 16 appears
to require support for signed zeros.
What is fprintf(stdout, "%f %f", 1.0/0.0, 0.0/0.0);? May it be
the IEEE-854 strings of 'inf' or 'infinity' for the infinity and
'NaN' the quiet NaN? Would 'NaNQ' also be allowed for a quiet
NaN? Would 'NaNS' be allowed for a signaling NaN? Must the sign
be printed? Signs are optional in IEEE-754 and IEEE-854. Or,
must it be some decimal notation as specified by page 134, line
19. Does the locale matter?
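What a given library prints for these values is easy to capture in a
buffer. The sketch uses snprintf, which is a C99 addition and not in
X3.159-1989; format_double is a hypothetical helper, and the caller is
trusted to pass a format containing exactly one double conversion.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format v with the given conversion into buf
   (at most size bytes, always terminated when size > 0) and return
   buf so the text can be inspected. */
char *format_double(char *buf, size_t size, const char *fmt, double v)
{
    snprintf(buf, size, fmt, v);
    return buf;
}
```

For example, format_double(buf, sizeof buf, "%+.1f", -0.0) leaves in
buf whatever the library chooses for a negative zero, and the same
call with "%f" and an infinity or NaN shows its spelling for those.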
ANSI C X3.159-1989, 4.10.1.4 The strtod Function, page 152, lines 1-2,
'If the subject sequence begins with a minus sign, the value resulting
from the conversion is negated.'
What is strtod("-0.0", &ptr);? Must it be -0.0? May it be +0.0?
The model on page 16 appears to require support for signed zeros.
All floating-point hardware I know about supports signed zeros at
least at the load, store, and negate/complement instruction level.
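Because -0.0 == +0.0 compares equal, the sign of a converted zero has
to be examined directly. A minimal sketch, assuming the C99 signbit
macro is available; the name converts_to_minus_zero is hypothetical.

```c
#include <math.h>     /* signbit: C99, assumed available */
#include <stdlib.h>

/* Return 1 if strtod converts s to a zero with the sign bit set,
   0 if it yields any other value (including plus zero). */
int converts_to_minus_zero(const char *s)
{
    double v = strtod(s, NULL);

    return v == 0.0 && signbit(v) != 0;
}
```

converts_to_minus_zero("-0.0") then answers the "Must it be -0.0? May
it be +0.0?" question for the library at hand, though not for the
standard.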
ANSI C X3.159-1989, 4.10.1.4 The strtod Function, page 152, lines
12-15, 'If the correct value is outside the range of representable
values, plus or minus HUGE_VAL is returned (according to the sign of
the value), and the value of the macro ERANGE is stored in errno. If
the correct value would cause underflow, zero is returned and the
value of the macro ERANGE is stored in errno.'
If HUGE_VAL is +infinity, then is strtod("1e999999",&ptr); outside
the range of representable values and a range error? Or is it the
'nearest' of DBL_MAX and INF?
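The two candidate results can be told apart by comparing against
HUGE_VAL and DBL_MAX and by checking errno. This is a sketch with
hypothetical names (overflow_kind, classify_overflow); if an
implementation defines HUGE_VAL equal to DBL_MAX, the first comparison
wins and the two cases cannot be distinguished.

```c
#include <errno.h>
#include <float.h>    /* DBL_MAX */
#include <math.h>     /* HUGE_VAL */
#include <stdlib.h>

enum overflow_kind { NOT_OVERFLOW, GOT_HUGE_VAL, GOT_DBL_MAX };

/* Hypothetical helper: classify what strtod returns for s and report
   whether the call set errno to ERANGE. */
enum overflow_kind classify_overflow(const char *s, int *range_error)
{
    double v;

    errno = 0;                 /* strtod never clears errno itself */
    v = strtod(s, NULL);
    *range_error = (errno == ERANGE);

    if (v == HUGE_VAL || v == -HUGE_VAL)
        return GOT_HUGE_VAL;
    if (v == DBL_MAX || v == -DBL_MAX)
        return GOT_DBL_MAX;
    return NOT_OVERFLOW;
}
```

classify_overflow("1e999999", &err) reports which value came back and
whether it was flagged as a range error.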
From: Fred Tydeman, IBM's NCEG representative
Mail Stop 35A
1510 Page Mill Road
Palo Alto, California 94304
Internet: tydeman@ibmpa.awdpa.ibm.com
UUCP: uunet!ibmsupt!tydeman
(415) 855-4430