[Cfp-interest 2803] major white-space issue for functions like strtod
Vincent Lefevre
vincent at vinc17.net
Thu Jul 20 03:33:38 PDT 2023
While working on the GNU MPFR specification and tests, I've just
noticed an issue in the ISO C standard, still present in N3149.
About 7.24.1.5 "The strtod, strtof, and strtold functions":
In 7.24.1.5p2:
The strtod, strtof, and strtold functions convert the initial
portion of the string pointed to by nptr to double, float, and
long double representation, respectively. First, they decompose
the input string into three parts: an initial, possibly empty,
sequence of white-space characters, a subject sequence resembling
a floating constant or representing an infinity or NaN; and a
final string of one or more unrecognized characters, including
the terminating null character of the input string. Then, they
attempt to convert the subject sequence to a floating-point number,
and return the result.
and 7.24.1.5p3 also says:
The subject sequence is defined as the longest initial subsequence
of the input string, starting with the first non-white-space
character, that is of the expected form.
For the meaning of "white-space", 7.1.1p5 says:
In this clause, "white-space character" refers to (execution)
white-space character as defined by isspace.
Now, isspace is specified in 7.4.1.10:
The isspace function tests for any character that is a standard
white-space character or is one of a locale-specific set of
characters for which isalnum is false. The standard white-space
characters are the following: space (' '), form feed ('\f'),
new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'),
and vertical tab ('\v'). In the "C" locale, isspace returns true
only for the standard white-space characters.
So in a non-"C" locale, isspace is allowed to be true for any
character for which isalnum is false.
7.4.1.1 (about isalnum) says:
The isalnum function tests for any character for which
isalpha or isdigit is true.
7.4.1.2 (about isalpha) says:
The isalpha function tests for any character for which isupper or
islower is true, or any character that is one of a locale-specific
set of alphabetic characters for which none of iscntrl, isdigit,
ispunct, or isspace is true. [...]
7.4.1.5 (about isdigit) says:
The isdigit function tests for any decimal-digit character
(as defined in 5.2.1).
So, basically, characters like '-' and '.', which may be needed
to represent floating-point numbers as character strings, might
be regarded as white-space characters in some locales (because
isalnum is typically false for them). I suppose that this is not
the case in practice, but it is currently allowed by the standard,
and strings like "-.123" might not be parsed as intended: the
leading "-." could be skipped as white space, so that the subject
sequence would be "123".
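
To make the concern concrete, here is a minimal sketch of how a program
could check whether the current locale is affected. The helper names
(locale_eats_numeric_punct, parses_fully) and the choice of '+', '-'
and '.' as the characters to test are my own assumptions for
illustration; nothing here comes from the standard or from MPFR.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: punctuation that may start or appear in a
   strtod subject sequence (sign and decimal point in the "C" locale). */
static const char numeric_punct[] = "+-.";

/* Returns true if isspace, in the current locale, classifies any of
   '+', '-' or '.' as white space. The standard's wording appears to
   allow this outside the "C" locale. */
bool locale_eats_numeric_punct(void)
{
    for (const char *p = numeric_punct; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            return true;
    }
    return false;
}

/* Hypothetical helper: converts s with strtod and returns true only
   if a conversion happened and the whole string was consumed.
   On success the value is stored in *out. */
bool parses_fully(const char *s, double *out)
{
    char *end;
    double v;

    assert(s != NULL && out != NULL);
    v = strtod(s, &end);
    if (end == s || *end != '\0')
        return false;
    *out = v;
    return true;
}
```

A test suite could call setlocale(LC_ALL, "") and then
locale_eats_numeric_punct() before relying on strtod for strings such
as "-.123". Note that parses_fully alone cannot detect the problem:
if "-." were skipped as white space, the string would still be
consumed entirely and only the value (0.123 instead of -0.123) would
differ.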
--
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)