[Cfp-interest 2803] major white-space issue for functions like strtod

Thu Jul 20 03:33:38 PDT 2023

While working on the GNU MPFR specification and tests, I've just
noticed an issue in the ISO C standard, still present in N3149.

About 7.24.1.5 "The strtod, strtof, and strtold functions":

In 7.24.1.5p2:

  The strtod, strtof, and strtold functions convert the initial
  portion of the string pointed to by nptr to double, float, and
  long double representation, respectively. First, they decompose
  the input string into three parts: an initial, possibly empty,
  sequence of white-space characters, a subject sequence resembling
  a floating constant or representing an infinity or NaN; and a
  final string of one or more unrecognized characters, including
  the terminating null character of the input string. Then, they
  attempt to convert the subject sequence to a floating-point number,
  and return the result.

and 7.24.1.5p3 also says:

  The subject sequence is defined as the longest initial subsequence
  of the input string, starting with the first non-white-space
  character, that is of the expected form.

For the meaning of "white-space", 7.1.1p5 says:

  In this clause, "white-space character" refers to (execution)
  white-space character as defined by isspace.

Now, isspace is specified in 7.4.1.10:

  The isspace function tests for any character that is a standard
  white-space character or is one of a locale-specific set of
  characters for which isalnum is false. The standard white-space
  characters are the following: space (' '), form feed ('\f'),
  new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'),
  and vertical tab ('\v'). In the "C" locale, isspace returns true
  only for the standard white-space characters.

So in a non-"C" locale, isspace is allowed to be true for any
character for which isalnum is false.
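
As an illustration (a minimal sketch, not part of the standard), one
can count the characters that isspace accepts in the current locale
beyond the six standard ones. The helper name count_extra_space_chars
is hypothetical; only isspace, isalnum and strchr are standard.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: count the characters that isspace() accepts in
   the current locale beyond the six standard white-space characters. */
static int count_extra_space_chars(void)
{
    static const char standard_ws[] = " \f\n\r\t\v";
    int count = 0;

    /* Start at 1: the null character is never white space in practice,
       and strchr() would match the string terminator for c == 0. */
    for (int c = 1; c <= UCHAR_MAX; c++) {
        if (isspace(c) && strchr(standard_ws, c) == NULL) {
            /* 7.4.1.10 only permits this when isalnum() is false. */
            assert(!isalnum(c));
            count++;
        }
    }
    return count;
}
```

In the "C" locale, which is in effect at program startup, the result
must be 0. Calling setlocale(LC_CTYPE, "") first would probe the
user's locale instead, where the result is locale-specific.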

7.4.1.1 (about isalnum) says:

  The isalnum function tests for any character for which
  isalpha or isdigit is true.

7.4.1.2 (about isalpha) says:

  The isalpha function tests for any character for which isupper or
  islower is true, or any character that is one of a locale-specific
  set of alphabetic characters for which none of iscntrl, isdigit,
  ispunct, or isspace is true. [...]

7.4.1.5 (about isdigit) says:

  The isdigit function tests for any decimal-digit character
  (as defined in 5.2.1).

So, basically, characters like '-' and '.', which may be needed
to represent floating-point numbers as character strings, might
be regarded as white-space characters in some locales (because
isalnum is typically false for them). I suppose that this is not
the case in practice, but the standard currently allows it, and
a string like "-.123" might then not be parsed as intended: the
initial white-space sequence could be "-.", so that the subject
sequence would be "123".
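
To make the consequence concrete, here is a rough sketch that models
the decomposition of 7.24.1.5p2 with a pluggable white-space test.
Everything except isspace and strtod is hypothetical: hyp_isspace
stands for a locale in which '-' and '.' are white space (I do not
claim that any real locale behaves this way), and parse_with is just
an illustrative helper. The "C" locale is assumed for strtod itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical predicate: models a locale whose isspace() is also
   true for '-' and '.'. */
static int hyp_isspace(int c)
{
    return isspace(c) || c == '-' || c == '.';
}

/* The real isspace(), wrapped so that it can be passed as a function
   pointer (isspace may also be defined as a macro). */
static int std_isspace(int c)
{
    return isspace(c);
}

/* Skip the initial white-space sequence according to is_ws, then let
   strtod() convert what remains. */
static double parse_with(const char *s, int (*is_ws)(int))
{
    assert(s != NULL);
    while (*s != '\0' && is_ws((unsigned char)*s))
        s++;
    return strtod(s, NULL);
}
```

With these definitions, parse_with("-.123", std_isspace) gives the
same value as strtod("-.123", NULL), while
parse_with("-.123", hyp_isspace) skips "-." and converts "123",
giving 123.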

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)