[Cfp-interest 1368] Re: (SC22WG14.16914) N2380: printf of NaN()

Mon Jul 29 17:20:06 PDT 2019

On Mon, Jul 29, 2019 at 5:10 PM Jim Thomas <jaswthomas at sbcglobal.net> wrote:
>
>
>
> On Jul 29, 2019, at 1:26 PM, JF Bastien <cxx at jfbastien.com> wrote:
>
> Hi Jim,
>
> Maybe rephrasing things will help get us on the same page.
>
> I agree that printf's lack of specificity about how many characters a
> NaN corresponds to is a problem that should be solved.
>
>
> Hi JF,
>
> Thanks for the clear statement below. But right here is where we disagree. The current specification gives implementations the flexibility to print something appropriate to the way they interpret payloads. For example, an implementation might use the payload to encode how the NaN was created (invalid square root, invalid add, invalid division, etc.) and then reflect that information in the n-char-sequence (INV-SQRT, INV-ADD, INV-DIV, etc., or perhaps as a documented code 1, 2, 3, etc.). Or the implementation might use the payload to encode information about where the NaN was created and use the n-char-sequence to indicate a location. Or the implementation might attach no meaning to payloads. The payload representations used for the getpayload and setpayload functions wouldn’t be suitable for indicating such implementation-specific meaning.

Is this flexibility useful for any existing implementations, i.e.
implementations that will support the next C standard?

Note that I'm not saying that payloads aren't useful (I've used them
quite a bit in virtual machines); I'm merely asking if C implementations
need this flexibility.

> A macro like _PRINTF_MAX_NAN_LEN seems like a reasonable way to bound the length of the output. For reproducibility, it seems desirable to have a way for the user to select “nan” instead of “nan(n-char-sequence)” style output. I think any more specificity would be too inflexible.

I'm not sure I understand. _PRINTF_MAX_NAN_LEN is the implementation
telling users up to how much they'll print. It doesn't let users
specify "nan" instead of "nan(n-char-sequence)".
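
A minimal sketch of how a caller can size a buffer today without such a
macro, assuming only the standard behavior that snprintf with a null
buffer and zero size returns the length it would have written. The
function names are hypothetical, chosen for illustration:

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Number of characters "%f" produces for x, excluding the terminator.
   Returns a negative value if snprintf reports an encoding error. */
int formatted_length(double x)
{
    return snprintf(NULL, 0, "%f", x);
}

/* Length this implementation uses for one particular quiet NaN.
   It says nothing about NaNs carrying other payloads. */
int default_nan_length(void)
{
    return formatted_length((double)NAN);
}

/* Format x into a freshly allocated string of exactly the needed size.
   Returns NULL on failure; the caller frees the result. */
char *format_double_alloc(double x)
{
    int needed = formatted_length(x);
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;

    int written = snprintf(buf, (size_t)needed + 1, "%f", x);
    if (written < 0 || written > needed) {
        free(buf);
        return NULL;
    }
    assert(strlen(buf) == (size_t)written);
    return buf;
}
```

This measures each value at run time rather than giving a compile-time
bound, so it does not help code that needs a fixed-size buffer, and it
gives no control over which of the two NaN forms is printed.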

Why would anything more specific be inflexible? Again I'd like to
understand which relevant C implementations support this opinion. Are
there implementations which print anything other than a number as the
n-char-sequence? Are these printed as floating-point (as seems logical
given `{get,set}payload`), or integers?
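
To make the "payload as an integer" option concrete, here is a sketch
under stated assumptions: double is IEEE 754 binary64, the payload is
taken to be the low 51 bits of the significand, and the text form is
"nan(0x<hex>)". The format and every function name below are
hypothetical; this is not what any particular implementation prints, and
it manipulates the bits directly instead of calling `getpayload` or
`setpayload`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAYLOAD_MASK 0x0007ffffffffffffULL /* low 51 significand bits */
#define QUIET_NAN_BITS 0x7ff8000000000000ULL /* exponent all ones + quiet bit */

/* Build a positive quiet NaN carrying the given payload (assumes binary64). */
double nan_from_payload(uint64_t payload)
{
    uint64_t bits = QUIET_NAN_BITS | (payload & PAYLOAD_MASK);
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Extract the payload bits of a NaN as an unsigned integer. */
uint64_t payload_of_nan(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits & PAYLOAD_MASK;
}

/* Write the hypothetical "nan(0x<hex>)" form; returns snprintf's result. */
int format_nan(char *buf, size_t size, double d)
{
    return snprintf(buf, size, "nan(0x%llx)",
                    (unsigned long long)payload_of_nan(d));
}

/* Parse the same form back. Returns 1 on success and stores the NaN in
   *out; returns 0 if the text does not match or the payload is too big. */
int parse_nan(const char *s, double *out)
{
    unsigned long long payload = 0;
    int consumed = 0;

    /* %n is only reached if the closing parenthesis matched. */
    if (sscanf(s, "nan(%llx)%n", &payload, &consumed) != 1
        || consumed == 0 || s[consumed] != '\0')
        return 0;
    if (payload > PAYLOAD_MASK)
        return 0;
    *out = nan_from_payload(payload);
    return 1;
}
```

With these assumptions the output length is bounded by the payload
width, and a payload that is printed parses back to the same payload.
Sign and signaling NaNs are ignored here.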

> - Jim Thomas
>
> I don't think
> _PRINTF_MAX_NAN_LEN is the right way to solve it.
>
> My thoughts are:
>
> 1. We have `getpayload` and `setpayload`, and we know how to print
> numbers, we should therefore be able to specify how payloads are
> printed as numbers. This obviates the need for _PRINTF_MAX_NAN_LEN.
> 2. If we do this, we should also specify things so that they can
> round-trip, i.e. any NaN with a payload that I printf should also
> scanf properly, giving me the same NaN with the same payload. Of
> course not all numbers can be put into a payload.
> 3. We might instead decide that printing payloads is silly, and so is
> printing a NaN's sign. Let's then instead specify that all NaNs print
> as "nan", and scanning anything but "nan" won't work. This also
> obviates the need for _PRINTF_MAX_NAN_LEN.
>
> I hope this expresses my point of view more clearly.
>
> Thanks,
>
> JF
>
>
> On Sun, Jul 28, 2019 at 11:43 AM Jim Thomas <jaswthomas at sbcglobal.net> wrote:
>
>
> On May 15, 2019, at 8:28 AM, Martin Sebor <msebor at gmail.com> wrote:
>
> On 5/15/19 3:37 AM, Jens Gustedt wrote:
>
> Hello JF,
> On Tue, 14 May 2019 12:33:24 -0700 JF Bastien <cxx at jfbastien.com> wrote:
>
> Specifically, I think C should instead support:
>
> 1. Extraction of NaN integer payload from double / float / long
> double.
> 2. Creation of NaN with integer payload (without going
> through character sequences).
>
> C in its current flux has this already in Annex F, `getpayload` and
> `setpayload`. Would you want them to be mandatory?
>
> The specification of NaN-related *n-char-sequence* should then be
> constrained to match the restrictions imposed on integer payloads
> (i.e. must be positive, maximum value).
>
> I'd then like to understand what encodings must be supported: does the
> integral encoding support decimal only, or does it support
> hexadecimal (and must it be preceded by 0x)? I think this
> determination should be made by surveying existing implementations.
>
> Then, and only then, does it make sense to figure out the maximum
> number of characters of NaN-related *n-char-sequence* as proposed by
> N2380.
>
> I can't follow you here. This macro is intended to provide the
> knowledge to the user how large a buffer should be if they are
> expecting that a NaN could be printed. This makes sense to me
> regardless what the encodings could be. The user here is just at the
> receiving end and tries to deal with buffer overflows.
>
>
> IEC 60559 does not attach any semantics to NaN payloads, beyond recommending propagation rules. It notes that implementations might encode diagnostic information in payloads. There have been various ideas about how to do that, and some of the ideas have been implemented. There has been no use of payload inspiring enough to suggest that further standard specification would be more valuable than implementations’ flexibility to have their own meaning, if any, for payloads. Thus, payloads are intentionally minimally specified, and hence non-portable. The printf specification for NaNs is intended to serve both implementations that do and ones that do not attach meaning to payloads.
>
> FWIW, the macro is just a band aid on a small subset of
> the problem described in N2301:
>
> 1) there is no requirement/guarantee that the printf output
> is the same even for the same representation of a NaN
>
>
> Implementations should be free to print whatever seems appropriate for any meaning they give payloads.
>
> Or is the “problem” referring to behavior on one implementation? For implementations that do not conform to Annex F, is there a requirement that printf output be the same for the same representation of a number? For Annex F, such a requirement (for consistency on a given implementation) seems ok, though of uncertain value.
>
>
> 2) there is no guarantee that what printf outputs for a NaN
> can be parsed by scanf to get the same NaN back
>
>
> What does this mean? There’s no 1-1 correspondence between NaNs and their I/O string representations.
>
> Implementations that attach no meaning to NaN payloads should print (-)nan or maybe (-)nan(), but they need to scan printf output from other systems.
>
>
> 3) there is no limit on the amount of output for a NaN
>
>
> Right. This is an unnecessary risk.
>
>
> 4) there is no way for printf callers to choose which of
> the two forms of NaN to format, so no portability
>
>
> Payloads aren’t portable. It would help portability if the user could ask for the (-)nan form. With the (-)nan(n-char-sequence) form, there can be no expectation of implementation-independent printf output. A user option to get the nan(n-char-sequence) form can’t determine the n-char-sequence, so it would only serve to get nan() instead of nan.
>
> - CFP group
>
>
> When we discussed N2301 there was 12/0/1 consensus to address
> "the issue" in the paper.  All of the above is "the issue."
>
> The group then started bike-shedding how "the issue" should
> be addressed and someone had the bright idea that precision
> would be a better way to do it than the pound AKA hash flag
> in the proposal.  A straw poll of that idea was 7/4/3, which
> by our arbitrary standards of interpretation was viewed as
> direction to proceed.  But precision isn't a viable mechanism
> for selecting between the NaN formats because it would screw
> up the formatting of finite numbers.  So with that, "the issue"
> has morphed into just (3) above which is the subset N2380 tries
> to solve.  In practice (3) isn't a real problem because no sane
> implementation would produce more output than the number of bits
> in a NaN, so a portable program can conservatively allocate at
> least that much space for each number and be assured it won't
> overflow.  Unlike (3), though, all the other aspects of
> the issue are real or at least far more likely.  N2301 solved
> all of them, including (3).  But in our wisdom, we chose to
> solve the part that doesn't affect anyone and call it good.
>
> Martin