[Cfp-interest 2247] Re: [SC22WG14.20768] Overflow, normalized numbers, N2805 and N2806
Jim Thomas
jaswthomas at sbcglobal.net
Sat Oct 30 11:18:50 PDT 2021
> On Oct 29, 2021, at 9:58 AM, Joseph Myers <joseph at codesourcery.com> wrote:
>
> On Thu, 28 Oct 2021, Jim Thomas wrote:
>
>> The intention of N2806 was not to change the definition of normalized,
>> but to simplify it and eliminate the typo. The new wording of the
>> definition of normalized floating-point numbers “all x with f1 > 0” is
>> in the context of the floating-point model described in the preceding
>> two paragraphs where x is given by the summation in paragraph 3 right
>> after “p, emin, and emax are fixed constants” in paragraph 2. For a
>> given type, the implementation defines fixed p, emin, and emax and
>> provides all the normalized floating-point numbers (represented by the
>> summation), along with signed or unsigned zero, in the type. The type
>> may include other numbers, including ones larger than the maximum
>> normalized floating-point number. The normalized floating-point numbers
>> give a common (parameterized) range of numbers for C portability.
>
> That's not how I read the wording, so maybe N2806 needs more work (in
> which case N2805 and N2843, and maybe N2842, should be deferred if we
> don't have a fixed version of N2806 in time). I read it as saying that
> f_1 > 0 makes a number normalized (whether or not all values of the f_i
> for that exponent result in a value representable in the type).
I agree that f1 > 0 just defines normalized. It’s the preceding words “Floating types shall be able to represent signed zeros or an unsigned zero (all fk == 0) and all normalized floating-point numbers …” that state the requirement. Isn’t that clear? If not, we can work on a rewording.
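For reference, the model summation in paragraph 3 that this wording leans on is (paraphrasing the standard rather than quoting it, in LaTeX notation):

    x = s \cdot b^{e} \cdot \sum_{k=1}^{p} f_k \, b^{-k},
    \qquad e_{\min} \le e \le e_{\max}, \quad 0 \le f_k < b

with s the sign, b the base, e the exponent, and p the precision. So “all fk == 0” says every significand digit is zero, and “f1 > 0” says the leading digit is nonzero.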
>
> I think a reasonable principle would be: there is a range of positive
> values, and a corresponding range of negative values, such that the
> precision within that range is at least p base-b digits. This range does
> not need to have endpoints that are precisely powers of b, and in the
> double-double case it does not. Numbers within that range should not be
> considered to overflow or underflow and should be considered as normal by
> isnormal. (LDBL_NORM_MAX is a new macro in C23, it probably doesn't
> matter so much how it relates to that range.)
This does seem like a reasonable principle. I think it would need a separate proposal. If WG14 wants this, 5.2.4.2.2#4 might be changed to something like
[4] A normalized floating-point number is a number defined by the model with f1 > 0. Floating types shall be able to represent signed zeros or an unsigned zero (all fk == 0) and all normalized floating-point numbers whose magnitude is at least an implementation-defined minimum normalized floating-point number with exponent emin and at most an implementation-defined maximum normalized floating-point number with exponent emax. In addition, …
Joseph, is this what you have in mind?
This change would loosen the current specification. In particular, it would remove the requirement for types to represent all normalized model numbers (for e = emax or emin). A concern is how this would affect other places in the standard, for example the characteristic macros in 5.2.4.2.2, which might depend on all in-range precision-p numbers being representable in the type. Some examples: Does the formula for LDBL_DIG work if long double is double-double? Or the formula for LDBL_MAX_10_EXP? FLT_MIN_EXP, etc. are defined with reference to the normalized floating-point number with exponent emin, but with the suggested principle that number need not be representable. Similarly, the formulas for FLT_NORM_MAX, etc. give numbers that need not be representable.
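As a rough, non-normative sanity check on the first of those questions (assuming the usual double-double parameters p = 106 and b = 2, which are my assumptions here rather than anything in the papers), the *_DIG expression for a radix that is not a power of 10, floor((p - 1) * log10(b)), can be evaluated directly:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Assumed double-double parameters: two 53-bit doubles, radix 2. */
        int p = 106, b = 2;
        /* *_DIG when b is not a power of 10: floor((p - 1) * log10(b)). */
        int dig = (int)floor((p - 1) * log10((double)b));
        printf("DIG = %d\n", dig);   /* prints DIG = 31 */
        return 0;
    }

That agrees with the LDBL_DIG of 31 that I believe existing double-double implementations use, so the formula at least produces the expected value; whether its round-trip guarantee still holds when not every in-range precision-p number is representable is the real question.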
I think N2805 and N2843 would be ok as-is with this change. N2842 would need a change to remove the formulas, which would be straightforward.
- Jim Thomas
>
> For double-double in the form where the sum of the two halves, rounded to
> double to nearest, is required to equal the high half, that means no
> representable finite value is considered to overflow and all such values
> that aren't subnormal are considered as normal by isnormal - despite that
> fact that the upper end of the finite range has bits (53 1s, 0, 53 1s)
> rather than a single contiguous block of 1s as suggested by the generic
> floating-point model.
>
> For double-double in the form where (DBL_MAX, DBL_MAX) is a valid
> representation, values 2^1024 and above have lower precision than in the
> normal range. For that variant of double-double, it would seem reasonable
> to consider such numbers as not normal (maybe an implementation-defined
> "supernormal" or similar). Likewise for other formats involving gradual
> overflow for large exponents (posit, for example): once the precision gets
> below p, it seems reasonable to consider the numbers as being supernormal
> (and overflowing) rather than normal.
>
> Note that for such a format, there is some room for the implementation to
> make a choice of what it considers the value of p to be; smaller values of
> p mean a larger range is considered normal for the same format.
>
> --
> Joseph S. Myers
> joseph at codesourcery.com