[Cfp-interest 2247] Re: [SC22WG14.20768] Overflow, normalized numbers, N2805 and N2806

Jim Thomas jaswthomas at sbcglobal.net
Sat Oct 30 11:18:50 PDT 2021



> On Oct 29, 2021, at 9:58 AM, Joseph Myers <joseph at codesourcery.com> wrote:
> 
> On Thu, 28 Oct 2021, Jim Thomas wrote:
> 
>> The intention of N2806 was not to change the definition of normalized, 
>> but to simplify it and eliminate the typo. The new wording of the 
>> definition of normalized floating-point numbers “all x with f1 > 0” is 
>> in the context of the floating-point model described in the preceding 
>> two paragraphs where x is given by the summation in paragraph 3 right 
>> after “p, emin, and emax are fixed constants” in paragraph 2. For a 
>> given type, the implementation defines fixed p, emin, and emax and 
>> provides all the normalized floating-point numbers (represented by the 
>> summation), along with signed or unsigned zero, in the type. The type 
>> may include other numbers, including ones larger than the maximum 
>> normalized floating-point number. The normalized floating-point numbers 
>> give a common (parameterized) range of numbers for C portability.
> 
> That's not how I read the wording, so maybe N2806 needs more work (in 
> which case N2805 and N2843, and maybe N2842, should be deferred if we 
> don't have a fixed version of N2806 in time).  I read it as saying that 
> f_1 > 0 makes a number normalized (whether or not all values of the f_i 
> for that exponent result in a value representable in the type).

I agree that f1 > 0 just defines normalized. It’s the preceding words “Floating types shall be able to represent signed zeros or an unsigned zero (all fk == 0) and all normalized floating-point numbers …” that state the requirement. Isn’t that clear? If not, we can work on a rewording.
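
For illustration (not part of any proposal), assuming an IEEE-754 binary64 double, the <float.h> macros supply the fixed b, p, emin, and emax, and the smallest and largest normalized model numbers coincide with DBL_MIN and DBL_MAX. A minimal sketch:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* The model parameters for double: b, p, emin, emax. */
    printf("b = %d, p = %d, emin = %d, emax = %d\n",
           FLT_RADIX, DBL_MANT_DIG, DBL_MIN_EXP, DBL_MAX_EXP);

    /* Smallest normalized model number: f1 = 1, f2..fp = 0, e = emin,
       i.e. b^(emin-1); for IEEE binary64 this is DBL_MIN. */
    double min_norm = ldexp(1.0, DBL_MIN_EXP - 1);

    /* Largest normalized model number: all fk = b-1, e = emax,
       i.e. (1 - b^-p) * b^emax; for IEEE binary64 this is DBL_MAX. */
    double max_norm = ldexp(1.0 - ldexp(1.0, -DBL_MANT_DIG), DBL_MAX_EXP);

    printf("min normalized = %a (DBL_MIN = %a)\n", min_norm, DBL_MIN);
    printf("max normalized = %a (DBL_MAX = %a)\n", max_norm, DBL_MAX);
    return 0;
}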

> 
> I think a reasonable principle would be: there is a range of positive 
> values, and a corresponding range of negative values, such that the 
> precision within that range is at least p base-b digits.  This range does 
> not need to have endpoints that are precisely powers of b, and in the 
> double-double case it does not.  Numbers within that range should not be 
> considered to overflow or underflow and should be considered as normal by 
> isnormal.  (LDBL_NORM_MAX is a new macro in C23, it probably doesn't 
> matter so much how it relates to that range.)

This does seem like a reasonable principle. I think it would need a separate proposal. If WG14 wants this, 5.2.4.2.2#4 might be changed to something like

[4] A normalized floating-point number is a number defined by the model with f1 > 0. Floating types shall be able to represent signed zeros or an unsigned zero (all fk == 0) and all normalized floating-point numbers whose magnitude is at least an implementation-defined minimum normalized floating-point number with exponent emin and at most an implementation-defined maximum normalized floating-point number with exponent emax. In addition, …

Joseph, is this what you have in mind?
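
Here is a rough, non-normative C sketch of how I read the suggested requirement; the bounds below are hypothetical stand-ins (LDBL_MIN and LDBL_NORM_MAX or LDBL_MAX), not values the wording would mandate:

#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical documented bounds; under the suggested wording an
   implementation could choose values that are not powers of FLT_RADIX.
   The macros below are used only as stand-ins. */
static const long double impl_min_norm = LDBL_MIN;
#ifdef LDBL_NORM_MAX            /* C23 */
static const long double impl_max_norm = LDBL_NORM_MAX;
#else
static const long double impl_max_norm = LDBL_MAX;
#endif

/* Every model number with f1 > 0 whose magnitude lies between the two
   documented bounds would have to be representable in the type. */
static bool in_required_range(long double x)
{
    long double m = fabsl(x);
    return m >= impl_min_norm && m <= impl_max_norm;
}

int main(void)
{
    printf("1.0L in required range: %d\n", in_required_range(1.0L));
    printf("LDBL_MAX in required range: %d\n", in_required_range(LDBL_MAX));
    return 0;
}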

This change would loosen the current specification. In particular, it would remove the requirement for types to represent all normalized model numbers (those with e = emax or e = emin, for example). A concern is how this would affect other places in the standard, for example the characteristic macros in 5.2.4.2.2, whose formulas might depend on all in-range precision-p numbers being representable in the type. Examples: Does the formula for LDBL_DIG work if long double is double-double? Or the formula for LDBL_MAX_10_EXP? FLT_MIN_EXP, etc. are defined with reference to normalized floating-point numbers with exponent emin, but with the suggested principle those numbers need not be representable. Similarly, the formulas for FLT_NORM_MAX, etc. give numbers that need not be representable.
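
As a quick, non-normative experiment, two of the 5.2.4.2.2 formulas can be evaluated with the long double parameters. The p = 106 and emax = 1024 values mentioned in the comments are assumptions about the IBM double-double format, not requirements:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* DIG formula from 5.2.4.2.2 (b not a power of 10):
       floor((p - 1) * log10(b)). */
    int dig = (int)floor((LDBL_MANT_DIG - 1) * log10((double)FLT_RADIX));
    printf("formula DIG = %d, LDBL_DIG = %d\n", dig, LDBL_DIG);

    /* NORM_MAX/MAX formula: (1 - b^-p) * b^emax.  With p = 106 and
       emax = 1024 (IBM double-double) this is 2^1024 - 2^918; its high
       half would have to round to 2^1024, so the value overflows and is
       not representable.  For binary64, 80-bit, or binary128 long double
       it is simply LDBL_MAX. */
    long double model_max =
        ldexpl(1.0L - ldexpl(1.0L, -LDBL_MANT_DIG), LDBL_MAX_EXP);
    printf("(1 - b^-p) * b^emax overflowed: %d\n", isinf(model_max) != 0);
    return 0;
}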

I think N2805 and N2843 would be ok as-is with this change. N2842 would need a change to remove the formulas, which would be straightforward.

- Jim Thomas


> 
> For double-double in the form where the sum of the two halves, rounded to 
> double to nearest, is required to equal the high half, that means no 
> representable finite value is considered to overflow and all such values 
> that aren't subnormal are considered as normal by isnormal - despite that 
> fact that the upper end of the finite range has bits (53 1s, 0, 53 1s) 
> rather than a single contiguous block of 1s as suggested by the generic 
> floating-point model.
> 
> For double-double in the form where (DBL_MAX, DBL_MAX) is a valid 
> representation, values 2^1024 and above have lower precision than in the 
> normal range.  For that variant of double-double, it would seem reasonable 
> to consider such numbers as not normal (maybe an implementation-defined 
> "supernormal" or similar).  Likewise for other formats involving gradual 
> overflow for large exponents (posit, for example): once the precision gets 
> below p, it seems reasonable to consider the numbers as being supernormal 
> (and overflowing) rather than normal.
> 
> Note that for such a format, there is some room for the implementation to 
> make a choice of what it considers the value of p to be; smaller values of 
> p mean a larger range is considered normal for the same format.
> 
> -- 
> Joseph S. Myers
> joseph at codesourcery.com