[Cfp-interest 1768] Re: (SC22WG14.18328) Comments on N2561 (TS 18661-3 as Annex update)

Jim Thomas jaswthomas at sbcglobal.net
Tue Sep 15 11:37:57 PDT 2020


Below are thoughts on Joseph Myers’s comments, and on a related problem. Let’s plan to resolve these issues at the September CFP meeting, prior to presenting the annex proposal to WG14 in October.

> On Sep 10, 2020, at 4:30 PM, Joseph Myers <joseph at codesourcery.com> wrote:
> 
> Some comments on N2561 (TS 18661-3 as Annex update - note that 
> http://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm has the 
> descriptions for N2558 and N2561 swapped), where it deviates from the 
> original TS and subsequent CR resolutions:
> 
> * This version introduces changes to default argument promotions for 
> _Float16, _Float32 and _Float64.  This is a bad idea.  To quote again the 
> response to DR#206: "real float promotion to double is in Standard C 
> purely for compatibility with K&R. Since complex is new, that 
> compatibility is not an issue, and having it behave like real float would 
> introduce undesired overhead".  Exactly the same reasoning as for _Complex 
> float applies for these new types: they didn't exist in K&R C, so there is 
> no need for promotion when passed in variable arguments, and it's more 
> efficient not to promote them (as well as allowing for the possibility of 
> a signaling NaN being passed as-is by a copy operation, which cannot 
> happen when promoted).

WG14 seems to regard the default promotion of float to double as a historical artifact and not a feature to be extended.

Passing arguments by copy operation is not guaranteed on all implementations, but use of signaling NaNs is not portable anyway. So Joseph’s point about that seems valid.  

We added the new default promotions primarily to synchronize with David Olsen's proposal for extension floating-point types in C++. I understand that David does not regard this as a critical feature and is willing to remove the promotions from his proposal.

A secondary rationale was to allow using the narrower types with printf. This might be a convenience, though only for those narrower types and not with scanf. We already have conversion of strings to and from all new types and formats, and of course the strings can be used with printf and scanf.

This is not a significant IEC 60559 support issue. 
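
For reference, the existing float-to-double promotion under discussion can be seen with standard types alone. The sketch below is only an illustration (the function name sum_variadic is made up for this message): a float passed through "..." arrives as double, so the callee must read it with va_arg(ap, double).

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic floating arguments.
   Callers may pass float values, but the default argument promotions
   deliver them as double, so va_arg must read double here. */
static double sum_variadic(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);   /* float arguments were promoted */
    va_end(ap);
    return total;
}
```

Calling sum_variadic(2, 1.5f, 2.25f) works because both float arguments are promoted before the call. Without the new promotions, a _Float32 argument would instead be passed and read back as _Float32.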

> 
> * This version has further changes to the rules for choosing a function in 
> <tgmath.h> for one of the narrowing macros, beyond what appears in the 
> final version of the response to FPE CR#13, and those changes have 
> counterintuitive consequences.
> 
> My understanding is that the changes to allow standard types to be passed 
> to a type-generic macro whose name indicates a return type of _FloatN or 
> _FloatNx, and to allow binary types to be passed when the return type is 
> float or double, is to avoid problems when arguments are of types such as 
> _Float32_t, and that's fine.  

The change also avoids unnecessary undefined behavior for integer-type arguments in some cases. For example, f16sqrt(2) was undefined (tgmath regards the integer-type argument as double) but now invokes f16sqrtf64.

> However, the new rules can result in a 
> function being chosen with a different argument type from the function 
> arguments, even if there is a function that matches exactly, which 
> seems counterintuitive.
> 
> For example, say the arguments to f64add are of type _Float64x (or one is 
> _Float64x and one is of integer type, to get a case when the choice of 
> function can actually affect the return value), and all _Float64x values 
> can also be represented as _Float128.  The rules in N2561 would result in 
> f64addf128 being called.  The rules from CR#13 would result in f64addf64x, 
> a function with exactly matching argument type, being called, which seems 
> more intuitive.

I agree.

If both arguments are _Float64x, and every _Float64x value is representable as _Float128, then f64addf64x and f64addf128 yield identical results. Either could be called under the as-if rule. 

But, as Joseph notes, if one argument is _Float64x and the other has integer type, f64addf128 converts the integer-type argument to _Float128, which, if wider than _Float64x, could produce a different (though more accurate) result than f64addf64x.
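
The effect can be sketched by analogy with standard types, since the interchange and extended types are not available everywhere. In the sketch below float stands in for the narrower argument type and double for the wider one. The function names are hypothetical, and float and double are assumed to be IEC 60559 binary32 and binary64 with round-to-nearest.

```c
/* Analogy only: float plays the role of the narrower argument type,
   double the role of the wider one. The volatile objects force each
   intermediate to be stored in its declared type. */

/* The integer is converted to the narrower type before the addition. */
static float add_int_via_float(float x, int n)
{
    volatile float nf = (float)n;
    volatile float sum = x + nf;
    return sum;
}

/* The integer is converted to the wider type, the addition is done in
   that type, and only the final result is narrowed. */
static float add_int_via_double(float x, int n)
{
    volatile double nd = (double)n;
    volatile double sum = (double)x + nd;
    return (float)sum;
}
```

With x = 1.0f and n = 16777217 (2^24 + 1, not representable in binary32), the first function returns 16777216 and the second returns 16777218, so the choice of argument type changes the result.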
 
> 
> Adding a rule that if a corresponding function exists with exactly the 
> type determined from the macro arguments, then that function is called, 
> before looking for a big-enough type from the given lists, would avoid 
> that unintuitive result

Right. Here’s a draft for the change:

https://wiki.edg.com/pub/CFP/WebHome/tgmath_update_1.pdf
> (and also make the example f32add(f64x, f64) given 
> as yielding a call to f32addf64x valid again, with the current wording it 
> looks incorrect to me if _Float128 is supported and represents all the 
> values of _Float64x).

Yes, the example is incorrect as noted (with the N2561 rules).
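
A rough sketch of the two selection orders, to make the difference concrete. It is not the annex wording. The type parameters, the candidate order, and all names other than f64addf64x and f64addf128 are assumptions chosen to reproduce Joseph's f64add case, and "big enough" is reduced to comparing precision and maximum exponent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified description of a floating type. The numeric parameters
   are illustrative assumptions, not requirements of the annex. */
struct ftype { const char *name; int mant_dig; int max_exp; };

/* A candidate function and the type of its parameters. */
struct candidate { const char *func; struct ftype param; };

static const struct ftype F64X = { "_Float64x", 64, 16384 };

/* Hypothetical search order for f64add, chosen to match the example. */
static const struct candidate F64ADD_CANDIDATES[] = {
    { "f64addf128", { "_Float128", 113, 16384 } },
    { "f64addf64x", { "_Float64x",  64, 16384 } },
};

/* Simplified "big enough" test: wide covers narrow's precision and range. */
static bool big_enough(struct ftype wide, struct ftype narrow)
{
    return wide.mant_dig >= narrow.mant_dig && wide.max_exp >= narrow.max_exp;
}

/* With exact_match_first, a candidate whose parameter type is exactly the
   type determined from the arguments wins before the list is searched. */
static const char *select_function(const struct candidate *c, size_t n,
                                   struct ftype arg, bool exact_match_first)
{
    if (exact_match_first)
        for (size_t i = 0; i < n; i++)
            if (strcmp(c[i].param.name, arg.name) == 0)
                return c[i].func;
    for (size_t i = 0; i < n; i++)
        if (big_enough(c[i].param, arg))
            return c[i].func;
    return NULL;   /* no suitable function */
}
```

For _Float64x arguments, searching the list alone picks f64addf128 under these assumptions, while trying the exact match first picks f64addf64x.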

* Looking at this uncovered another problem with integer-type arguments. Consider:

	#include <tgmath.h>
	_Float32x f32x;
	int n;
	… pow(f32x, n) …

This has undefined behavior if _Float32x and double are not ordered (by inclusion of values).

Background …

Usual arithmetic conversions convert an integer operand to the floating type of the other operand.

tgmath in C17 regards integer arguments as double.

tgmath in draft C2X regards integer arguments as _Decimal64 if any arguments are decimal and as double otherwise, then refers to the usual arithmetic conversions to determine the generic parameter type. So the usual arithmetic conversion rule converting an integer operand to the type of a floating operand does not apply.

For C17 and the current draft C2X, real types of a given radix are ordered. So the usual arithmetic conversions can always pick a widest real type.
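
The C17 treatment of integer arguments can be checked with standard types only. The sketch below is an illustration (TYPE_NAME and the three helper functions are made-up names). It uses _Generic, whose controlling expression is not evaluated, to report the result type of a few tgmath calls.

```c
#include <tgmath.h>

/* Hypothetical helper macro: names the type of an expression. */
#define TYPE_NAME(x) _Generic((x),      \
        float:       "float",           \
        double:      "double",          \
        long double: "long double",     \
        default:     "other")

/* An integer argument alone is regarded as double. */
static const char *type_of_sqrt_int(void)   { return TYPE_NAME(sqrt(2)); }

/* A float argument alone selects the float function. */
static const char *type_of_sqrt_float(void) { return TYPE_NAME(sqrt(2.0f)); }

/* float combined with an integer: the integer is regarded as double,
   so the double function is selected. */
static const char *type_of_pow_float_int(void) { return TYPE_NAME(pow(2.0f, 3)); }
```

The third case is the one that matters here: the integer argument pulls the call to double even though the only floating argument is float.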

With the TS3 annex …

There are two cases where we have intentionally allowed types with the same radix to be unordered: a non-IEC 60559 long double type and a _FloatN type can be unordered if N > 64; and, extended types can be unordered with other types (and each other). IEC 60559 allows such unordering with extended formats. 

So there is potential for unordered floating types in certain cases; user beware. However, through their treatment of integer-type arguments, the tgmath rules can introduce unordering even where the user is not explicitly using potentially unordered floating types.
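
To illustrate what "unordered" means here, the sketch below models a format by its <float.h>-style parameters and applies a conservative inclusion check. The struct, the function names, and the two wide formats are hypothetical. Only the binary64 parameters are the usual DBL_* values.

```c
#include <stdbool.h>

/* A floating format described by <float.h>-style parameters. */
struct format { int radix; int mant_dig; int min_exp; int max_exp; };

/* Conservative check that every value of a is also a value of b:
   same radix, no more precision, no wider exponent range. It may
   report false for some formats that are in fact included. */
static bool values_included(struct format a, struct format b)
{
    return a.radix == b.radix
        && a.mant_dig <= b.mant_dig
        && a.min_exp  >= b.min_exp
        && a.max_exp  <= b.max_exp;
}

/* Two formats are ordered if one includes the other. */
static bool ordered(struct format a, struct format b)
{
    return values_included(a, b) || values_included(b, a);
}

/* binary64, as commonly used for double. */
static const struct format FMT_BINARY64 = { 2, 53, -1021, 1024 };

/* Hypothetical wide formats: one adds exponent range, one adds precision. */
static const struct format FMT_WIDE_RANGE     = { 2,  64, -16381, 16384 };
static const struct format FMT_WIDE_PRECISION = { 2, 106,  -1021,  1024 };
```

Each wide format includes binary64, but neither includes the other, so the usual arithmetic conversions would have no widest type to pick between them.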

The annex could address this problem with the following two changes: 

Insert after X.13 #2 the paragraph:

[] The treatment of arguments of integer type in 7.25 is expanded to cases where another argument has extended type. Arguments of integer type are regarded as having type:

_Decimal64x	if any argument has a decimal extended type; otherwise
_Float32x	if any argument has a binary extended type; otherwise
_Decimal64	if any argument has decimal type; otherwise
double
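
The priority order in the proposed paragraph can be written as a small decision function. This is only a sketch of the list above: the enum, struct, and function names are made up, and classifying each argument is left to the caller.

```c
#include <stdbool.h>

/* The type an integer-type argument is regarded as having. */
enum regarded_type { AS_DECIMAL64X, AS_FLOAT32X, AS_DECIMAL64, AS_DOUBLE };

/* Hypothetical summary of the kinds of types among the macro arguments. */
struct arg_kinds {
    bool any_decimal_extended;  /* e.g. _Decimal64x */
    bool any_binary_extended;   /* e.g. _Float32x   */
    bool any_decimal;           /* e.g. _Decimal64  */
};

/* Applies the four cases in the order listed in the proposed paragraph. */
static enum regarded_type integer_regarded_as(struct arg_kinds k)
{
    if (k.any_decimal_extended)
        return AS_DECIMAL64X;
    if (k.any_binary_extended)
        return AS_FLOAT32X;
    if (k.any_decimal)
        return AS_DECIMAL64;
    return AS_DOUBLE;
}
```

For pow(f32x, n), the only floating argument has a binary extended type, so n would be regarded as _Float32x rather than double.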


Insert after the first sentence of X.2.3 #2:

The set of values of _Float32x is a subset of the set of values of _Float64x; the set of values of _Float64x is a subset of the set of values of _Float128x. The set of values of _Decimal64x is a subset of the set of values of _Decimal128x.


Here is a draft update to the example in X.13, given the changes above:

https://wiki.edg.com/pub/CFP/WebHome/tgmath_update_2.pdf

The incorrect example Joseph mentioned becomes correct with the rule change above.

In the examples, I replaced fmax with pow, because we’re encouraging use of the new min-max functions.

- Jim Thomas

> 
> -- 
> Joseph S. Myers
> joseph at codesourcery.com
