[Cfp-interest 1386] mixing standard and interchange floating types

Jim Thomas jaswthomas at sbcglobal.net
Fri Aug 23 10:57:16 PDT 2019


From CFP August meeting notes:
>     Example for FLT_EVAL_METHOD
> http://wiki.edg.com/pub/CFP/WebHome/Example_for_FLT_EVAL_METHOD-20190815.pdf
>       Rajan: Not worth change in practice.
>       Is there a case where the blanket prohibition of mixing standard types and interchange types matters?
> 


With our current spec for the tgmath macros for functions that round to narrower type, if the macro prefix is fM or fMx, use of an argument of standard or decimal floating type results in undefined behavior. With the current rules for determining _t types and FLT_EVAL_METHOD equal to 1, _Float64_t would be double and the following would have undefined behavior:

_Float32 z;
_Float64_t x, y;
z = f32add(x, y);

The change I suggested for the evaluation methods spec (http://wiki.edg.com/pub/CFP/WebHome/Example_for_FLT_EVAL_METHOD-20190815.pdf) fixes this case and some others, but other problematic cases would remain. For example, if FLT_EVAL_METHOD equals 0, 1, or 2, then _Float16_t would be a standard floating type (float, double, or long double) and the following would have undefined behavior:

_Float32 z;
_Float64_t x;
_Float16_t y;
z = f32add(x, y);

I think the current rules for the tgmath macros for functions that round to narrower type are too restrictive. I think users would be surprised to see the examples above fail to produce the reasonably expected results. The following is intended to loosen the rules to accommodate such cases.

Change:
All arguments are generic. If any argument is not real, use of the macro results in undefined behavior. If the macro prefix is f or d, use of an argument of interchange or extended floating type results in undefined behavior. If the macro prefix is fM, or fMx, use of an argument of standard or decimal floating type results in undefined behavior. If the macro prefix is dM or dMx, use of an argument of standard or binary floating type results in undefined behavior. The function invoked is determined as follows:

to:
All arguments are generic. If any argument is not real, use of the macro results in undefined behavior. If the macro prefix is f or d, any argument of binary floating type whose values are a subset of (or the same as) the values of float, else double, else long double, is regarded as having that standard floating type; otherwise use of an argument of binary or decimal floating type results in undefined behavior. If the macro prefix is fM, or fMx, any argument of standard floating type whose values are a subset of (or the same as) the values of a supported type _FloatN, else _FloatNx, is regarded as having such interchange floating type with minimum N, else such extended floating type with minimum N*); otherwise, use of an argument of standard or decimal floating type results in undefined behavior. If the macro prefix is dM or dMx, use of an argument of standard or binary floating type results in undefined behavior.

*) There will be such a binary floating type in all cases except possibly when long double does not have an IEC 60559 format.
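
To make the proposed fM/fMx rule concrete, here is a rough sketch in C of the "values are a subset of" test and the "minimum N" selection. It is only an illustration under stated assumptions, not proposed spec text: struct bin_format, is_subset, regarded_width, and the interchange_formats table are hypothetical names introduced here. Formats are modeled by their <float.h>-style parameters, both formats are assumed to be radix 2 with subnormals, and all four interchange types are assumed to be supported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a binary floating-point format, using the
   <float.h> conventions for *_MANT_DIG, *_MIN_EXP and *_MAX_EXP. */
struct bin_format {
    int width;     /* N for _FloatN; 0 when not applicable */
    int mant_dig;  /* precision in bits */
    int min_exp;   /* minimum exponent, <float.h> convention */
    int max_exp;   /* maximum exponent, <float.h> convention */
};

/* IEC 60559 binary interchange formats, ordered by increasing N.
   Assumption: every one of them is supported; a real implementation
   would list only the _FloatN types it actually provides. */
const struct bin_format interchange_formats[] = {
    {  16,  11,    -13,    16 },
    {  32,  24,   -125,   128 },
    {  64,  53,  -1021,  1024 },
    { 128, 113, -16381, 16384 },
};

/* Nonzero if every finite value of a is also a value of b.
   Assumes radix 2 and subnormal support in both formats. */
int is_subset(struct bin_format a, struct bin_format b)
{
    return a.mant_dig <= b.mant_dig
        && a.max_exp <= b.max_exp
        /* the lowest bit of a's smallest subnormal must exist in b */
        && a.min_exp - a.mant_dig >= b.min_exp - b.mant_dig;
}

/* Width N of the first candidate (candidates ordered by increasing N)
   whose values include those of arg, or 0 if there is none. */
int regarded_width(struct bin_format arg,
                   const struct bin_format *cands, size_t n)
{
    assert(cands != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_subset(arg, cands[i]))
            return cands[i].width;
    }
    return 0;
}
```

With this model, a double described as {0, 53, -1021, 1024} is regarded as _Float64, and an x87-style long double described as {0, 64, -16381, 16384} is regarded as _Float128. A result of 0 corresponds to the "otherwise" branch, where the "else _FloatNx" step would be tried next with a second table of the implementation's extended formats (omitted here because their parameters are implementation-defined). The same helper could model the f/d prefix rule by passing a table for float, double, and long double instead.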

In EXAMPLE 1, change:

ddiv(ld, f128) 		undefined
f32fma(f64, d, f64) 	undefined

to:

ddiv(ld, f128) 		ddivl(ld, f128) if _Float128 values are a subset of long double values, or undefined otherwise
f32fma(f64, d, f64) 	f32fmaf64(f64, d, f64)

Might want to add a few more examples.


- Jim Thomas



