[Cfp-interest 2867] Re: CFP review of TS-4 and TS-5 revisions

Sat Aug 26 10:30:31 PDT 2023

Thanks for the careful reading!
I'll work on an improved version.

> One example would do for the TS. Omit the PRODUCT.

Fine with me!   It remains in the background document.

> How much to say about double-double?

It's a familiar topic of contention in the C committee, 
fitting into a framework influenced by 754.
In contrast, most of them have probably never heard of 
reproducible dot products and wouldn't care.
But a mention should go in the background document.

> Are there references for the algorithms used that should be mentioned?

I invented the example for didactic purposes.    There are probably lots
of algorithms optimized for various situations.

> Maybe "which lack hardware support for quadruple precisionâ€. (thinking of Titanium whose architecture allowed for decently fast quad)

Definitely.

> Arenâ€™t the augmented operations useful for double-double with good performance only if implemented in hardware?

Yes, the specifications are tight enough that a software implementation would
produce a double-double implementation uncompetitive in performance.
At the time, the intention was to specify hardware that worked better than
any software, rather than specify software that would be uncompetitive.

 This is subtly suggested below. We could just say, the augmented operations can be used to implement extended precision â€¦,  and not delve into performance, but if we do discuss performance we should be up front about the hardware need.
> 
> > Double-double represents numbers as a pair of doubles, the second much smaller
> > in magnitude than the first.    Ideally, in a pair (a.h,a.t), a.h would
> > contain the correctly rounded result of computed(a.h+a.t), 
> > and a.t would contain the correctly rounded result of computed(a.h+a.t)-a.h.
> > Performance considerations often compromise this ideal.
> 
> This might be a good place to mention that there is no standard for double-double and details differ among implementations.
> 
> > 
> > The augmented operation make it possible to implement + - * for
> > double-double format, in a way that is commutative and avoids conditional
> > branches, yet still yields close to the intended results, and is competitive
> > in performance if supported in hardware.
> > 
> > Let {a.h;a.t} represent a double-double value a intended to contain the value
> > a.h+a.t in infinite precision, and {b.t;b.t} a similar double-double value b.
> > 
> > SUM
> > 
> > Consider computing the sum {z.h;z.t}, a double-double value z
> > intended to contain {a.h+a.t}+{b.h+b.t} as closely as possible.
> > 
> > Clearly z should be a.h + a.t + b.h + b.t, but this is might be
> > a higher precision value that has to be rounded to fit double-double.
> 
> Omit â€œisâ€.
> 
> Maybe just say â€œbut this might have to be rounded to fit double-double.â€ The precision (as used in C) of double-double isnâ€™t all that determines whether a value fits.

I rewrote the algorithm to be clearer, and to fix the typo you noticed:

 Let the algorithm be

                                # infinite result is a.h +a.t +b.h +b.t

        {u.h;u.t} = augadd(a.h, b.h)
                                # infinite result is u.h +u.t +a.t +b.t

        {v.h;v.t} = augadd(a.t, b.t)
                                # infinite result is u.h +u.t +v.h +v.t

        {w.h;w.t} = augadd(u.t, v.t)
                                # could be regular add since w.t is discarded
                                # infinite result is u.h +v.h +w.h +w.t

        {y.h;y.t} = augadd(v.h, w.h)
                                # could be regular add since y.t is discarded
                                # infinite result is u.h +y.h +y.t +w.t

        {z.h;z.t} = augadd(u.h, y.h)
                                # infinite result is z.h +z.t +y.t +w.t

 Now {z.h;z.t} is a pretty good approximation to the intended result,
 commutative and without conditional branches,
 with error y.t +w.t.
 Two of the additions are marked as "could be regular add" rather than
 augadd - the algorithm above gives a name to w.t and y.t for the
 didactic purpose of the error formula.

I claim that it's commutative by symmetry inspection - switch a and b,
then u and v are the same, so w is the same, so  y is the same, so z is the 
same.    Same for product.   I could mention that in the backgrounder.

> > and has no conditional branches.    Its edge cases
better-> > handling of special cases?

> Doesn't the possible inexact underflow of augmul affect this?

The error analysis is only for the normal case.
As with other double-double implementations, with no specification to meet,
special cases fall out however they do.    A user willing to sacrifice
normal case performance for correctness and predictability in special cases 
might as well use software quad instead.