[Cfp-interest 2851] Re: CFP review of TS-4 and TS-5 revisions
David Hough CFP
pcfp at oakapple.net
Thu Aug 24 20:29:04 PDT 2023
I regret sending a half-baked text which revealed numerous typos as soon
as I sent it.
The following might be as much as 51%-baked -
EXAMPLE
The augmented operation are useful for extending precision, particularly
for implementing "double-double" arithmetic, which provides a faster, though
less precise and less predictable, alternative to software quadruple precision,
on systems which lack hardware quadruple precision.
Double-double represents numbers as a pair of doubles, the second much smaller
in magnitude than the first. Ideally, in a pair (a.h,a.t), a.h would
contain the correctly rounded result of computed(a.h+a.t),
and a.t would contain the correctly rounded result of computed(a.h+a.t)-a.h.
Performance considerations often compromise this ideal.
The augmented operation make it possible to implement + - * for
double-double format, in a way that is commutative and avoids conditional
branches, yet still yields close to the intended results, and is competitive
in performance if supported in hardware.
Let {a.h;a.t} represent a double-double value a intended to contain the value
a.h+a.t in infinite precision, and {b.t;b.t} a similar double-double value b.
SUM
Consider computing the sum {z.h;z.t}, a double-double value z
intended to contain {a.h+a.t}+{b.h+b.t} as closely as possible.
Clearly z should be a.h + a.t + b.h + b.t, but this is might be
a higher precision value that has to be rounded to fit double-double.
Using the notation
{x.h;x.t} = v.h+w.h # augmul
to mean something like
struct x daug_t ;
{x.h;x.t} = augadd(v.h, w.h) ;
Let the algorithm be
# infinite result is a.h+a.t+b.h+b.t
{u.h;u.t} = a.h+b.t # augadd
# infinite result is u.h+u.t+a.t+b.t
{v.h;v.t} = a.t+b.t # augadd
# infinite result is u.h+u.t+v.h+v.t
{w.h;w.t} = u.t+v.t # augadd
# infinite result is u.h+v.h+w.h+w.t
{x.h;x.t} = v.h+w.h # augadd
# infinite result is u.h+x.h+x.t+w.t
{z.h;z.t} = u.h+x.h # augadd
# infinite result is z.h+z.t+x.t+w.t
Now {z.h;z.t} is a pretty good approximation to the intended result,
with error x.t+w.t, and is
commutative and has no conditional branches. Its edge cases might be no
worse than those of any other double-double implementation fast enough to
be useful. Its performance might be acceptable is hardware
augadd is no slower than two adds.
PRODUCT
Consider computing the product {z.h;z.t}, a double-double value z
intended to contain {a.h+a.t}*{b.t;b.t} is closely as possible.
Clearly z should be a.h*b.h + a.h*b.t + a.t*b.h + a.t*b.t but this is
likely to be a quad-double precision value that has to be rounded to
fit double-double.
Using the notation
{x.h;x.t} = a.h*b.h # augmul
to mean something like
struct x daug_t ;
{x.h;x.t} = augmul(a.h, b.h) ;
Let the algorithm be
# infinite result is
# a.h*b.h+a.h*b.t+a.t*b.h+a.t*b.t
u=a.h*b.t ; v=a.t*b.t # regular multiplication
w = u + v # regular addition
# infinite result is
# a.h*b.h+w+(a.h*b.t+a.t*b.h+a.t*b.t-w)
{x.h;x.t} = a.h*b.h # augmul
# infinite result is
# x.h+x.t+w+(a.h*b.t+a.t*b.h+a.t*b.t-w)
{y.h;y.t} = x.t+w # augadd
# infinite result is
# x.h+y.h+y.t+(a.h*b.t+a.t*b.h+a.t*b.t-w)
{z.h;z.t} = x.h+y.h # augadd
# infinite result is
# z.h+z.t+y.t+(a.h*b.t+a.t*b.h+a.t*b.t-w)
Now {z.h;z.t} is a pretty good approximation to the intended result,
with error y.t+(a.h*b.t+a.t*b.h+a.t*b.t-w),
and is
commutative and has no conditional branches. Its edge cases might be no
worse than those of any other double-double implementation fast enough to
be useful. Its performance might be acceptable is hardware augmul and
augadd are no slower than two muls or two adds.
More information about the Cfp-interest
mailing list