[Cfp-interest 2851] Re: CFP review of TS-4 and TS-5 revisions

Thu Aug 24 20:29:04 PDT 2023

I regret sending a half-baked text which revealed numerous typos as soon 
as I sent it.

The following might be as much as 51%-baked -

EXAMPLE

The augmented operation are useful for extending precision, particularly
for implementing "double-double" arithmetic, which provides a faster, though
less precise and less predictable, alternative to software quadruple precision,
on systems which lack hardware quadruple precision.

Double-double represents numbers as a pair of doubles, the second much smaller
in magnitude than the first.    Ideally, in a pair (a.h,a.t), a.h would
contain the correctly rounded result of computed(a.h+a.t), 
and a.t would contain the correctly rounded result of computed(a.h+a.t)-a.h.
Performance considerations often compromise this ideal.

The augmented operation make it possible to implement + - * for
double-double format, in a way that is commutative and avoids conditional
branches, yet still yields close to the intended results, and is competitive
in performance if supported in hardware.

Let {a.h;a.t} represent a double-double value a intended to contain the value
a.h+a.t in infinite precision, and {b.t;b.t} a similar double-double value b.

SUM

Consider computing the sum {z.h;z.t}, a double-double value z
intended to contain {a.h+a.t}+{b.h+b.t} as closely as possible.

Clearly z should be a.h + a.t + b.h + b.t, but this is might be
a higher precision value that has to be rounded to fit double-double.

Using the notation
	{x.h;x.t} = v.h+w.h         # augmul
to mean something like
	struct x daug_t ;
	{x.h;x.t} = augadd(v.h, w.h) ;

Let the algorithm be

				# infinite result is a.h+a.t+b.h+b.t
	{u.h;u.t} = a.h+b.t	# augadd
				# infinite result is u.h+u.t+a.t+b.t
	{v.h;v.t} = a.t+b.t	# augadd
				# infinite result is u.h+u.t+v.h+v.t
	{w.h;w.t} = u.t+v.t	# augadd
				# infinite result is u.h+v.h+w.h+w.t
	{x.h;x.t} = v.h+w.h	# augadd
				# infinite result is u.h+x.h+x.t+w.t
	{z.h;z.t} = u.h+x.h	# augadd
				# infinite result is z.h+z.t+x.t+w.t

Now {z.h;z.t} is a pretty good approximation to the intended result, 
with error x.t+w.t, and is
commutative and has no conditional branches.    Its edge cases might be no
worse than those of any other double-double implementation fast enough to 
be useful.    Its performance might be acceptable is hardware 
augadd is no slower than two adds.

PRODUCT

Consider computing the product {z.h;z.t}, a double-double value z 
intended to contain {a.h+a.t}*{b.t;b.t} is closely as possible.

Clearly z should be a.h*b.h + a.h*b.t + a.t*b.h + a.t*b.t but this is 
likely to be a quad-double precision value that has to be rounded to 
fit double-double.

Using the notation
	{x.h;x.t} = a.h*b.h         # augmul
to mean something like
	struct x daug_t ;
	{x.h;x.t} = augmul(a.h, b.h) ;

Let the algorithm be

				# infinite result is 
				# a.h*b.h+a.h*b.t+a.t*b.h+a.t*b.t 
	u=a.h*b.t ; v=a.t*b.t	# regular multiplication
	w = u + v		# regular addition
				# infinite result is 
				# a.h*b.h+w+(a.h*b.t+a.t*b.h+a.t*b.t-w)
	{x.h;x.t} = a.h*b.h	# augmul
				# infinite result is 
				# x.h+x.t+w+(a.h*b.t+a.t*b.h+a.t*b.t-w)
	{y.h;y.t} = x.t+w	# augadd
				# infinite result is 
				# x.h+y.h+y.t+(a.h*b.t+a.t*b.h+a.t*b.t-w)
	{z.h;z.t} = x.h+y.h	# augadd
				# infinite result is 
				# z.h+z.t+y.t+(a.h*b.t+a.t*b.h+a.t*b.t-w)

Now {z.h;z.t} is a pretty good approximation to the intended result, 
with error y.t+(a.h*b.t+a.t*b.h+a.t*b.t-w),
and is
commutative and has no conditional branches.    Its edge cases might be no
worse than those of any other double-double implementation fast enough to 
be useful.    Its performance might be acceptable is hardware augmul and
augadd are no slower than two muls or two adds.