comp.arch.arithmetic posting on div/sqrt

Tue Sep 26 22:04:44 PDT 1995

From: pgsacs.cornell.edu (Peter Soderquist)
Newsgroups: comp.arch,comp.arch.arithmetic,comp.dsp
Subject: Re: Square root algorithm
Date: 26 Sep 95 16:05:05 GMT
Organization: Cornell University

As others have mentioned before, the two main methods of performing
square root in current CPU's are subtractive (e.g. SRT, shift-and-add)
and multiplicative (e.g. Newton-Raphson method, Goldschmidt's
algorithm).  Not coincidentally, these are the two main division
methods as well.  The two are a good match and typically go together 
in hardware.

I have written a number of papers examining the implementation issues
of subtractive and multiplicative floating-point division and square
root, focusing on area and performance tradeoffs.  These papers also
provide introductions to the different algorithms and implementations.
All of them are available from

http://orac.ee.cornell.edu/unit1/pgs

as gzip'ed PostScript files.  I recently presented some of this work
at the 12th IEEE Symposium on Computer Arithmetic in Bath, England, so
the proceedings are another possible source.

To summarize my basic conclusions, subtractive divide and square root
provide the possibility of efficient parallel implementation with
other arithmetic functional units.  This capability for concurrent
operation provides real performance advantages over multiplicative
implementations, which serialize multiply, divide, and square root
operations.  Multiplicative designs also have to contend with the
potentially sticky issue of last-digit rounding error.  In addition
subtractive methods provide a superior balance of area expenditure and
performance.  Finally, I feel that subtractive methods are well-suited
to emerging trends in microprocessor implementation, such as decoupled
superscalar designs and high instruction issue rates.

In my opinion, software implementations of floating-point square root
- or divide - are not even in the running.  Except for code which
performs a very small number of these operations (or none), slow
implementations, in hardware too, can have a performance impact well
out of proportion to their frequency of occurrence.

Pasquale Corsonello <pascoraccusc1.unical.it> wrote:
>I am looking for informations about the algorithm that some processor chips
>such as 80387 or 80486 used to perform square root calculation.
>Tanks, Pasquale

I am not sure about the method used specifically in the 387 and 486;
however the Pentium literature I have seen implies that the 486 uses a 
1-bit shift-and-add algorithm.

Peter S.

-----------------------------------------------------------------------
			 Peter G. Soderquist
		  317 Engineering and Theory Center
	       Cornell School of Electrical Engineering
			  Cornell University
			  Ithaca, NY  14850
			 Ph:  (607) 255-0314
			 Fax: (607) 255-9072
			  pgsacs.cornell.edu
		 http://orac.ee.cornell.edu/unit1/pgs
-----------------------------------------------------------------------