Floating Point Instructions

uunet!cwi.nl!Dik.Winter
Fri Jan 10 19:12:23 PST 1992


How right you are.  I do not have answers to all your questions, and the
answers I give are just opinions.  But here we go.

 >     There is a trend in RISC microprocessors to add
 >     additional FP instructions for Square Root, 
 >     Inverse Square Root, ...  Are these instructions 
 >     justified ?
IEEE 754 specifies square root.  It ought not to be slower than two
divisions using the proper algorithms (of course, getting the rounding
correct may be a bit tricky).
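A back-of-the-envelope way to see that bound (a quick Python sketch, not
the hardware algorithm; the function name and iteration count are mine):
the classic Newton/Heron iteration costs one division per step and
converges quadratically, so with a decent starting guess (hardware would
use a table lookup) only a couple of division-cost steps are needed.

```python
import math

def heron_sqrt(a, iters=6):
    """Newton's method on f(x) = x*x - a; one division per iteration."""
    if a <= 0.0:
        return 0.0
    x = a if a >= 1.0 else 1.0   # crude initial guess
    for _ in range(iters):
        x = 0.5 * (x + a / x)    # the only division in the loop
    return x

print(heron_sqrt(2.0))           # converges to sqrt(2)
```

With a table-lookup seed accurate to a few bits, two or three iterations
already reach double precision, which is the sense in which square root
need not cost more than about two divisions.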
 >   
 >     There has been the old "rule of thumb" that :
 >       - Add, Subtraction, Multiplication had
 >         to have equal latencies.
 >       - Division latency had to be 3-4 times slower 
 >         than Multiply/Add/Subtract.
 >       - Square Root latency had to be 3-4 times slower 
 >         than Division.
I do not know how old those "rules of thumb" are.  Not very old.  I still
remember (but I have a long memory) the machine where a multiply took
just as long as a division, 58 times the add/subtract.  We thought it was
fast!  Winograd has described an algorithm for matrix-matrix multiply
that reduces the number of multiplications while adding to the number of
additions/subtractions.  If I remember my calculations right, his
implementation would win if a multiplication cost more than nine times
the add/subtract.  You hardware guys are just invalidating old,
well-established research!  ;-)  But given that with enough hardware an
addition takes order ln(n) time (with n the number of bits), and
multiplication is of the same order, I will accept the same order of
latency for different precisions.
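For reference, a sketch of Winograd's inner-product trick (the function
name is mine): it roughly halves the multiplications per dot product at
the price of extra additions, because the two correction sums amortize
over whole rows and columns in a matrix-matrix multiply.

```python
def winograd_dot(x, y):
    """Winograd's inner product for even-length vectors:
    n/2 multiplies in the main sum, plus two correction sums
    that can be precomputed once per matrix row/column."""
    n = len(x)
    assert n == len(y) and n % 2 == 0
    xi  = sum(x[2*j] * x[2*j+1] for j in range(n // 2))   # reusable per row
    eta = sum(y[2*j] * y[2*j+1] for j in range(n // 2))   # reusable per column
    s = sum((x[2*j] + y[2*j+1]) * (x[2*j+1] + y[2*j]) for j in range(n // 2))
    return s - xi - eta

print(winograd_dot([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # 70.0
```

Counting operations shows the trade: the standard dot product uses n
multiplies and n-1 adds, this form uses n/2 multiplies (after
amortization) but about 3n/2 adds, so it only pays when a multiply is
several times the cost of an add.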
 >     Assuming performance is relative to occurrence,
 >     do real-world programs really reflect this ?
Most real-world programs avoid division and square root like the plague.
Surprising, as on the late-seventies machine I was talking about, a
division took just as long as a multiplication.

 >   2)  Alignment & Normalization 
No comment.

 >   3)  Precision
 >     In a similar vein, are there any studies concerning
 >     the relative frequencies of double and single 
 >     precision ?  I thought SPUR was only double precision,
 >     but I never heard any conclusion as to whether this 
 >     is a good idea.
With the current speeds you can attack larger problems, but larger
problems in general require better base precision.  So going to double
precision only is not a bad move.

 >   4)  Denormals
 >     Some microprocessors are handling denormal inputs in
 >     hardware.  Is this justified ?
 >     There are well-known programs that exhibit underflow,
 >     but how often does it occur ?  Is the usual Flush-to-Zero
 >     mode a real impact on the "typical" mix of scientific
 >     programs ?
You can do Flush-to-Zero, and most programs will have no problem with
that.  But there are quite a few programs that produce denormal results;
and when denormals occur, they in general occur very frequently.  I like
the AMD 29050 approach, where the generation of denormals just takes a
few cycles longer but is otherwise done in hardware.  When it is done in
software using traps, performance plummets as soon as you encounter
denormals.  On the other hand, the RS6000 approach is no good for
infinities.  You want to be able to let your program trap on infinities
without a serious performance degradation for the overall program.  My
opinion: when encountering denormals, do it in hardware; just stall the
pipes a few cycles.  Perhaps a flag enabling flush to zero is a good
idea.  When encountering other exceptional values such as infinities,
enable immediate trapping without other performance loss.
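A quick way to see gradual underflow at work in IEEE doubles, which is
exactly the behaviour a flush-to-zero mode would discard (Python exposes
IEEE double arithmetic directly, so this is a faithful if small demo):

```python
import sys

min_normal = sys.float_info.min   # smallest normal double, about 2.2e-308
d = min_normal / 4.0              # a denormal under IEEE gradual underflow
print(d > 0.0)                    # True: the value survives, at reduced precision
# A flush-to-zero mode would return 0.0 for d instead, so the
# round trip below would give 0.0 rather than min_normal.
print(d * 4.0 == min_normal)      # True
```

This is also why programs that hit denormals tend to hit them
frequently: once a computation drifts into that range, many subsequent
intermediate results land there too.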


 > OPINIONS :
 >   - I would be very surprised if divisions comprised
 >     more than 10% of a mix of scientific programs.
Right.  Most numeric mathematics programmers know to avoid division
(although the numeric mathematicians do not know that).  On vector
processors, too, division is much slower than multiplication by the
inverse.
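The standard trick on vector machines, sketched here in Python (function
names are mine): hoist one reciprocal out of the loop and turn n
divisions into one division plus n multiplications, at the cost of a
possible difference of a unit or so in the last place.

```python
def scale_divide(xs, c):
    return [x / c for x in xs]   # one division per element

def scale_reciprocal(xs, c):
    r = 1.0 / c                  # single division, hoisted out of the loop
    return [x * r for x in xs]   # only multiplies remain in the loop

# For c a power of two the reciprocal is exact and the results agree exactly:
print(scale_reciprocal([1.0, 2.5, 10.0], 4.0))   # [0.25, 0.625, 2.5]
```

For general c the two loops can differ in the last bit, which is why
compilers only do this transformation under relaxed floating-point
settings.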

 >   - Accordingly, the performance of the Division and
 >     Square Root can be reduced appropriately.
Right.  Although the Square Root ought to have about the same latency as the
Division.  (I think they occur about equally often.)

 >   - I think FP instructions should be optimized for
 >     short alignment/normalization shifts, and the
 >     longer shifts should take more time.  Handling
 >     LZA for double precision is difficult with respect
 >     to time-space tradeoffs.  Handling LZA for quad
 >     precision may preclude adoption of it by designers.
Again right.  Long shifts are an indication of loss of precision, so in
general something in the program is not working as it should.

 >   - We should spend more hardware on handling denormals
 >     than on handling division and square root instructions. 
Right.  See the AMD 29050.

 >   - We need an additional program in the SPEC benchmark
 >     that is concerned with the accuracy of floating
 >     point computations as well as the speed.  It galls 
 >     me to see some microprocessors with high SPEC marks 
 >     that are notoriously inaccurate.
If micros are inaccurate they ought to fail if they pretend to be IEEE
compliant.  Of course, if they do not pretend that, all bets are off
(witness Cray).

 >   - We need additional programs in the SPEC benchmark
 >     that show computations in application specific disciplines,
 >     including graphics, finite-element analysis, etc ...
 >     Are these disciplines any different in terms of floating
 >     point computations ?
Yes, I think graphics is different.  In graphics you (more or less) want
to use fixed-point arithmetic.  If you want to target graphics
operations, fast sine and cosine using the CORDIC algorithm would be
excellent.  Of course the CORDIC algorithm does not give the relative
accuracy required in numeric mathematics, but in graphics you are
concerned with absolute accuracy!
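A minimal rotation-mode CORDIC sketch (function name and iteration count
are mine; a real implementation would use fixed-point shifts and adds
only, with the multiplications by 2**-i becoming bit shifts):

```python
import math

def cordic_sincos(theta, n=32):
    """Rotation-mode CORDIC for theta in [-pi/2, pi/2].
    Each step rotates by +/- atan(2**-i) using shift-and-add;
    the fixed gain of the rotations is divided out at the end."""
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    gain = 1.0
    for i in range(n):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i, a in enumerate(angles):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return y / gain, x / gain    # (sin(theta), cos(theta))
```

Note the error here is absolute, roughly 2**-n radians regardless of how
small the result is, so near theta = 0 the *relative* error of the tiny
sine value is poor; that is exactly the distinction drawn above between
graphics and numeric mathematics.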

Of course, some of the categorical responses I gave are to be taken with
a grain of salt.

dik
--
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland
dik@cwi.nl


