Floating Point Instructions
Bill Voegtli
uunet!mips.com!voegtli
Fri Jan 10 13:12:52 PST 1992
This newsgroup has quieted down, and as a hardware
guy, the long long vs. int64 discussions just don't
grab me, so .....
I have some questions concerning floating point
operations and would appreciate comments.
1) Instruction Counts
Throughout computer history, there have been numerous
studies of instructions and their relative frequencies.
Have there been any studies of the relative
frequencies of just floating point operations?
There is a trend in RISC microprocessors toward adding
FP instructions for Square Root, Inverse Square Root,
and the like. Are these instructions justified?
There is an old "rule of thumb" that:
- Addition, Subtraction, and Multiplication should
have equal latencies.
- Division latency should be 3-4 times that of
Multiply/Add/Subtract.
- Square Root latency should be 3-4 times that of
Division.
Assuming latency budgets should track frequency of
occurrence, do real-world programs really reflect this?
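For what it's worth, the arithmetic behind that question is
simple to sketch. A minimal example in C (the dynamic mix
fractions and cycle counts below are invented for illustration,
not measured): each operation contributes frequency times
latency to the total FP time.

    /* Sketch: weighted FP latency under the old rule of thumb.
       The instruction mix is hypothetical; real numbers would
       have to come from instruction traces. */
    #include <stdio.h>

    int main(void)
    {
        /* assumed dynamic FP mix (fractions sum to 1.0) */
        double f_add = 0.45, f_mul = 0.40, f_div = 0.10, f_sqrt = 0.05;
        /* rule-of-thumb latencies in cycles:
           div ~ 4x mul, sqrt ~ 3x div */
        double l_add = 3.0, l_mul = 3.0, l_div = 12.0, l_sqrt = 40.0;
        double avg = f_add*l_add + f_mul*l_mul
                   + f_div*l_div + f_sqrt*l_sqrt;

        printf("average cycles per FP op : %.2f\n", avg);
        printf("share of time in div/sqrt: %.0f%%\n",
               100.0 * (f_div*l_div + f_sqrt*l_sqrt) / avg);
        return 0;
    }

Even with division and square root at only 15% of this assumed
mix, the long latencies let them eat over half the FP time,
which is exactly why the frequency data matters.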
2) Alignment & Normalization
In 1965, Sweeney (of SRT fame) analysed a bunch
of programs to determine the frequency of alignment
and normalization shifts, and the shift distances. This
work influenced the choice of hexadecimal radix in the
360 family. Are there any more recent studies?
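Short of a real study, the quantities Sweeney measured are
easy to instrument in software. A minimal sketch, assuming
IEEE double inputs: the alignment shift of an add is the
operands' exponent difference, and the normalization shift is
how far the result's exponent falls below the larger input
exponent (large when cancellation occurs).

    /* Sketch: alignment and normalization shift distances
       for one double-precision add, measured with frexp().
       frexp(x, &e) yields x = f * 2^e with f in [0.5, 1). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(void)
    {
        double a = 1.0, b = -0.9999;   /* arbitrary test operands */
        double r;
        int ea, eb, er, emax, align, norm;

        frexp(a, &ea);
        frexp(b, &eb);
        align = abs(ea - eb);   /* smaller operand shifts right by the gap */

        r = a + b;              /* heavy cancellation here */
        frexp(r, &er);
        emax = (ea > eb) ? ea : eb;
        norm = emax - er;       /* left shift needed to renormalize */

        printf("alignment shift    : %d bits\n", align);
        printf("normalization shift: %d bits\n", norm);
        return 0;
    }

Wrapping counters like these around the adds in a real workload
would give a modern version of Sweeney's histograms.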
3) Precision
In a similar vein, are there any studies concerning
the relative frequencies of double and single
precision? I thought SPUR supported only double precision,
but I never heard any conclusion as to whether that
was a good idea.
4) Denormals
Some microprocessors are handling denormal inputs in
hardware. Is this justified?
There are well-known programs that exhibit underflow,
but how often does it occur? Does the usual Flush-to-Zero
mode have a real impact on the "typical" mix of scientific
programs?
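One concrete case where the answer matters: under IEEE
gradual underflow, x - y == 0 implies x == y; in Flush-to-Zero
mode that property is lost. A minimal sketch in C (on an
IEEE-conforming machine the tiny difference survives; an
FTZ machine would print zero):

    /* Sketch: gradual underflow vs. Flush-to-Zero.
       The difference below is a denormal; FTZ flushes
       it to exactly 0.0 even though a != b. */
    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        double a = 1.5 * DBL_MIN;  /* just above smallest normal */
        double b = DBL_MIN;        /* the smallest normal number */
        double d = a - b;          /* exactly 0.5 * DBL_MIN */

        printf("a - b = %g\n", d);
        if (d == 0.0 && a != b)
            printf("FTZ: unequal numbers subtract to zero\n");
        else
            printf("gradual underflow kept the difference\n");
        return 0;
    }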
OPINIONS:
- I would be very surprised if divisions comprised
more than 10% of a mix of scientific programs.
- Accordingly, the latencies of Division and
Square Root can be relaxed appropriately.
- I think FP instructions should be optimized for
short alignment/normalization shifts, and the
longer shifts should take more time. Handling
leading-zero anticipation (LZA) for double precision
is difficult with respect to time-space tradeoffs.
Handling LZA for quad precision may discourage
designers from adopting quad at all.
- We should spend more hardware on handling denormals
than on handling division and square root instructions.
- We need an additional program in the SPEC benchmark
that is concerned with the accuracy of floating
point computations as well as the speed (a toy kernel
of that flavor is sketched after this list). It galls
me to see some microprocessors with high SPEC marks
that are notoriously inaccurate.
- We need additional programs in the SPEC benchmark
that show computations in application-specific disciplines,
including graphics, finite-element analysis, etc.
Are these disciplines any different in terms of floating
point computations?
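On the accuracy-benchmark point above, a hypothetical sketch
of the kind of kernel I have in mind (not a proposal for the
actual test): naive and compensated (Kahan) summation of the
same series run at similar speed but disagree in the low-order
digits, so a benchmark could score the result as well as the
clock.

    /* Sketch: an accuracy-sensitive kernel - naive vs.
       Kahan (compensated) summation of 1/k.  Compile
       without FP-reordering optimizations. */
    #include <stdio.h>

    int main(void)
    {
        const long N = 10000000L;
        double naive = 0.0;
        double kahan = 0.0, c = 0.0;  /* c holds lost low bits */
        long k;

        for (k = 1; k <= N; k++) {
            double term = 1.0 / (double)k;
            double y, t;

            naive += term;            /* error accumulates freely */

            y = term - c;             /* compensated step */
            t = kahan + y;
            c = (t - kahan) - y;      /* what the add just lost */
            kahan = t;
        }
        printf("naive: %.17g\n", naive);
        printf("kahan: %.17g\n", kahan);
        printf("delta: %g\n", naive - kahan);
        return 0;
    }

Scoring how far the naive result lands from a compensated or
extended-precision reference would reward machines and
libraries that round carefully.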
--
UUCP: {ames,decwrl,prls,pyramid}!mips!voegtli (or voegtli@mips.com)
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086
Standard Disclaimer : I speak from MIPS, not for MIPS.