even more on exceptions from comp.arch/comp.benchmarks

David G. Hough at validgh dgh
Thu Jan 13 08:48:15 PST 1994


For those of you who aren't able to keep up with USENET volume, some
corrections and extrapolations based on my recent posting:

Article: 3849 of comp.benchmarks
From: neideck@bier.kar.dec.com (Burkhard Neidecker-Lutz)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: IEEE 754 traps, hardware traps, and performance
Date: 12 Jan 94 07:58:24 GMT
Organization: CEC Karlsruhe

In article <645@validgh.com> dgh@validgh.com (David G. Hough at validgh) writes:
>1) Current ALPHA chips, like most RISC CPU's, 
>do not handle subnormal operands
>and results, and cause a hardware trap to the kernel.

Some hardware handling is there.

>   However, presumably
>due to a rush to market, DEC's operating system kernels supporting ALPHA do not
>recompute the correct IEEE 754 subnormal result, providing zero instead; this
>is supposed to be fixed in future releases, and thus is not a permanent
>phenomenon.

I don't think we shipped a version of OSF/1 other than field test base
levels that didn't do this correctly. Here's the last few lines of output
of the Paranoia floating point test suite:

	cc -ieee_with_no_inexact paranoia.c -lm

	...
	No failures, defects nor flaws have been discovered.
	Rounding appears to conform to the proposed IEEE standard P754,
	except for possibly Double Rounding during Gradual Underflow.

So, with the correct compiler options, DEC OSF/1 on AXP does fully support
IEEE 754.

> Of course, from the point of view of DEC's migrating
>VAX customers, that doesn't matter since their codes have always operated 
>without subnormal operands or results.

On VMS, IEEE support isn't yet complete (which doesn't matter because
people coming from VAX/VMS are using VAX floating point anyways).


		Burkhard Neidecker-Lutz

Distributed Multimedia Group, CEC Karlsruhe  
BERKOM II Project
Digital Equipment Corporation
neideck@nestvx.enet.dec.com


Article: 3852 of comp.benchmarks
From: wlw@fc.hp.com (Will Walker)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: Alpha and IEEE exceptions
Date: 12 Jan 94 15:50:34 GMT
Organization: Hewlett-Packard Fort Collins Site

Dave Wagner (davew@cray.com) wrote:
> wlw@fc.hp.com (Will Walker) writes:
> >Full IEEE trap checking does not slow down a PA machine.
	[ deleted description of how FP exceptions work on PA ]
> Which is exactly what Alpha does too.  The 'slow' part is if you have
> lots of traps, say NaN operands (even Quiet ones), the h/w will trap
> and the OS has to provide the fixup.  If you don't care about those
> intermediate denormal/NaN results, you can check a status register
> at the end (after a TRAPB instruction) to see if you got caught anywhere.

Stephen Westin implied that PA machines run slower when doing full
IEEE FP exception checking.  I responded to point out that this is
false.  Dave Wagner then implied that Alpha processors, like PA
processors, suffer no slowdown for checking exceptions.

There is a big difference.  According to the Alpha Architecture
Handbook, if you want to be able to recover from FP exceptions then
you have to insert a TRAPB instruction after every FP arithmetic
instruction.  Inserting TRAPB instructions will slow down your program
even if there turn out to be no exceptions.  Most other architectures
provide precise FP exceptions with no performance penalty.

If you know you will have no exceptions, or if you just want your
program to crash if it has an exception, then you can leave the TRAPB
instructions out and you will run at top speed.  However if you cannot
be sure about exceptions at compile time, and you want your program to
recover from exceptions and continue execution, then you need TRAPB's
and you will run slower.

Is this not true?

- Will Walker

Article: 3853 of comp.benchmarks
From: wlw@fc.hp.com (Will Walker)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: IEEE 754 traps, hardware traps, and performance
Date: 12 Jan 94 17:10:37 GMT
Organization: Hewlett-Packard Fort Collins Site

David G. Hough at validgh (dgh@validgh.com) wrote:
> Unlike item 1) above, however, imprecise floating-point traps seem more
> likely than not to prevail in future high-performance CPU designs.

I'm curious about this conclusion & would like to spur some
discussion.

I believe the trend toward aggressive FP circuits with shorter
latencies will relieve some of the pressure to switch to imprecise
exceptions.

What is the cost of precise exceptions?  It is hardware complexity,
proportional to the number of outstanding dependent FP instructions
you can have at one time.

In a traditional pipelined implementation if your flop latency is much
shorter than your instruction pipeline then precise exceptions are
easy: when you detect an exception you can take the trap immediately,
before any source operands are overwritten.  If your flop latency is
longer than your instruction pipeline then you have to prevent
subsequent dependent instructions from completing until you are sure
there is no exception.  This can be (and is) done by queuing up flops.
When an exception occurs the entire queue is presented to the trap
handler, which can then resolve the exception in a precise way.

If your FP latency gets shorter or your instruction pipeline gets
longer then you will have fewer outstanding dependent flops at a time.
Your queue gets shorter and less complex.

Current processor implementation trends (multiple issue, out-of-order
execution, speculative execution) bring pressure to have more and more
outstanding dependent flops at one time.  But what is a dependent
flop?  If flop B uses flop A's result then B cannot begin until A is
finished.  Supporting precise exceptions for this case is not
difficult -- if we are not queuing our flops outside the pipeline then
we can trap flop B while it is still in the pipe.  If we are queuing
flops for execution then it's not much extra work to freeze the queue
on an exception and present it to the trap handler.

The real problem is WAR/WAW: when flop B just overwrites flop A's
operands or result.  Now precise exceptions are the only reason to
prevent B from completing before A.  In this environment we need
register renaming to keep precise exceptions cheap.

- Will Walker

Article: 3854 of comp.benchmarks
From: huck@nsa.hp.com (Jerry Huck)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: IEEE 754 traps, hardware traps, and performance
Date: 12 Jan 94 17:19:13 GMT
Organization: Hewlett-Packard, Networked Systems Architecture

Burkhard Neidecker-Lutz (neideck@bier.kar.dec.com) wrote:
: In article <645@validgh.com> dgh@validgh.com (David G. Hough at validgh) writes:
: >1) Current ALPHA chips, like most RISC CPU's, 
: >do not handle subnormal operands
: >and results, and cause a hardware trap to the kernel.

: Some hardware handling is there.

: >   However, presumably
: >due to a rush to market, DEC's operating system kernels supporting ALPHA do not
: >recompute the correct IEEE 754 subnormal result, providing zero instead; this
: >is supposed to be fixed in future releases, and thus is not a permanent
: >phenomenon.

: I don't think we shipped a version of OSF/1 other than field test base
: levels that didn't do this correctly. Here's the last few lines of output
: of the Paranoia floating point test suite:

: 	cc -ieee_with_no_inexact paranoia.c -lm

: 	...
: 	No failures, defects nor flaws have been discovered.
: 	Rounding appears to conform to the proposed IEEE standard P754,
: 	except for possibly Double Rounding during Gradual Underflow.

: So, with the correct compiler options, DEC OSF/1 on AXP does fully support
: IEEE 754.

Thanks to David and Burkhard for clarifying and clearing up some of
these issues.  I'm still confused on one point: the performance
implications of conformance to non-trapping IEEE 754 behavior.

David described the SPARC mechanism that handles trapping,
non-trapping, and "trapping" for non-trapping exceptions.  This
involves some bit of hardware to basically allow an OS to sort out
what went wrong, fix things up, and restart again.  HP's PA-RISC
architecture is basically the same.  In neither case is code
generation constrained.  Source and target registers can be
immediately re-used with no constraint wrt operand latencies or
barrier instruction placement.

I believe the original thread was asking about the performance
implications of supporting IEEE exceptions.  Will Walker correctly
indicated that there is no performance consequence when you don't have
an exception on PA-RISC.  David's posting indicated the same is true
for SPARC and suggested it might be true of Alpha if the right code
were written for the kernel.  Burkhard indicated that the code is
there and using the right compiler option makes everything work just
fine.

OK, so now the question(s) - no matter what code I write in the kernel,
is it necessary to modify (and potentially de-optimize) instruction
generation to deliver default non-trapping IEEE behavior?  For
example, if I just want to have denorms, NaNs, and infinities possible
in my sources or results, is my code slower (when those numbers don't
occur)?  What is the expectation of the math libraries - can I pass an
Infinity to LOG or EXP?  Are there two versions - one faster than the
other?  Even if I know my data is free of infinities and NaNs, can I
take binary data from a machine that generates denorms (like the DEC
MIPS boxes) and import it into an Alpha program that doesn't have that
special compiler flag set?

The RS/6000 took the approach that if you want 754 trapping behavior,
you accept some special constraints on code generation.
On the other hand, there are no constraints when running with no traps
enabled.

Thanks,
Jerry Huck
Hewlett-Packard

Article: 3855 of comp.benchmarks
From: davew@cray.com (Dave Wagner)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: Alpha and IEEE exceptions
Date: 12 Jan 94 19:11:28 GMT

wlw@fc.hp.com (Will Walker) writes:

>Dave Wagner (davew@cray.com) wrote:
>> wlw@fc.hp.com (Will Walker) writes:
>> >Full IEEE trap checking does not slow down a PA machine.
>	[ deleted description of how FP exceptions work on PA ]
>> Which is exactly what Alpha does too.  The 'slow' part is if you have
>> lots of traps, say NaN operands (even Quiet ones), the h/w will trap
>> and the OS has to provide the fixup.  If you don't care about those
>> intermediate denormal/NaN results, you can check a status register
>> at the end (after a TRAPB instruction) to see if you got caught anywhere.
>
>Stephen Westin implied that PA machines run slower when doing full
>IEEE FP exception checking.  I responded to point out that this is
>false.  Dave Wagner then implied that Alpha processors, like PA
>processors, suffer no slowdown for checking exceptions.
>
>There is a big difference.  According to the Alpha Architecture
>Handbook, if you want to be able to recover from FP exceptions then
>you have to insert a TRAPB instruction after every FP arithmetic
>instruction.  Inserting TRAPB instructions will slow down your program
>even if there turn out to be no exceptions.  Most other architectures
>provide precise FP exceptions with no performance penalty.
>
>If you know you will have no exceptions, or if you just want your
>program to crash if it has an exception, then you can leave the TRAPB
>instructions out and you will run at top speed.  However if you cannot
>be sure about exceptions at compile time, and you want your program to
>recover from exceptions and continue execution, then you need TRAPB's
>and you will run slower.
>
>Is this not true?
>
If you always want to know exactly which instruction "trapped", then yes.
However, you can run your code (which does tons and tons of f-ops) and
check at the end if there were exceptions.  You don't need to trap to
the OS for these, you can just have the Alpha h/w propagate the NaN or
whatever.  After detecting (at the end of the code) that there was an
exception somewhere, branch to a slow version that does have the TRAPBs.

If you have lots of f-ops that you expect usually will cause an
exception somewhere, and you always care where, then yes, you have
a problem with Alpha.  Luckily, this isn't usually the case.

--
Dave Wagner                               "My other computer is a T3D."
davew@cray.com , uunet!cray!davew         "Ask me about PWC chapters near
(612) 683-5393                             you (Parents Without Camcorders)"

Article: 3856 of comp.benchmarks
From: grout@sp17.csrd.uiuc.edu (John R. Grout)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: Alpha and IEEE exceptions
Date: 12 Jan 94 20:49:29 GMT
Organization: UIUC Center for Supercomputing Research and Development

In <CJIxCB.B09@fc.hp.com> wlw@fc.hp.com (Will Walker) writes:

>There is a big difference.  According to the Alpha Architecture
>Handbook, if you want to be able to recover from FP exceptions then
>you have to insert a TRAPB instruction after every FP arithmetic
>instruction.  Inserting TRAPB instructions will slow down your program
>even if there turn out to be no exceptions.  Most other architectures
>provide precise FP exceptions with no performance penalty.

>If you know you will have no exceptions, or if you just want your
>program to crash if it has an exception, then you can leave the TRAPB
>instructions out and you will run at top speed.  However if you cannot
>be sure about exceptions at compile time, and you want your program to
>recover from exceptions and continue execution, then you need TRAPB's
>and you will run slower.

>Is this not true?

Not necessarily, given a sophisticated enough compiler and debugger.

Here's one possible approach... code compiled in debugging mode takes
checkpoints every so often (doing a TRAPB, then saving intermediate results
somewhere: DEC suggested once per basic block).  If an FP exception is
trapped, the debugger gets control, dynamically restores program status,
builds the specific code sequence to trace the failure one FP instruction at a
time, and executes that sequence of instructions to recreate the failure (this
time precisely).

Another relevant point... DEC didn't conclude that providing precise FP
interrupts _now_ wasn't reasonable: they concluded that it probably wouldn't
be reasonable over the lifetime of the Alpha architecture (and I agree).
--
John R. Grout						j-grout@uiuc.edu
Center for Supercomputing Research and Development
Coordinated Science Laboratory     University of Illinois at Urbana-Champaign

Article: 3858 of comp.benchmarks
From: lethin@raisin-scone.ai.mit.edu (Richard A. Lethin)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: Alpha and IEEE exceptions
Date: 12 Jan 94 22:16:22 GMT
Organization: MIT Artificial Intelligence Laboratory

In article <davew.758401612@willow07> davew@cray.com (Dave Wagner) writes:
>If you always want to know exactly which instruction "trapped", then yes.
>However, you can run your code (which does tons and tons of f-ops) and
>check at the end if there were exceptions.  You don't need to trap to
>the OS for these, you can just have the Alpha h/w propagate the NaN or
>whatever.

Looking for potential exceptions by looking at the mandated IEEE status
register is OK, provided that the compiler is not issuing extra speculative
floating point operations to increase performance.

Relying on propagation of NaN can be unacceptable if the program executes
instructions which "destroy" the exceptional information stored in the NaN
-- examples of this are conversion from floating point to integer and
floating point compare operations.

---
Richard Lethin		MIT Concurrent VLSI Architecture Group
lethinaai.mit.edu	545 Technology Square, Cambridge, MA 02139



More information about the Numeric-interest mailing list