questions and comments about gradual underflow

David G Hough at validgh validgh
Sun Dec 5 08:15:40 PST 1999


I received the following:


> Recently we have been revisiting the tradeoffs between
> gradual underflow and "flush to zero". Of course, we
> haven't been debating which is the computationally better 
> choice but rather the realities of the market. And we
> haven't been debating gradual underflow support itself, only
> how much resource to dedicate to making it perform well.

> Unfortunately, it seems that those of us doing computer
> architecture have not done a good job of finding ways
> to implement gradual underflow without imposing
> a large performance hit. (It is a very difficult
> problem in high-speed superscalar pipelines, where a
> potential underflow may need to cause a pipeline stall,
> flush, etc.)

> Therefore, it seems that for performance and compatibility
> reasons most ISV's have favored FTZ over gradual underflow.

> And once the ISV's (and most other customers) start
> coding for FTZ, then it gets harder and harder to convince
> the architects and chip implementers that they should
> spend a lot of time making gradual underflow faster.

> I was wondering if you agree with my arguments above?
> Do you think most ISV's now use FTZ for the great majority
> of their code?  Do you think the future trend will be for
> the performance difference between FTZ and gradual underflow
> to get larger as machines get faster?



I have always thought of underflow to subnormal as being like a page fault:
relatively rare, but not so rare that performance can be completely ignored.

The current approach at Sun for ultrasparc chips is much like that in
the first SPARC systems from 1987: gradual underflow is not supported
in hardware; operations that encounter subnormal operands or results cause
incomplete fp traps to the kernel, which emulates the instructions invisibly
to the user, except for performance in programs that have a lot
of subnormal operands or results.  Various previous supersparc and
microsparc chips did subnormal support entirely in hardware.
Current ultrasparc chips do not trap for gross underflows all the way to
zero; these are performed in hardware, and that cures 90% of the performance
problems due to gross underflow in multiplication, compared to the
situation in 1987.
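
To make the distinction concrete, here is a minimal C sketch (C99, for
fpclassify; nothing in it is SPARC-specific).  Under default IEEE 754
arithmetic the first result is a nonzero subnormal; under a flush-to-zero
mode it would print zero instead:

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        volatile double smallest_normal = DBL_MIN;  /* 2^-1022, about 2.2e-308 */

        /* Gradual underflow: half the smallest normal number is exactly
           representable as a subnormal, so the result is nonzero. */
        volatile double gradual = smallest_normal / 2.0;

        /* Gross underflow: the true product (2^-2044) lies far below even the
           smallest subnormal (2^-1074), so it rounds to zero; no subnormal is
           involved, and on the ultrasparc chips described above no trap to
           kernel emulation is needed. */
        volatile double gross = smallest_normal * smallest_normal;

        printf("DBL_MIN / 2       = %g  (%s)\n", gradual,
               fpclassify(gradual) == FP_SUBNORMAL ? "subnormal" : "not subnormal");
        printf("DBL_MIN * DBL_MIN = %g  (gross underflow to zero)\n", gross);
        return 0;
    }

The second result underflows all the way to zero under either mode; that is
the common case the current hardware handles without trapping.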

If you notice the performance hit anyway on a particular program, you can,
by a compile-time option or a run-time function call, enable a bit in the
SPARC %fsr register for nonstandard mode, which causes subnormal operands
and results to be interpreted as zeros in hardware, with no traps to
software.
Being nonstandard, the exact definition of this mode
varies from system to system; that's why it's called nonstandard.
On supersparc and microsparc systems that handled subnormal operands and results
completely in hardware, the nonstandard bit was a no-op.
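
For concreteness, a hedged sketch of toggling this at run time on
Solaris/SPARC: it assumes libsunmath's nonstandard_arithmetic() and
standard_arithmetic() calls (declared in <sunmath.h>, linked with
-lsunmath); as I recall, the corresponding compile-time switch in Sun's
compilers is -fns.  Exactly which operands and results get flushed remains
system-dependent, as noted above.

    #include <float.h>
    #include <stdio.h>
    #include <sunmath.h>   /* assumed: Sun libsunmath header; link with -lsunmath */

    int main(void)
    {
        volatile double x = DBL_MIN;

        /* Default IEEE 754 arithmetic: gradual underflow, subnormal result. */
        printf("standard:    DBL_MIN / 2 = %g\n", x / 2.0);

        /* Assumed run-time call that sets the nonstandard (NS) bit in %fsr:
           subnormal operands and results are treated as zero in hardware,
           with no traps to the kernel emulation. */
        nonstandard_arithmetic();
        printf("nonstandard: DBL_MIN / 2 = %g\n", x / 2.0);

        /* Restore default gradual underflow. */
        standard_arithmetic();
        return 0;
    }

On supersparc and microsparc parts that handle subnormals entirely in
hardware, both lines print the same subnormal value, since the NS bit is a
no-op there.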

As for ISV's: as with other compile-time optimization switches,
they investigate nonstandard mode only after they encounter a performance
problem, and most never do.  Certainly nobody should enable nonstandard mode
for the sake of portability, although ISV's that have ported applications from
VAX, IBM mainframes, and Crays have no common expectations about exception
handling and so exploit nothing specific to IEEE 754 hardware.

As for the future, there seems to be oscillation between putting as little as
possible in hardware, to maximize clock rate, and doing it all
in hardware, to minimize performance anomalies.  The extreme case here
has been DEC alpha systems, which, as far as I understand,
can't provide correct IEEE exception handling without significantly
slowing down programs even when no exceptions are encountered.

Sun's business has certainly evolved in the 15 years I have been working there.
I started working on technical workstations for people who were willing and 
able to learn the tools they were using and adapt them to their own ends, 
and who expected standards to be implemented to avoid being tied to a particular
platform.  Nowadays Sun's primary business is bigger and
bigger mainframes (now called "servers") with dumber and dumber terminals,
and end users with neither the desire, the ability, nor the permission
to adapt the terminals to their own purposes.
And this mainframe-like business is fiercely contested by Intel and IBM and 
others whose long-term viability is less certain.

Marketing supercomputers to mainframe computing centers
is often quite different from marketing technical 
workstations to technical end users.   
Prospective mainframe supercomputing customers often have years of experience
dealing with Crays and other exotic hardware; rather than being shocked
when vendors take hardware shortcuts such as omitting gradual underflow
support, or when compilers cheat on benchmarks,
these customers are shocked when vendors don't:
failing to take the trouble to cheat
signifies, in some circles, a lack of commitment to the market.




