SPECfp92

David G. Hough at validgh dgh
Mon Aug 22 22:14:41 PDT 1994


I've got a question for end users who take SPEC ratios into consideration
when purchasing PCs or workstations.  First some background:

The test programs in SPECfp92 have been criticized for being too simple
and for being unrealistic representations of real applications.  One way
to address that would be to add more realistic applications to SPECfp92,
but most realistic applications involve i/o performance (because of large
input, intermediate, or output data) and graphics performance.  Since
system performance on big problems may involve all of these factors,
adding large problems to SPEC might be viewed as a good thing.

Another point of view is that SPECint92 and SPECfp92 are not intended to
be system-level performance tests, but rather measurements of "CPU"
capability.  Well, not exactly: what they measure is a function of
CPU chip(s), cache, compilers, and libraries, so it's not really pure
CPU, but i/o, graphics, and system overhead in general are not significant
factors.  Large test programs that might be CPU-intensive on large systems
might become I/O-intensive on small systems.  So adding large programs
to SPECfp92 might confuse more than illuminate.  In Unix terms,
SPEC programs should be mostly user time on the types of "small"
PCs and workstations that are used for entry-level technical computing.
As of 1993 I would estimate that comes out to about 16MB RAM and 256KB
cache, subject to ongoing increase.
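To make the "mostly user time" criterion concrete: given the user, system,
and real (wall-clock) figures that time(1) reports for a run, one can look
at the user-time fraction.  A minimal sketch (the timing figures below are
made up, not from any actual SPEC run):

```python
def user_fraction(user, sys_time, real):
    """Fraction of wall-clock time spent in user-mode computation.

    Close to 1.0 suggests a CPU-bound run; a low value suggests time
    lost to i/o, paging, or other system activity.
    """
    return user / real

# made-up timings (seconds) for a hypothetical benchmark run
user, sys_time, real = 118.2, 1.4, 121.0
print(round(user_fraction(user, sys_time, real), 3))
```

On a "small" system, a large working set pushes this fraction down as the
run starts paging, which is exactly the configuration sensitivity at issue.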

Another approach would be to split the floating-point benchmarks into
two groups.  One would be restricted to programs that are CPU-intensive on
"small" systems, however defined, and the other would be unrestricted and
thus might encompass significant I/O and graphics, which might be bottlenecks
on small systems and perhaps not on larger systems.  Programs whose
working sets are larger than 32MB, such as perfect.mg3b, fall into the
latter group.  SPECfp92 would be defined over the "small" benchmarks, but
no mean would be published for the large problems, since their performance
varies so much more according to the configuration of the test system.
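For reference, the published SPECfp92 number is the geometric mean of the
per-benchmark SPEC ratios (reference time divided by measured time), so
defining the metric over only the "small" group is just a matter of which
ratios enter the mean.  A sketch, with made-up ratios:

```python
import math

def spec_mean(ratios):
    """Geometric mean of per-benchmark SPEC ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# made-up SPEC ratios for a hypothetical "small" benchmark group
small_group = [55.0, 62.0, 48.0, 70.0]
print(round(spec_mean(small_group), 1))  # prints 58.2
```

The large-problem group's ratios would simply be reported individually
rather than folded into any such mean.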

So the question is:

What would be the most useful approach for you as an
end user evaluating computer hardware for purchase?  What changes, if any,
would make SPEC results more relevant to you?  Do any of the
foregoing ideas appeal to you?  If not, what should be done instead?
