Re: Even more on Java numerics

Russell L. Carter rcarter@consys.com
Fri Feb 21 10:38:27 PST 1997


Subject: Re: Even more on Java numerics 
In-reply-to: Your message of "Thu, 20 Feb 1997 22:25:24 PST."
             <9702210625.AA28919@validgh.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
--------

Hi David,
I ordinarily prefer not to get embroiled in numerics issues, but given
the way numerical analysis arguments are being used to support Sun's
Java floating point model, I feel compelled to respond.  In what follows,
one should not assume that I am an Intel partisan; certainly some at
Intel would be surprised to hear me described that way.  While the
first part addresses peripheral issues, some perhaps interesting
numerics issues surface near the end.

Also, I'm cc'ing this to the java-for-cse list because I think it is
important to understand systems issues when discussing
Java's scientific computing performance.

David wrote:

> 
> Coonen's comments having propagated to another mailing list, java-for-cse,
> I posted a reference to my response at http://www.validgh.com/java, which
> in turn elicited a reply by Russell Carter.    Some of his points (with >)
> may be of interest to numeric-interest, as well as my comments (no >):

And I responded:

> > 
> > Taking the long view, it is likely that Sun will find that the market will
> > settle these issues.  It's a nice try at hamstringing wintel/IBM, but
> > I don't think it will work.
> 
> As the founders of Java have observed, the original target was embedded
> processors, of which Intel and IBM have plenty that would not find the
> Java language definition an impediment.  Several chip companies are working on
> Java chip implementations besides Sun, so there's no particular advantage to
> any one of them.    In any event the Java language
> definition does not cause Pentium or PowerPC chips to "run SLOW" compared to
> other architectures, although Pentium and PowerPC might run slower than
> they would if unconstrained.    The major performance issues with these chips
> are that 

No numerics here, but I note:

1.  The dragon that is attempting to avoid being skewered by your spear does
    not care one whit that the forge you used to smith that spear originally
    produced horseshoes.

2.  Future chip designs are irrelevant to the market as it exists for
    Java today.  Some might wish otherwise, but that is not the way the
    market works.
> 
> 1a) The x86 architecture is rather constrained by only eight floating-point
> registers; whatever is done to match load/store bandwidth to that constraint
> in general,

1.  In practice, what does "rather constrained" mean?

    We all know benchmarks are evil, but perhaps here an
    examination of the SPEC benchmark database and a market survey of
    complete system prices would be in order.  Those who have done such
    a survey consistently find a price/performance advantage of about a
    factor of 2 for x86 floating point performance, at least on the low
    end.  Now exactly how does the x86 architecture impact the actual
    data on sustained performance?  The evidence at hand implies that all
    other solutions are suboptimal.

2.  Load/store bandwidth is not the issue.  The number of bits
    stored in registers is the issue.  And *that* is a lowest-common-denominator
    argument for a standard when in fact the denominator is not common.

    Sustained x86 Load/store bandwidth is now in excess of 100 MByte/s, 
    btw.

> will also tend to relieve any performance deficit due to loads/stores inserted
> to get Java-specified rounding.    Indeed we can speculate that P7/Merced
> has already solved this problem, since the shortcomings of the x86 architecture

To the extent that P7/Merced does not implement solutions nearly
completely compatible with the existing x86 instruction set, it will
be ignored.  In all likelihood it will have the same success as MMX.
It will not "solve" the problem with respect to the overall composition
of hardware hosting Java machines.

> 1b) The x86 architecture doesn't seem to have produced many compilers robust
> enough for scientific programming; none of those I have tested can compile and
> run UCBTEST and the LAPACK test and timing programs at maximum optimization
> levels, and problems are common at lower optimization levels too.    

Hmm, I've run a lot more x86 code through LAPACK than just the test and timing
problems and have never had a problem.  This is using various versions of
gcc on various PC unices.  I did have a problem with some of the eigenvalue
test routines, but since I didn't need them, I ignored them.  Maybe there's
an accuracy scandal lurking there?  I sorta doubt it.  

As for whether or not x86 is suitable for scientific programming, I guess
an easy metric would be dollar volume of sales by, say, FEM and CFD commercial
software vendors.  The market (again) indicates that x86 works just fine.

> This suggests to me that, although x86 floating-point units may be
> the most common in number, 
> they are not particularly predominant among systems used primarily for
> performance-sensitive floating-point computation.

If you mean here that most people are not trying to run so-called
Grand Challenge problems on PCs, we get to agree :)

On the other hand, a walk through any National Lab or graduate school
for that matter might be a bit depressing to an x86-phobe these days.

> 
> And the net of these considerations is that not many users of these computers
> would notice these alleged performance deficits due to Java's definition,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The performance deficit is not "alleged".
It is real, and though the reasons for it may originally have been
innocuous, they are now being used as a device to significantly
reduce the floating point performance of 95% of the available
systems on which to run Java.  All's fair in the market of course, and
if Sun can pull this Gates off, more power to them.

> and there are much more fruitful avenues to obtaining higher numerical
> performance on systems built of these chips that would yield greater 
> improvements than relaxing the definition of Java. 

To be precise, fruitful to Sun's hardware sales.  Only.

> 
> > 2. As a practical matter the fused multiply operations on Power2 CPUs,
> >    and the extended precision registers found in x86, have had essentially
> >    NO discernible effect on the reproducible accuracy of nearly 
> >    all practical scientific calculations.
> 
> It's certainly true that a lot of common scientific code, that runs
> acceptably on IBM 370, DEC VAX, Crays, and IEEE 754 systems of various
> flavors, is not much affected by arithmetic details because all such effects
> have been forcibly suppressed by great programming 
> effort over many years.   The focus
> of Java is to get the same effective portability without such great effort. 

The phrase "programming effort" is more accurately replaced by
"increased use of stable algorithms" by people pushing the boundaries of
computation.  That is language (and platform) independent.  In a
former life, I used access to a variety of computer arithmetic systems
to explore the stability of numerical algorithms.  The actual
number of algorithms that are sensitive to the differences in available
arithmetic AND are numerically sound is analogous to a set of measure 0.
They exist, but their rarity makes them priceless to a Kahan.

Here we flirt with fundamental issues of computability.

The desire for all programs to produce results identical to the ulp
on all machines is essentially equivalent to the wish to have the 
fractionally insignificant number of numerically broken programs 
run identically on all machines, whether or not additional 
bits can be (accurately) computed.  Should 95% of the people on the 
planet either have their current Java machine run slower by at 
least a factor of 2 or replace their current machine with Sun 
approved hardware in order to make sure that broken programs 
do not arouse suspicion by running (slightly) differently on 
different available platforms?

Those who know x86 performance are probably wondering why I only quote
a factor of 2.  You're right, it's likely a bigger hit than that.

> 
> >    In fact, I would go so far as to say that Java numerical results are
> >    going to differ across platforms a lot more from the immaturity of the
> >    output formatting code than from any adherence to individual CPU
> >    architectural enhancements.
> 
> These differences are bugs and will be so handled.    Unlike other languages,
> Java will not permit hardware bugs, optimizer errors, and library mistakes
> to masquerade as differences in roundoff or expression evaluation.    
> Write once, test once, run anywhere is the goal.
> 
> That's a simplification of course; you still need to test Java programs on
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No kidding!  Competently testing the floating point formatting accuracy
of Java I/O routines, in order to ensure identical results on all
Java machines, will be hard to require in a branding campaign.

> each platform, if you are paranoid about those hardware bugs, optimizer errors,
> and library mistakes, 
> but a satisfactory error analysis on one platform will hold
> for all, and discrepancies can be usefully analyzed because they do mean
> that something is wrong and there is no likelihood that they will ultimately
   ^^^^^^^^^^^^^^^^^^^^^^

With the Java machine, only.  No way to tell about the program.  By
definition the program runs the same everywhere.


> have to be tolerated because somebody rightly or wrongly concluded that 
> the discrepancies were "just roundoff".
                    ^^^^^^^^^^^^^^^^^^^

;-)  Now, we can kill off most of the numerical analysts.  Derive the
error analysis and we're done with accuracy issues.

Cheers,
-- 
Russell L. Carter

Voice:(520) 636-2600 FAX:(520) 636-2888          rcarter@consys.com
Conceptual Systems & Software,  P.O. Box 1129 Chino Valley AZ 86323
"Before sitting down, always look for ferrets."
