Comments on Sun's Proposal for Extension of Java Floating Point in JDK 1.2

Tim Lindholm Timothy.Lindholm@eng.sun.com
Thu Aug 13 10:15:45 PDT 1998


Hi Joe,

Some responses to your comments...

> From: "Joseph D. Darcy" <darcy@CS.Berkeley.EDU>
> Subject: Re: Comments on Sun's Proposal for Extension of Java Floating Point in JDK 1.2
> To: Timothy.Lindholm@Eng, numeric-interest@validgh.validgh.com
> Date: Wed, 12 Aug 1998 01:23:41 -0700 (PDT)
> 
> PEJFPS = Proposal for Extension of Java(TM) Floating Point Semantics,
> Revision 1
> 
> > Tim Lindholm's response to Samuel Figueroa's comments on PEJFPS
> 
> [snip]
> 
> > As for fused multiply-add, our reading of the proposal and our intent
> > with it is to allow the PPC to use its fused multiply-add in implementing
> > widefp.  If that proves to be false for some reason we haven't understood,
> > we would want to fix that.
> 
> The only explicit mention of fused mac in PEJFPS is on page 30: "For
> instance CPUs with fused multiply and add instructions may be unable
> to use those instructions when implementing FP-strict operations."  It
> would be possible to obliquely interpret PEJFPS sections 15.16 and
> 15.17.2 as allowing fused mac, but it is not clear from the text.
> PEJFPS proposes allowing the programmer to inquire whether or not
> extended formats are in use by testing various variables holding the
> minimum/maximum exponent of the formats used in FP-wide contexts.
> This mechanism does not work well for indicating whether fused mac is
> being used, since the extra precision and range of fused mac are
> internal to the instruction.  To more readily indicate that fused mac
> can be used, a new "boolean" system property called, say, fp.mac could be
> added to Java.  (System properties are used to indicate
> platform-dependent details, such as the proper line termination
> character sequence; system properties are accessed with the
> System.getProperties method.)  Additionally, if fused mac were to be
> allowed, the conditions under which successive mul and add operations can be
> fused should be described.  In other words, the operations that could
> possibly be fused should be inferable from the source code and not
> entirely dependent on the cleverness of the optimizer.

True, the white paper is not at all clear about when fused mac can be
used.  The extended format parameters won't tell you that reliably
either.  But that is consistent with the fact that we intended a code
generator to be free to decide when it wants to use fused mac.  Of
course, as Sam Figueroa proposed, we might provide additional explicit
support for fused mac as well.

It might be good if you'd explain why it's useful to be able to
recognize conditions for fusion in the source.  If you are writing code
that aspires to be at all implementation-independent, then there can be
no guarantee each implementation will support the optimization.  So
if you really want to be sure you get the optimization, you have to
know your code generator.  Would your algorithms look at fp.mac and
adapt?
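
(For concreteness, here is roughly what I imagine such adaptive code
might look like.  This is only a sketch: fp.mac is your proposed
property name, not something any JDK defines, and the "true"/"false"
value convention is my assumption.)

    public class FusedMacCheck {
        public static void main(String[] args) {
            // fp.mac is hypothetical; no shipping JDK defines it.
            boolean fusedMac =
                "true".equals(System.getProperty("fp.mac"));
            if (fusedMac) {
                // Choose a variant that exploits the single rounding
                // of a*b + c performed by a fused multiply-add.
                System.out.println("using fused mac variant");
            } else {
                // Fall back to a variant that assumes the multiply
                // and the following add are rounded separately.
                System.out.println("using separate mul/add variant");
            }
        }
    }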

> [d. SPARCs as model IEEE 754 compliant processors]
> 
> > All seems pretty true.  To the extent (d) is true, it's not as
> > nepotistic as it might appear in retrospect.  At the time this was
> > decided, Java was Oak and the widely cross-platform Internet play that
> > led to Java hadn't been conceived of yet.  Oak only ran on SPARC and
> > using IEEE 754 was the natural choice there.  New processors were
> > mostly implementing IEEE 754.
> 
> The x86 line of processors conforms to IEEE 754.  Unlike SPARC, the
> x86 line of processors has hardware support for the IEEE 754
> recommended double extended format.  The x86 practice of rounding
> only the significand when rounding to double precision is sanctioned by IEEE 754
> (section 4.3).  Since the floating point of the 8087 formed the basis
> for IEEE 754, it would be odd if the x86 processors did not conform to
> the standard.  The SPARCs I'm familiar with have very poor support for
> IEEE 754 mandated subnormals; a single operation on subnormal numbers
> can take 10,000 cycles (about 3,333 times longer latency than an
> operation on normal operands).
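
(Anyone curious can get a feel for that subnormal penalty with a crude
probe along the following lines.  Heavily hedged: millisecond timers
and JIT behavior make this an order-of-magnitude check at best, and
the exact ratio will vary from chip to chip.)

    public class SubnormalTiming {
        public static void main(String[] args) {
            System.out.println("normal:    " + time(1.0) + " ms");
            // Double.MIN_VALUE (2^-1074) is the smallest subnormal.
            System.out.println("subnormal: " + time(Double.MIN_VALUE) + " ms");
        }

        static long time(double x) {
            double s = 0.0;
            long start = System.currentTimeMillis();
            for (int i = 0; i < 1000000; i++) {
                s += x * 0.5;  // multiply with a possibly subnormal operand
            }
            long elapsed = System.currentTimeMillis() - start;
            if (s < 0.0) System.out.println(s);  // keep s live
            return elapsed;
        }
    }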

FWIW: Please don't take anything I say in this discussion as toeing a
company line on SPARC or denigrating any other processor.  In fact, it
should be clear that a major motivation for the current proposal is to
improve performance on non-SPARC processors, presumably at some cost to
SPARC.  One of the objectives of Java is to make the choice of
processor less relevant than it is today.  Current processors just
provide some of the constraints under which we search for a good
solution for Java.

> SPARCs do implement rounding modes and sticky flags, *required* features
> of IEEE 754 forbidden or omitted by Java (The Java(TM) Language
> Specification, section 4.2.4).

As I said in the response to Sam Figueroa, we are considering supporting
these things in a follow-on proposal.  This proposal need not be the
last word on anything, but we don't have a window to do everything right
now.

> Since Java is no longer Oak (and hasn't been Oak for several years),
> it is difficult to justify Java's floating point semantics by citing
> constraints inherited from Oak.

Untrue.  Oak became Java only after we began making it available to
people in roughly its current form and concept.  It took off so
quickly that we immediately desired to provide compatibility across
releases.  We had our one opportunity to introduce a discontinuous
change, but thereafter we've had the same problems with evolving the
system as everyone else and have to live with the implications of the
early decisions.  One might well argue that inconveniencing 100 or
1000 people then would have been preferable to leaving in a wart or
inconveniencing 100K or 1M, but there wasn't a precedent to indicate
that we'd face such a problem.  Once things got going there wasn't time
to do much more than keep our heads above water.

[snip]

> >  We've been pushed very hard in favor of performance at the cost of
> > predictability.  Note also that making widefp code predictable in
> > this way seems to throw off other things, like the ability to use
> > PPC's fused multiply-add.
> 
> On existing relevant architectures, the one with double extended (x86)
> and the ones with fused mac (PowerPC, et al.) are disjoint.
> Therefore, having somewhat separate rules for using extended precision
> and fused mac in widefp contexts may be workable.  (Unless Merced has
> both features???)

This sounds plausible to me.  Does anybody understand what separate rules
for fused mac might be?  I still think that its use falls out of and is
consistent with the current proposal.  I don't think I can get info as to
whether Merced has both features (anybody know and can say??).

> [snip]
> 
> > The (proposed) JVM Spec would always treat implicitly widefp methods as
> > widefp, although the current proposal permits a lot of rounding in a
> > widefp method.  The model should never be that widefp sometimes means
> > strictfp and sometimes widefp.
> 
> While that may be the intention, it is not mandated by the proposal.

I'd asked this of Sam too: is your concern that in widefp methods the
spec does not mandate that everything stays wide, or is it that on
processors that don't implement IEEE 754 extended formats, that widefp
methods are likely to use non-extended formats, or something else?

Said another way, the intention is that regardless of whether you are
currently running on a processor that implements extended formats,
you'd always write wide code anticipating the use of extended formats.

> [why not allow double rounding on gradual underflow in strict mode?]
>
> > We stewed over this but ended up thinking that we had to stay
> > backward compatible when strict.  We didn't think this small
> > extension would satisfy performance needs (given the store-load
> > costs) but would break our promises and still wouldn't give
> > bit-for-bit.
> 
> Has anyone actually complained about exact reproducibility being
> violated by the double rounding on underflow discrepancy exhibited by
> existing Java VMs on the x86?

I can't differentiate complaints about lack of reproducibility caused
by double rounding on underflow from complaints about lack of reproducibility from
other causes.  Certainly our testers have complained about lack of
reproducibility -- this has greatly restricted our ability to test
floating point in the 1.1 line.  Is this necessarily the case?  Maybe
not.  We are developing better testing methodologies in the context of
the current proposal that might be able to deal with testing in the
presence of double rounding.

Compiler writers on x86 did complain when pushed to insert store-load
sequences to get closer to the specs.
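
(For anyone following along who hasn't seen double rounding in action,
the decimal analogue is easy to demonstrate with java.math.BigDecimal.
This isn't the x86 underflow mechanism itself, just the same two-step
rounding effect in base 10.)

    import java.math.BigDecimal;

    public class DoubleRoundingDemo {
        public static void main(String[] args) {
            BigDecimal v = new BigDecimal("1.49");

            // Round once, directly to an integer: 1.49 -> 1
            BigDecimal once = v.setScale(0, BigDecimal.ROUND_HALF_EVEN);

            // Round twice: 1.49 -> 1.5 -> 2 (the tie breaks to even)
            BigDecimal twice = v.setScale(1, BigDecimal.ROUND_HALF_EVEN)
                                .setScale(0, BigDecimal.ROUND_HALF_EVEN);

            System.out.println("single rounding: " + once);   // 1
            System.out.println("double rounding: " + twice);  // 2
        }
    }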

> Do Java 1.2's transcendental functions conform to the language
> specification instead of using whatever transcendental functions an
> underlying C compiler provides?
> 
> Sun's Java 1.1 JDK uses the transcendental functions of the underlying
> C compiler.  On the x86, C compilers often use the corresponding
> hardware instructions on the chip.  These instructions can give
> different answers than the FDLIBM algorithms mandated by Java (The
> Java(TM) Language Specification, section 20.11).  Moreover, the
> transcendental functions on different x86 chips have been implemented
> using at least 5 distinct sets of algorithms; therefore, the same
> instruction with the same argument on different x86 chips can also
> give different answers.  On platforms other than the x86, FDLIBM or
> some other implementation of the transcendental functions may be used.
> The discrepancy between these different versions of the transcendental
> functions is erroneously ascribed to "rounding differences" in a Java
> 1.1.x bug report (bug ID 4058551).
> 
> If Sun is committed to providing exactly-reproducible Java floating
> point, is a very visible and acknowledged Java-standard-violating
> library bug going to be fixed before the rarely encountered
> perhaps-never-complained-about doubled rounding on underflow
> situation?

I agree that 1.1 was wrong in this respect.  1.2 does use fdlibm
directly but that, as you point out, isn't always right on x86.  We are
trying to decide what to do for the transcendentals, where respecifying
them is a possibility.  Balancing reproducibility with good performance
(JITs want to inline these things) is tough.  Suggestions very welcome.
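
(On the testing angle: one cheap way to catch these library
discrepancies is to compare exact bit patterns across VMs, along the
lines of the probe below.  No expected values are asserted here; the
point is only that two VMs implementing the mandated FDLIBM algorithms
must print identical bits for identical arguments.)

    public class TranscendentalProbe {
        public static void main(String[] args) {
            double[] xs = { 0.5, 1.0, Math.PI, 1.0e10 };
            for (int i = 0; i < xs.length; i++) {
                long bits = Double.doubleToLongBits(Math.sin(xs[i]));
                System.out.println("sin(" + xs[i] + ") bits = "
                                   + Long.toHexString(bits));
            }
        }
    }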

> [add a new data type to access extended precision where available]
> 
> > We considered this but didn't think we could do it at the time, or while
> > retaining backward compatibility.  Licensees didn't want people to have to
> > write new code to take advantage of the increased performance.
> 
> Adding a new type would increase backwards compatibility by better
> preserving the semantics of existing code.  As discussed in previous
> messages, Java's default strict semantics could be somewhat loosened
> (allowing extended exponent range for anonymous values) in a way that
> improves performance and largely retains reproducibility.

This was considered but was lobbied against by licensees and was not
something we had time to do for 1.2.
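
(For onlookers, the kind of case that loosening targets looks like
this.  Under today's strict semantics the anonymous intermediate a*b
overflows to Infinity even though the final result is comfortably in
range; carried in x86 double extended, with its 15-bit exponent, the
intermediate would survive.)

    public class AnonymousOverflow {
        public static void main(String[] args) {
            double a = 1.0e300, b = 1.0e10, c = 1.0e20;
            // a*b is about 1e310, beyond double's range, so strict
            // Java prints Infinity even though the mathematical
            // result is 1e290.
            System.out.println((a * b) / c);
        }
    }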

> In general, some Java features have not been backwards compatible.
> Obviously, APIs added in Java 1.n are not available in Java 1.(n-1).
> However, the situation is more fundamental than that.  For example, a
> Java 1.0 VM cannot provide reflection even if given the classes.zip
> file from Java 1.1.  Reflection (and some other Java 1.0 -> Java 1.1
> changes) require JVM changes.  Therefore, any code that uses such
> features is not backwards compatible.  Allowing double rounding on
> underflow is likely to be much less visible a change than these other
> incompatibilities.

Agreed, and therein lies some of the problem.  API changes are very
visible and thus very easy to understand.  That's not so of more subtle
changes like introduction of double rounding.

> -Joe Darcy
> darcy@cs.berkeley.edu

-- Tim




