Comments on Sun's Proposal for Extension of Java Floating Point in JDK 1.2

Tim Lindholm Timothy.Lindholm at eng.sun.com
Tue Aug 11 11:35:54 PDT 1998


Hi Samuel,

Thanks for taking the time to write up such extensive comments.  I'll
respond to some of your particular points below.  You've definitely got
the main point, though, that we are trying to walk a tightrope with
this stuff, and that doing anything too rash would kill
the proposal until the next major release of the Java platform.  We think
this is hard to achieve, and are glad to get help from the community
in getting it right.

It's also true that we are not considering this a one-shot deal, but
rather a first step to try to respond to some historic issues in the
Java specs that made them especially difficult to implement efficiently
on Intel and precluded use of features on other architectures as well.

To audit our process: did you not see this proposal at Apple when it
was in licensee review?  It would have been nice to get comments like
these earlier!

> To: javasoft-spec-comments at Eng
> Subject: Comments on Sun's Proposal for Extension of Java Floating Point in JDK 1.2
> Cc: numeric-interest at validgh.validgh.com
> Date: Mon, 10 Aug 1998 09:44:52 -0700
> From: Samuel Figueroa <figueroa at apple.com>
> 
> Here are my personal comments on Sun's Proposal for Extension of Java  
> Floating Point in JDK 1.2 (see http://java.sun.com/feedback/fp.html).
> 
> Executive summary:
> Sun's desire to improve the floating-point aspects of the Java language, as
> evidenced by the recently announced proposal, is welcome.  Sun should be
> applauded for not yielding to the temptation of proposing rash, radical changes
> to the Java language, since this could invite strong disagreement among Java
> users and licensees, possibly resulting in no changes in the near future - a
> few well thought out improvements are much better than no improvements at all.
> However, some small changes would significantly strengthen Sun's current
> proposal.  Specifically, floating-point arithmetic in widefp methods should be
> more predictable (to reduce the "write once debug everywhere" syndrome), and
> implementors should be allowed to take advantage of the fused multiply-add
> instruction that many processors have - potential performance improvement
> should not be confined to Intel-style processors only.  These changes are
> sketched below in outline-like form (and summarized in the conclusion), along
> with some suggestions for additional possible improvements that will hopefully
> be taken into consideration.

In a nutshell, the lack of predictability in the current proposal was
driven by licensees who wanted performance, and whose compiler writers
wanted a free hand in the name of performance.  We justified the lack
of predictability when doing widefp floating point by the fact that we
retained totally predictable strict floating point.  Although this is
certainly arguable, it seems like the world fragmented into three
camps: those who just didn't care (like graphics types), those who
really wanted bit-for-bit, cross-platform predictability and were
willing to pay for it, and those who were doing real numeric
programming and were used to dealing with less-than-total
predictability.
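
For anyone reading along who hasn't seen the proposal, the choice looks
roughly like this at the source level.  A minimal sketch in the proposal's
spelling (widefp is proposed syntax, not something any shipping compiler
accepts yet):

    // Sketch of the two proposed modes, using the proposal's modifiers.
    class Kernel {
        // Strict mode: every operation rounds to IEEE 754 double, so
        // results are bit-for-bit reproducible across JVMs.
        strictfp static double dotStrict(double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++)
                sum += x[i] * y[i];   // always rounded to 64-bit double
            return sum;
        }

        // Wide mode: intermediates may be kept in an extended format
        // (e.g. the x87's 80-bit registers), so results may vary
        // across JVMs in exchange for speed.
        widefp static double dotWide(double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++)
                sum += x[i] * y[i];   // may use extended precision
            return sum;
        }
    }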

As for fused multiply-add, our reading of the proposal and our intent
with it is to allow the PPC to use its fused multiply-add in implementing
widefp.  If that proves to be false for some reason we haven't understood,
we would want to fix that.

> A guess as to what might have been the goals of Java's designers in the area of
>    floating-point arithmetic
> 
>    - for the original Java spec:
>      a) bit-for-bit identical results on all conforming implementations
>      b) make floating-point arithmetic available in the most straightforward
>         way possible on the grounds that the simplest semantics best serves the
>         naive user
>      c) don't complicate things with all the frills of the IEEE Standard, which
>         are probably of use only to experts
>      d) oh, and by the way, this can be implemented very efficiently on the
>         SPARC processor, which, after all, is a model implementation of the
>         IEEE Standard

All seems pretty true.  To the extent (d) is true, it's not as
nepotistic as it might appear in retrospect.  At the time this was
decided, Java was Oak and the widely cross-platform Internet play that
led to Java hadn't been conceived of yet.  Oak only ran on SPARC and
using IEEE 754 was the natural choice there.  New processors were
mostly implementing IEEE 754.

>    - reasoning for the current proposal seems to be along the line of:
>      not all processors have floating-point engines like that of the SPARC;
>      let's see if we can loosen the semantics a little bit so that in
>      particular, JVMs running on Intel processors can perform reasonably well

Yup.  While not completely reneging on backward compatibility and bit-for-bit
predictability where that's useful.

> What should have been the goal of the Java spec in the area of floating-point
>    arithmetic?
> 
>    - in retrospect, given the wide popularity of Java and the desire to use it
>      for such a wide variety of applications, the goal should have been to
>      make it easier for naive users to write numerical code that is
>      satisfactorily robust, while not adversely impacting performance unduly,
>      or throwing unnecessary road blocks in numerical experts' paths;
>      this means:
> 
>      a) allow appropriate use of wider precision to protect naive users from
>         their own mistakes, or at least so as to allow implementations to
>         provide greater accuracy
>      b) give implementors flexibility for the sake of performance on a wide
>         variety of processors - not just SPARC and Intel
>      c) provide at least the essential "expert" features; these features don't
>         have to be easily accessible if controlling language complexity is a
>         consideration ("expert" features include:
>         - controlling expression evaluation mode, possibly on an operation-by-
>           operation basis
>         - manipulating the rounding mode
>         - accessing the sticky status flags
>         - floating-point trap handling
>         - enabling/disabling features such as fused multiply-add or abrupt
>           underflow ("flush to zero")
>         - enabling/disabling double rounding and extra range on double
>           extended based processors
>         - determining whether certain features are available, such as support
>           for precision wider than double, fused multiply-add, abrupt underflow,
>           and trap handling)

A lot of this stuff might still go into later revisions if we can
figure out how -- we'll get to this below.

>      d) make floating-point semantics sufficiently predictable so that error
>         analysis at least becomes tractable
>      e) as a concession to those [marketing folks?] who feel they absolutely
>         need this, make getting bit-for-bit identical results across diverse
>         implementations achievable as a secondary consideration
> 
> To what extent do the Java spec and current proposal achieve the ideal goals?
> 
>    - in theory, floating-point semantics can be predictable, and bit-for-bit
>      identical results are possible, though in practice this is not currently
>      the case
> 
>    - the current proposal allows some flexibility (at the expense of
>      predictability) so that JVMs running on Intel processors can have better
>      performance; performance on virtually all other processors is unchanged
>      with this proposal

But at least predictability is traded off explicitly in favor of performance.
Some earlier proposals attempted to improve performance while only slightly
lessening predictability.  We thought that led to the worst of both worlds --
not much performance gain, at the cost of the added complexity of
non-predictability.  Instead we tried to provide a clearer choice, and where
the choice was performance, to let people go farther than we might otherwise
have been able to do.

> How both the Java spec and current proposal fall short of the ideal goals
> 
>    - the current proposal falls short on the first four of the five goals
>      above:
> 
>      a) it doesn't do enough to protect naive users from their mistakes, since
>         implementations are not required to evaluate expressions using wider
>         precision, nor even be consistent
>      b) it misses potential performance improvements on processors other than
>         Intel and SPARC

We think that PPC should be covered, and if not, will want to fix that.

>      c) it doesn't provide access to "expert" features

Discussed elsewhere.

>      d) floating-point semantics are no longer predictable (except in strictfp
>         mode, in which case performance may not be acceptable)
> 
>    - the original Java spec did meet the fourth goal (floating-point semantics
>      was predictable), but at the expense of very significant detrimental
>      performance impact on Intel processors and potentially less protection
>      for naive users (especially in the case of not using robust formulas
>      that produce acceptable results, even when intermediate results are not
>      computed to high accuracy)
> 
> How could Java be changed to better meet the ideal goals?
> 
>    - What specific modifications to the current proposal would give the greatest
>      "bang for the buck?"
> 
>      a) permit wider precision, but in a way that is predictable
>         - in all explicitly widefp code, all expressions should be required to
>           always be evaluated in double or always in double extended, depending
>           on which of these two generally leads to better performance; if the
>           underlying arithmetic engine does not support double extended,
>           expressions would always be evaluated in double

Unfortunately, doing this is constraining to implementors on Intel.
For instance, compiler writers need to spill to memory, and don't want
to spill all 80 bits of intermediate results.  My assumption has always
been that FSTP to 80 bits in memory is very slow, but we should know
that before we write this off.  We've been pushed very hard in favor of
performance at the cost of predictability.  Note also that making
widefp code predictable in this way seems to throw off other things, like
the ability to use PPC's fused multiply-add.
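
To make the spilling point concrete, here is a sketch (the method is
invented for illustration) of widefp code with more live floating-point
values than the x87's eight-register stack, forcing a code generator to
spill some of them:

    // Sketch, in the proposal's spelling.  Nine fp values are live at
    // once, exceeding the x87's eight stack registers, so some must be
    // spilled to memory.  Under the current proposal an implementation
    // may spill them as 64-bit doubles -- silently rounding any 80-bit
    // temporaries -- rather than pay for 80-bit FSTP/FLD.
    widefp static double poly(double x) {
        double x2 = x * x,  x3 = x2 * x,  x4 = x2 * x2;
        double x5 = x4 * x, x6 = x4 * x2, x7 = x4 * x3, x8 = x4 * x4;
        return x8 + x7 + x6 + x5 + x4 + x3 + x2 + x;
    }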

>         - results should always be narrowed to appropriate format on assignment
>           in order to reduce surprising behavior - unfortunately, this excludes
>           having variables with wider precision, but makes the language simpler
>           (variables with wider precision need to be called something other
>           than double - it's too confusing to have double sometimes mean double,
>           and sometimes not mean double)

This was approximately the first proposal that went to licensees: that
only intermediate results could be wide, and rounding would happen on
assignment.  In JVM terms, the operand stacks could be wide, but not
local variables.  Licensees told us (and we agreed) that the trouble
with stopping here is that it doesn't do much for performance and
penalizes people for writing good code.  So for instance you can't
store an intermediate value across iterations of a loop without
rounding, thus losing both precision and performance.
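
Concretely, under that first proposal something like the following would
round the accumulator on every trip around the loop (a sketch):

    // Under "wide operand stack, narrow locals", the running sum must
    // be stored back to the local variable -- and rounded to 64-bit
    // double -- on every iteration, losing both the extra precision
    // and the speed of keeping it wide in a register.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++)
            sum += a[i] * b[i];  // wide intermediate rounded on store
        return sum;
    }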

You also have to round parameters passed to methods, and retain that
rounding even if you inline (and make the rounding pointless).  In
effect you penalize people writing factored code in anticipation of
inlining, and induce people to write big methods with big expressions
in the name of performance.  We think that would be very unfortunate,
especially when technologies like HotSpot are coming out that would
otherwise make the performance of well-factored code just as good as
the performance of non-factored code.
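
For example (a sketch; add() is a hypothetical user-written helper):

    class Factored {
        static double add(double u, double v) { return u + v; }

        static double viaHelper(double a, double b, double c, double d) {
            // a/b is rounded to 64-bit double at the call boundary
            return add(a / b, c) * d;
        }

        static double writtenInline(double a, double b, double c, double d) {
            // a/b may stay wide as an intermediate within one expression
            return (a / b + c) * d;
        }
        // Under round-at-boundary rules these can differ, and the rounding
        // must be preserved even when add() is inlined -- penalizing the
        // well-factored version.
    }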

>         - narrowing should also be required when casting, when calling strictfp
>           methods, and when returning to strictfp methods (i.e., if the
>           underlying arithmetic engine supports double extended, parameters in
>           widefp methods should be passed in double extended format and fp
>           return values of widefp methods should be in double extended format;
>           this would make behavior of functional notation identical to infix
>           notation, so that "add(a/b, c) * d" would give the same result as
>           (a/b + c) * d, assuming add() simply adds two numbers together)

I'm not sure I understand this.  Rounding is required when you call a strictfp
method from a widefp method, and when a widefp method returns a value to a
strictfp method.

>         - narrowing should not be allowed in any other cases

The argument was register spilling.

>         - implicitly widefp methods should all be treated as either explicitly
>           strictfp methods or as explicitly widefp methods, at the
>           implementors' option, never sometimes one way and other times another
>           way

The (proposed) JVM Spec would always treat implicitly widefp methods as
widefp, although the current proposal permits a lot of rounding in a
widefp method.  The model should never be that widefp sometimes means
strictfp and sometimes widefp.  What widefp means is chosen on startup.
However, it is the case that sometimes widefp does rounding for its own
purposes that is like strictfp.

Are you thinking that there is something stronger going on than widefp
"rounding at its option", or are you just saying that in a widefp
method everything should always be predictably wide?

>         - a globally-accessible constant should be made available to indicate
>           whether implicitly widefp methods are always treated as explicitly
>           strictfp or widefp methods

You can use a method in java.lang.reflect.Modifier to get at whether
a method was declared (or is implicitly) strictfp or widefp. 
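
For example, something along these lines (a sketch; it assumes strictfp is
surfaced as a modifier bit -- Modifier.STRICT in the 1.2 reflection API):

    import java.lang.reflect.Method;
    import java.lang.reflect.Modifier;

    public class FpModeCheck {
        strictfp static double f(double x) { return x * x; }

        public static void main(String[] args) throws Exception {
            Method m = FpModeCheck.class.getDeclaredMethod("f",
                new Class[] { double.class });
            // A method without the STRICT bit is (implicitly or
            // explicitly) widefp under the proposal.
            boolean strict = (m.getModifiers() & Modifier.STRICT) != 0;
            System.out.println(strict ? "strictfp" : "widefp (default)");
        }
    }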

>         - another globally-accessible constant should be made available to
>           indicate whether expressions within widefp methods are evaluated to
>           double or double extended precision

I'm not sure how this relates to your other proposals.  I think you've
wanted to say that widefp should always mean whatever that JVM
implementation considers to be wide, in which case the declared mode of a
method determines the behavior of all its expressions.

Alternatively, you might be getting at the question of whether the JVM
is actually using extended precision at all, or just implementing
everything as strict.  There is a provision for determining that in
java.lang.{Float,Double} where new constants have been added giving the
parameters of the widefp float-extended and double-extended formats.
Those parameters default to the parameters of the strict formats if
extended formats are not being used.
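
I don't have the final spellings at hand, but the idea is along these lines
(the constant name below is hypothetical; the actual identifiers in the
spec may differ):

    // HYPOTHETICAL constant name, for illustration only.  The point:
    // java.lang.Double publishes the parameters of the implementation's
    // widefp double-extended format, and a JVM that implements widefp
    // strictly reports the plain IEEE double parameters instead.
    if (Double.WIDEFP_SIGNIFICAND_BITS > 53) {
        // this JVM's widefp arithmetic really is wider than double
    }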

>      b) permit, but not require, fused multiply-add in widefp mode

Like I said, we think that the current proposal covers fused multiply-add
in widefp.  However, the fact that it works depends on the fact that wide
doesn't always mean wide.  Essentially you'd declare your extended precision
to be whatever the width of your fused multiply-add result is, and only use
extended precision there.

>         - interpreted code would probably never use it, whereas compiled code
>           would probably use it whenever possible
>         - predictability is probably not as critical a consideration, since no
>           new formats are involved - values in fp registers do not have more
>           precision than values stored in memory

I'm not sure about this.  Either we should achieve complete predictability or
we should not pretend.  Making the non-predictable situations more subtle
is a step backwards.  Feel free to educate us if you don't think that's so.

>         - however, whenever necessary, control over when fused multiply-add
>           must be used and when it must not be used could be achieved by
>           invoking new methods to be added to java.lang.Math, instead of using
>           infix notation for arithmetic expressions

In the current round of changes, java.lang.Math was strictly off the
table, but we also think that something has to be done about that.  The
current spec also appears to be hard to implement both correctly and
efficiently.  It is pretty likely that we will consider either loosening
java.lang.Math or introducing a new class that is more loosely defined.

>         - all implementations should be required to provide a fused
>           multiply-add method in java.lang.Math for each of the fp data
>           types; these methods should be required to obey the standard
>           semantics for fused multiply-add (exact product, rounding only
>           after the addition), even if they must be implemented in software
>         - a type-specific constant should be made available to indicate
>           whether there is hardware support for fused multiply-add, i.e.,
>           whether the fused multiply-add methods in java.lang.Math are
>           implemented purely in software or not; this would allow the
>           language processor to choose between two different algorithms -
>           one that exploits fused multiply-add, and one that doesn't -
>           based on which algorithm is faster
>         - if performance is important, a JIT or traditional compiler would
>           be used, either of which would be capable of exploiting fused
>           multiply-add, if available
>         - if an interpreter is used, this implies performance is not
>           important, so if a bad choice is made with respect to which
>           algorithm to use (e.g., the algorithm that exploits fused
>           multiply-add is chosen, even though the interpreter isn't able
>           to actually make use of the fused multiply-add instruction), it
>           doesn't matter - the right result will be obtained anyway,
>           because places that actually require the fused multiply-add
>           operation will invoke the fused multiply-add method
> 
>      c) add methods to java.lang.Math that would allow one to write fp
>         arithmetic expressions in functional notation
>         - the semantics of these methods would be identical to the current Java
>           semantics for fp operations
>         - fused multiply-add should be one of these methods
>         - these methods can be used to avoid double rounding at all costs in
>           (implicitly or explicitly) widefp methods, and to either force the
>           use of fused multiply-add, or prevent its use
>         - these methods would likely be used only by "experts," and even then
>           only rarely

The above seem like reasonable points in the context of the java.lang.Math
changes.  If you want to get them recorded over here, you could submit a feature
request using the form hanging off http://java.sun.com/mail.  We'll keep this
email, but those feature requests get stashed in a database.
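
For what it's worth, the software-fallback semantics you describe (exact
product, a single rounding after the addition) are easy to pin down; a
minimal sketch using java.math.BigDecimal, for finite inputs only:

    import java.math.BigDecimal;

    public class Fma {
        // Software fused multiply-add: compute a*b + c exactly, then
        // round once.  new BigDecimal(double) represents its argument
        // exactly, and multiply/add on such values are exact, so the
        // only rounding is the final doubleValue() conversion.
        // (Finite inputs only; a real version would special-case NaNs,
        // infinities, and signed zeros.)
        public static double fma(double a, double b, double c) {
            BigDecimal exact = new BigDecimal(a)
                .multiply(new BigDecimal(b))
                .add(new BigDecimal(c));
            return exact.doubleValue();
        }
    }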

>    - What else would it take to make Java a more ideal language for numerical
>      computing?
> 
>      a) allow slightly more flexibility in strictfp mode to lessen performance
>         impact
>         - allow double rounding in strictfp mode when result is within double's
>           denormal range (i.e., on Intel processors, it should be sufficient to
>           set rounding precision to double, then store-load after every fp
>           operation)
>         - bit-for-bit identical results could still be very nearly always
>           achievable, and performance could at least double in some cases,
>           though performance might not equal that of a widefp method

We stewed over this but ended up thinking that we had to stay backward
compatible when strict.  We didn't think this small extension would satisfy
performance needs (given the store-load costs), but it would break our
promises and still wouldn't give bit-for-bit reproducibility.
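
As an aside, for readers who haven't run into double rounding: Java's own
float/double pair can demonstrate the analogous phenomenon today (a sketch;
the Intel case is double extended versus double, and the residual
denormal-range cases are rarer still):

    public class DoubleRounding {
        public static void main(String[] args) {
            double halfUlp = 1.0 / (1L << 24);  // 2^-24, half an ulp of 1.0f
            double tiny    = 1.0 / (1L << 53);  // 2^-53, exact in double

            // The exact sum 1 + 2^-24 + 2^-53 lies just above the midpoint
            // between the floats 1.0f and 1.0f + 2^-23, so a single correct
            // rounding to float would give 1.0000001f.
            double d = 1.0 + halfUlp + tiny;  // 1st rounding: tie-to-even
                                              // drops 2^-53 -> 1 + 2^-24
            float f = (float) d;              // 2nd rounding: exact tie now,
                                              // tie-to-even -> 1.0f
            System.out.println(f);            // prints 1.0, not 1.0000001
        }
    }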

>      b) add a data type so wider precision can be referred to by name
>         - this can be similar to Real Java's doubleN or C's long double data
>           type; i.e., double extended precision if the underlying arithmetic
>           engine supports it and performance is close to double arithmetic,
>           double otherwise
>         - this would be the format used when evaluating explicitly widefp code
>         - one possible name for this data type could be "widefp double"
>         - this will be especially important in the future as wider precision
>           becomes more commonly supported in hardware

We considered this but didn't think we could do it at the time, or while
retaining backward compatibility.  Licensees didn't want people to have to
write new code to take advantage of the increased performance.

>      c) provide more complete support for the IEEE Standard by making available:
>         - signaling NaNs
>         - more than one quiet NaN
>         - float version of square root
>         - different rounding modes
>         - (sticky) status flags
>         - maybe even trap handling

These are (all?) very possible in the next round of changes.

>      d) allow implementations to provide greater accuracy, e.g., for elementary
>         functions
>         - give programmers a choice between fast, accurate, and standard
>           (i.e., current) versions of the elementary functions

Do you mean like the '+' operator?  I'm not sure how to do that.  However, this
might be how we go about the math library, e.g. keep java.lang.Math and add
java.lang.fastmath.

>      e) for the sake of completeness, one might consider adding
>         a nonstandard mode that would permit, e.g., abrupt underflow ("flush to
>         zero"), some compiler optimizations that are currently forbidden, and
>         unpredictable (but possibly more efficient) expression evaluation;
>         however, this kind of feature probably doesn't fit very well with the
>         rest of the language, although note that the spirit of the current
>         proposal is kind of along these lines - widefp mode is very loosely
>         specified, unlike the rest of the Java language

Well, the Java language and libraries try not to specify behavior that
can't be guaranteed, e.g. thread scheduling isn't specified precisely
because forcing implementors to implement new thread packages would
have killed Java server performance.  But these things are costly in
terms of brain-power and have to be very carefully chosen.  So we'd need
to see a case for (e).

> Conclusions and recommendations
> 
>    - Is the current proposal ready to be set in stone?
> 
>      No, because:
>      a) behavior of fp arithmetic is not predictable except in strictfp mode,
>         which exacts a very high performance penalty on some processors; i.e.,
>         the current proposal's widefp mode almost amounts to nonstandard
>         (do-whatever-you-want) arithmetic
>      b) the current proposal does not allow the use of fused multiply-add
>      c) allowing double to really mean double extended is too confusing and
>         complicates the language unnecessarily
>      d) the proposal should not strongly tempt implementors to disregard its
>         requirements (e.g., in strictfp mode) for the sake of marketing
>         advantages, such as greater performance, or to provide an
>         implementation with good performance but with poor usability due to the
>         poor quality of the arithmetic
> 
>    - Can this proposal be strengthened significantly without redesigning the
>      whole language?
> 
>      Yes, a few small changes would improve the proposal significantly:
>      a) fp arithmetic in widefp methods should be more predictable
>      b) fused multiply-add should be allowed in widefp methods
>      c) experts should be able to control when widefp arithmetic and fused
>         multiply-add are used (by marking methods as explicitly widefp or
>         strictfp, as the current proposal allows, and by calling methods in
>         java.lang.Math when double rounding or fused multiply-add must be
>         avoided at all costs, or to force fused multiply-add to be used);
>         nonexpert users will rarely need to make use of these features
>      d) maybe allow double rounding in strictfp mode when result is in denormal
>         range, since this can improve performance significantly, and happens
>         very rarely
>      e) potential performance improvement might be slightly less (on Intel
>         processors, but somewhat more on other processors) than what the
>         current proposal allows, but performance improvement would still be
>         very significant, and the quality of Java's floating-point arithmetic
>         would be enhanced substantially over the current proposal

I think I've mostly responded to these above.

>    - What additional modifications should be made in the area of floating-point
>      arithmetic?
> 
>      If Sun does not make this the last time Java's floating-point aspects are
>      improved, then this proposal can be kept modest

It's not.  The intent is that we take a modest first step now, and learn more
about what we can do in the future.

>      Sun should view this proposal as a first step towards making Java a more
>      convenient language for numerical programming, one that:
>      a) has more complete support for the IEEE Standard
>      b) allows implementors to provide greater accuracy (and quite possibly
>         better performance) than is currently allowed, particularly in regards
>         to the math library; also, providing an additional floating-point type
>         similar to C's long double that is at least as wide as double, if not
>         wider, would help in this regard

Although we maybe can't agree on the details yet, I think we agree on the
underlying principles of what we're doing in this proposal.

Thanks again for all your thoughts...feel free to follow up.

-- Tim




