Comments on Sun's Proposal for Extension of Java Floating Point in JDK 1.2

Samuel Figueroa figueroa at apple.com
Mon Aug 10 09:44:52 PDT 1998


Here are my personal comments on Sun's Proposal for Extension of Java  
Floating Point in JDK 1.2 (see http://java.sun.com/feedback/fp.html).

Executive summary:
Sun's desire to improve the floating-point aspects of the Java language, as
evidenced by the recently announced proposal, is welcome.  Sun should be
applauded for not yielding to the temptation of proposing rash, radical changes
to the Java language, since this could invite strong disagreement among Java
users and licensees, possibly resulting in no changes in the near future - a
few well thought out improvements are much better than no improvements at all.
However, some small changes would significantly strengthen Sun's current
proposal.  Specifically, floating-point arithmetic in widefp methods should be
more predictable (to reduce the "write once debug everywhere" syndrome), and
implementors should be allowed to take advantage of the fused multiply-add
instruction that many processors have - potential performance improvement
should not be confined to Intel-style processors.  These changes are
sketched below in outline-like form (and summarized in the conclusion), along
with some suggestions for additional possible improvements that will hopefully
be taken into consideration.

A guess as to what might have been the goals of Java's designers in the area of
   floating-point arithmetic

   - for the original Java spec:
     a) bit-for-bit identical results on all conforming implementations
     b) make floating-point arithmetic available in the most straightforward
        way possible on the grounds that the simplest semantics best serves the
        naive user
     c) don't complicate things with all the frills of the IEEE Standard, which
        are probably of use only to experts
     d) oh, and by the way, this can be implemented very efficiently on SPARC
        processors, which, after all, is a model implementation of the IEEE
        Standard

   - reasoning for the current proposal seems to be along the line of:
     not all processors have floating-point engines like that of the SPARC;
     let's see if we can loosen the semantics a little bit so that in
     particular, JVMs running on Intel processors can perform reasonably well

What should have been the goal of the Java spec in the area of floating-point
   arithmetic?

   - in retrospect, given the wide popularity of Java and the desire to use it
     for such a wide variety of applications, the goal should have been to
     make it easier for naive users to write numerical code that is
     satisfactorily robust, without unduly impacting performance or
     throwing unnecessary roadblocks in numerical experts' paths;
     this means:

     a) allow appropriate use of wider precision to protect naive users from
        their own mistakes, or at least so as to allow implementations to
        provide greater accuracy
     b) give implementors flexibility for the sake of performance on a wide
        variety of processors - not just SPARC and Intel
     c) provide at least the essential "expert" features; these features don't
        have to be easily accessible if controlling language complexity is a
        consideration ("expert" features include:
        - controlling expression evaluation mode, possibly on an operation-by-
          operation basis
        - manipulating the rounding mode
        - accessing the sticky status flags
        - floating-point trap handling
        - enabling/disabling features such as fused multiply-add or abrupt
          underflow ("flush to zero")
        - enabling/disabling double rounding and extra range on double
          extended based processors
        - determining whether certain features are available, such as support
          for precision wider than double, fused multiply-add, abrupt underflow,
          and trap handling)
     d) make floating-point semantics sufficiently predictable so that error
        analysis at least becomes tractable
     e) as a concession to those [marketing folks?] who feel they absolutely
        need this, make getting bit-for-bit identical results across diverse
        implementations achievable as a secondary consideration

To what extent do the Java spec and current proposal achieve the ideal goals?

   - in theory, floating-point semantics can be predictable, and bit-for-bit
     identical results are possible, though in practice this is not currently
     the case

   - the current proposal allows some flexibility (at the expense of
     predictability) so that JVMs running on Intel processors can have better
     performance; performance on virtually all other processors is unchanged
     with this proposal

How both the Java spec and current proposal fall short of the ideal goals

   - the current proposal falls short on the first four of the five goals
     above:

     a) it doesn't do enough to protect naive users from their mistakes, since
        implementations are not required to evaluate expressions using wider
        precision, nor even to do so consistently
     b) it misses potential performance improvements on processors other than
        Intel and SPARC
     c) it doesn't provide access to "expert" features
     d) floating-point semantics are no longer predictable (except in strictfp
        mode, in which case performance may not be acceptable)

   - the original Java spec did meet the fourth goal (floating-point semantics
     was predictable), but at the expense of very significant detrimental
     performance impact on Intel processors and potentially less protection
     for naive users (especially when the formulas used are not robust
     enough to produce acceptable results unless intermediate results are
     computed to high accuracy)

How could Java be changed to better meet the ideal goals?

   - What specific modifications to the current proposal would give the greatest
     "bang for the buck?"

     a) permit wider precision, but in a way that is predictable
        - in all explicitly widefp code, all expressions should be required to
          always be evaluated in double or always in double extended, depending
          on which of these two generally leads to better performance; if the
          underlying arithmetic engine does not support double extended,
          expressions would always be evaluated in double
        - results should always be narrowed to appropriate format on assignment
          in order to reduce surprising behavior - unfortunately, this excludes
          having variables with wider precision, but makes the language simpler
          (variables with wider precision need to be called something other
          than double - it's too confusing to have double sometimes mean double,
          and sometimes not mean double)
        - narrowing should also be required when casting, when calling strictfp
          methods, and when returning to strictfp methods (i.e., if the
          underlying arithmetic engine supports double extended, parameters in
          widefp methods should be passed in double extended format and fp
          return values of widefp methods should be in double extended format;
          this would make behavior of functional notation identical to infix
          notation, so that "add(a/b, c) * d" would give the same result as
          "(a/b + c) * d", assuming add() simply adds two numbers together)
        - narrowing should not be allowed in any other cases
        - implicitly widefp methods should all be treated as either explicitly
          strictfp methods or as explicitly widefp methods, at the
          implementors' option, never sometimes one way and other times another
          way
        - a globally-accessible constant should be made available to indicate
          whether implicitly widefp methods are always treated as explicitly
          strictfp or widefp methods
        - another globally-accessible constant should be made available to
          indicate whether expressions within widefp methods are evaluated to
          double or double extended precision
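As an aside, the protection that wider intermediate precision buys can be seen
even in today's Java by widening intermediates by hand.  A sketch (the class
and method names below are invented purely for illustration):

```java
// Illustrative only: how wider intermediate precision protects a naive
// user.  The float expression x*x overflows, but the same expression
// evaluated with double intermediates (and narrowed only on return, as
// advocated above) yields the right answer.
public class WidePrecisionDemo {

    // Naive version: x*x is evaluated in float and overflows to infinity.
    static float hypotFloat(float x, float y) {
        return (float) Math.sqrt(x * x + y * y);
    }

    // Hand-widened version: intermediates in double, narrowed on return.
    static float hypotWide(float x, float y) {
        double dx = x, dy = y;
        return (float) Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        float x = 3e20f, y = 4e20f;
        System.out.println(hypotFloat(x, y)); // Infinity (float overflowed)
        System.out.println(hypotWide(x, y));  // approximately 5.0e20
    }
}
```

Under the evaluation rules proposed above, the naive version would get the
wide-intermediate behavior automatically inside a widefp method.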

     b) permit, but not require, fused multiply-add in widefp mode
        - interpreted code would probably never use it, whereas compiled code
          would probably use it whenever possible
        - predictability is probably not as critical a consideration, since no
          new formats are involved - values in fp registers do not have more
          precision than values stored in memory
        - however, whenever necessary, control over when fused multiply-add
          must be used and when it must not be used could be achieved by
          invoking new methods to be added to java.lang.Math, instead of using
          infix notation for arithmetic expressions
        - all implementations should be required to provide a fused multiply-
          add method in java.lang.Math for each of the fp data types; these
          methods should be required to obey the standard semantics for fused
          multiply-add (exact product, rounding only after addition), even if
          they must be implemented in software
        - a type-specific constant should be made available to indicate whether
          there is hardware support for fused multiply-add, i.e., whether the
          fused multiply-add methods in java.lang.Math are implemented purely
          in software or not; this would allow the language processor to choose
          between two different algorithms - one that exploits fused multiply-
          add, and one that doesn't - based on which algorithm is faster
        - if performance is important, a JIT or traditional compiler would be
          used, either of which would be capable of exploiting fused multiply-
          add, if available
        - if an interpreter is used, this implies performance is not important,
          so if a bad choice is made with respect to which algorithm to use
          (e.g., the algorithm that exploits fused multiply-add is chosen, even
          though the interpreter isn't able to actually make use of the fused
          multiply-add instruction), it doesn't matter - the right result will
          be obtained anyway, because places that actually require the fused
          multiply-add operation will invoke the fused multiply-add method

     c) add methods to java.lang.Math that would allow one to write fp
        arithmetic expressions in functional notation
        - the semantics of these methods would be identical to the current Java
          semantics for fp operations
        - fused multiply-add should be one of these methods
        - these methods can be used to avoid double rounding at all costs in
          (implicitly or explicitly) widefp methods, and to either force the
          use of fused multiply-add, or prevent its use
        - these methods would likely be used only by "experts," and even then
          only rarely
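A sketch of what such functional-notation methods might look like; the class
and method names (StrictOps, strictAdd, and so on) are invented here for
illustration and are not part of any proposal:

```java
// Illustrative sketch: functional-notation wrappers whose semantics are
// pinned to Java's current (strict) rules - one IEEE double operation,
// rounded to double, regardless of the widefp mode of the caller.
public final class StrictOps {
    public static double strictAdd(double a, double b) { return a + b; }
    public static double strictMul(double a, double b) { return a * b; }
    public static double strictDiv(double a, double b) { return a / b; }

    public static void main(String[] args) {
        double a = 1.0, b = 3.0, c = 2.0, d = 4.0;
        // (a/b + c) * d, with every rounding step made explicit:
        double r = strictMul(strictAdd(strictDiv(a, b), c), d);
        System.out.println(r);
    }
}
```

Under the 1998 proposal these bodies would have to be declared strictfp to
guarantee strict semantics on extended-precision hardware; since Java 17 all
floating-point arithmetic is strict by default.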

   - What else would it take to make Java a more ideal language for numerical
     computing?

     a) allow slightly more flexibility in strictfp mode to lessen performance
        impact
        - allow double rounding in strictfp mode when result is within double's
          denormal range (i.e., on Intel processors, it should be sufficient to
          set rounding precision to double, then store-load after every fp
          operation)
        - bit-for-bit identical results could still be very nearly always
          achievable, and performance could at least double in some cases,
          though performance might not equal that of a widefp method

     b) add a data type so wider precision can be referred to by name
        - this can be similar to Real Java's doubleN or C's long double data
          type; i.e., double extended precision if the underlying arithmetic
          engine supports it and performance is close to double arithmetic,
          double otherwise
        - this would be the format used when evaluating explicitly widefp code
        - one possible name for this data type could be "widefp double"
        - this will be especially important in the future as wider precision
          becomes more commonly supported in hardware

     c) provide more complete support for the IEEE Standard by making available:
        - signaling NaNs
        - more than one quiet NaN
        - float version of square root
        - different rounding modes
        - (sticky) status flags
        - maybe even trap handling
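On the first two items, the limitation in today's Java library can be seen
directly: every NaN collapses to one canonical quiet NaN when inspected
through the portable API.  A small illustration (note that payload
preservation through longBitsToDouble and doubleToRawLongBits is
implementation-dependent per their specifications):

```java
// Illustration of why "more than one quiet NaN" would need new support:
// Java's portable NaN inspection collapses all NaN bit patterns into a
// single canonical one.
public class NaNDemo {
    public static void main(String[] args) {
        // A quiet NaN with a nonzero payload, built from raw bits.
        double payloadNaN = Double.longBitsToDouble(0x7ff8000000000001L);
        System.out.println(Double.isNaN(payloadNaN)); // true

        // doubleToLongBits canonicalizes every NaN: the payload is gone.
        System.out.println(Long.toHexString(
                Double.doubleToLongBits(payloadNaN))); // 7ff8000000000000

        // Only doubleToRawLongBits can (on most JVMs) still see it.
        System.out.println(Long.toHexString(
                Double.doubleToRawLongBits(payloadNaN)));
    }
}
```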

     d) allow implementations to provide greater accuracy, e.g., for elementary
        functions
        - give programmers a choice between fast, accurate, and standard
          (i.e., current) versions of the elementary functions

     e) for the sake of completeness, one might consider adding
        a nonstandard mode that would permit, e.g., abrupt underflow ("flush to
        zero"), some compiler optimizations that are currently forbidden, and
        unpredictable (but possibly more efficient) expression evaluation;
        however, this kind of feature probably doesn't fit very well with the
        rest of the language, although note that the spirit of the current
        proposal is kind of along these lines - widefp mode is very loosely
        specified, unlike the rest of the Java language


Conclusions and recommendations

   - Is the current proposal ready to be set in stone?

     No, because:
     a) behavior of fp arithmetic is not predictable except in strictfp mode,
        which exacts a very high performance penalty on some processors; i.e.,
        the current proposal's widefp mode almost amounts to nonstandard
        (do-whatever-you-want) arithmetic
     b) the current proposal does not allow the use of fused multiply-add
     c) allowing double to really mean double extended is too confusing and
        complicates the language unnecessarily
     d) the proposal should not strongly tempt implementors to disregard its
        requirements (e.g., in strictfp mode) for the sake of marketing
        advantages, such as greater performance, or to provide an
        implementation with good performance but with poor usability due to the
        poor quality of the arithmetic

   - Can this proposal be strengthened significantly without redesigning the
     whole language?

     Yes, a few small changes would improve the proposal significantly:
     a) fp arithmetic in widefp methods should be more predictable
     b) fused multiply-add should be allowed in widefp methods
     c) experts should be able to control when widefp arithmetic and fused
        multiply-add are used (by marking methods as explicitly widefp or
        strictfp, as the current proposal allows, and by calling methods in
        java.lang.Math when double rounding or fused multiply-add must be
        avoided at all costs, or to force fused multiply-add to be used);
        nonexpert users will rarely need to make use of these features
     d) maybe allow double rounding in strictfp mode when result is in denormal
        range, since this can improve performance significantly, and happens
        very rarely
     e) potential performance improvement might be slightly less (on Intel
        processors, but somewhat more on other processors) than what the
        current proposal allows, but performance improvement would still be
        very significant, and the quality of Java's floating-point arithmetic
        would be enhanced substantially over the current proposal

   - What additional modifications should be made in the area of floating-point
     arithmetic?

     If Sun does not make this the last time Java's floating-point aspects are
     improved, then this proposal can be kept modest.

     Sun should view this proposal as a first step towards making Java a more
     convenient language for numerical programming, one that:
     a) has more complete support for the IEEE Standard
     b) allows implementors to provide greater accuracy (and quite possibly
        better performance) than is currently allowed, particularly in regards
        to the math library; also, providing an additional floating-point type
        similar to C's long double that is at least as wide as double, if not
        wider, would help in this regard




More information about the Numeric-interest mailing list