Comments on Sun's Proposal for Extension of Java Floating Point in JDK 1.2

Samuel Figueroa <figueroa@apple.com>
Wed Aug 26 10:39:41 PDT 1998


** Reply to note from Tim Lindholm <Timothy.Lindholm@Eng.Sun.COM> Tue,
11 Aug 1998 11:35:54 -0700 (PDT)
Thanks for your openness in your response. Your insights into the
purpose of the current proposal and the history of Java are
particularly welcome.

It looks like the main focus of the current proposal is improving
performance, rather than making better use of many processors'
potential for accuracy, as I had previously assumed. As far as I can
tell, these two goals tend to conflict. So this proposal seems to be
the wrong battle to be fighting for those of us who would like to see
improved quality in the area of floating-point arithmetic, provided
this proposal does not preclude improving Java in this area due to
technical (or political) issues (and I do believe there are some issues
that need to be addressed).
 
I am not against focusing on floating-point performance. It is
certainly a legitimate concern for those in the graphics area. I'm sure
marketing people are also very concerned about this area, since they
need to be able to point to big numbers (in benchmark results) when
bragging about their Java implementations - big numbers mean more
dollars to many companies. Apple is certainly no less guilty than any
other company in this regard (though in this message I am not
officially speaking for Apple).

Although I do object to some parts of the current proposal, my
assumption is that some proposal similar to the current one will be
officially incorporated into the Java language, no matter what we do.
So we can either jump up and down all we want on the sidelines while we
watch Java's floating-point semantics become less ideal than we might
have hoped, or we can do what we can to prevent Java from becoming
worse than necessary.

Before getting to my objections, I would like to put the comments I
sent you in my previous message into perspective, though many of those
comments are probably no longer very relevant, now that I have a better
idea of what the goals of the current proposal are. At the end of this
message, I make some concluding remarks and summarize what I think
should be changed in the current proposal, given the information you
have been gracious to provide, so you may want to skip to that section.

Given Java's original design, my idea is that there should be a
"strictfp" mode (for bit-for-bit compatibility, which I perceive to be
important due mainly to marketing reasons), what I will call a "WIDEFP"
mode (whose main purpose is to provide improved accuracy at the expense
of bit-for-bit compatibility), and now, with this proposal, a
performance-oriented widefp mode. (From now on, I'll call the current
proposal's widefp mode "nativefp mode" because I think it makes things
clearer. I'll call my idea of what a widefp mode should be like
"WIDEFP" in order to reduce confusion with the current proposal's use
of the word "widefp.") Both strictfp and WIDEFP modes should be
specified fairly rigorously (so as to make the latter mode very nearly
predictable, but not bit-for-bit identical across different
implementations), while nativefp mode should be less constrained.
(Incidentally, there is already a precedent for nativefp mode - it is
reminiscent of native methods in that you can do pretty much what you
want.) So with this perspective in mind, let me respond to some of the
issues you raised.

> As for fused multiply-add, our reading of the proposal and our intent
> with it is to allow the PPC to use its fused multiply-add in implementing
> widefp. If that proves to be false for some reason we haven't understood,
> we would want to fix that.

I think the current proposal is not clear about this, although, given
how loosely "nativefp" mode is specified, probably no one would
complain if fused multiply-add were used in this mode (except for
lawyer types and people trying to do stringent testing). I'll comment
on this later.

> >    - reasoning for the current proposal seems to be along the line of:
> >      not all processors have floating-point engines like that of the SPARC;
> >      let's see if we can loosen the semantics a little bit so that in
> >      particular, JVMs running on Intel processors can perform reasonably well
> Yup. While not completely renegging on backward compatibility and
> bit-for-bit predictability where that's useful.

Making "unmarked" methods equivalent to "nativefp" methods is not
strictly backward compatible. More on this later.

> >    - What specific modifications to the current proposal would give the greatest
> >      "bang for the buck?"
> >      a) permit wider precision, but in a way that is predictable
> >         - in all explicitly widefp code, all expressions should be required to
> >           always be evaluated in double or always in double extended, depending
> >           on which of these two generally leads to better performance; if the
> >           underlying arithmetic engine does not support double extended,
> >           expressions would always be evaluated in double
> Unfortunately, doing this is constraining to implementors on Intel. For
> instance, compiler writers need to spill to memory, and don't want to
> spill all 80 bits of intermediate results. My assumption has always been
> that FSTP to 80 bits in memory is very slow, but we should know that
> before we write this off. We've been pushed very hard in favor of
> performance at the cost of predictability. Note also that making widefp
> code predictable in this way seems to throw off other things, like the
> ability to use PPC's fused multiply-add.

I did confirm that storing to an 80-bit memory location takes 3 clock
cycles, as opposed to 2 clock cycles for a 32- or 64-bit store, even on
a Pentium II. Loading 80 bits takes 3 clock cycles on a Pentium II,
instead of 1 for 32- and 64-bit loads. Basic arithmetic cannot be done
using an 80-bit operand residing in memory, as it can with 32- and
64-bit memory operands, so this increases the performance impact of
having to do 80-bit spills to memory. Even so, it may not be a bad idea
to forbid arbitrary narrowing of intermediate expressions - see my
concluding remarks.

I don't see the unpredictability of whether fused multiply-add is used
as a big deal, as long as it's convenient to disable its use, and
there's a way to guarantee its use. If others think otherwise, I'd like
to know why, just for my own increased understanding.

> >         - results should always be narrowed to appropriate format on assignment
> >           in order to reduce surprising behavior - unfortunately, this excludes
> >           having variables with wider precision, but makes the language simpler
> >           (variables with wider precision need to be called something other
> >           than double - it's too confusing to have double sometimes mean double,
> >           and sometimes not mean double)
> This was approximately the first proposal that went to licensees: that
> only intermediate results could be wide, and rounding would happen on
> assignment. In JVM terms, the operand stacks could be wide, but not local
> variables. Licensees told us (and we agreed) that the trouble with
> stopping here is that it doesn't do much for performance and penalizes
> people from writing good code. So for instance you can't store an
> intermediate value across iterations of a loop without rounding and thus
> losing both precision and performance. 

In the context of an ideal WIDEFP mode and a distinct nativefp mode,
there would be a type potentially wider than double that could allow
greater accuracy to be retained, even across assignment. In nativefp
mode, whatever gives better performance is what's done; greater
accuracy is possible, but "accidental." So in this context, if your
concern were greater accuracy, you would use WIDEFP mode and wider
variable(s), once they become available in Java; faster performance in
this mode, though possible, is "accidental." (Though I think "WIDEFP"
should not be synonymous with "slowfp." And by the way, note that with
my WIDEFP mode, performance would worsen slightly on processors like
SPARC, but not nearly to the extent that Intel's performance is
impacted by Java's current semantics. The performance impact of my
WIDEFP mode on processors like SPARC could practically be eliminated by
adding an instruction that loads a 32-bit float into a double precision
register, and another that stores the value in a double precision
register to a memory location in 32-bit float format.) In other words,
I think requiring (or allowing) local variables to have precision
beyond that specified in the source code is the wrong solution to the
problem of needing a way to retain the extra precision of intermediate
expressions.

> You also have to round parameters passed to methods, and retain that
> rounding even if you inline (and make the rounding pointless). In effect
> you penalize people writing factored code in anticipation of inlining,
> and induce people to write big methods with big expressions in the name
> of performance. We think that would be very unfortunate, especially when
> technologies like HotSpot are coming out that would otherwise make the
> performance of well-factored code just as good as the performance of
> non-factored code.
> >         - narrowing should also be required when casting, when calling strictfp
> >           methods, and when returning to strictfp methods (i.e., if the
> >           underlying arithmetic engine supports double extended, parameters in
> >           widefp methods should be passed in double extended format and fp
> >           return values of widefp methods should be in double extended format;
> >           this would make behavior of functional notation identical to infix
> >           notation, so that "add(a/b, c) * d" would give the same result as
> >           (a/b + c) * d, assuming add() simply adds two numbers together)
> I'm not sure I understand this. Rounding is required when you call a
> strictfp method from a widefp method, and when a widefp method returns a
> value to a strictfp method.

But in my WIDEFP mode, rounding would not be permitted when calling
WIDEFP methods, or when returning from a WIDEFP method to another
WIDEFP method. So factored code is not penalized with my scheme -
associating an expression with a formal parameter would NOT be
considered assignment for the purposes of the rules for rounding fp
values.

> >         - implicitly widefp methods should all be treated as either explicitly
> >           strictfp methods or as explicitly widefp methods, at the
> >           implementors' option, never sometimes one way and other times another
> >           way
> The (proposed) JVM Spec would always treat implicitly widefp methods as
> widefp, although the current proposal permits a lot of rounding in a
> widefp method. The model should never be that widefp sometimes means
> strictfp and sometimes widefp. What widefp means is chosen on startup.
> However, it is the case that sometimes widefp does rounding for its own
> purposes that is like strictfp.
> Are you thinking that there is something stronger going on than widefp
> "rounding at its option", or are you just saying that in a widefp method
> everything should always be predictably wide?

My idea is that all methods that are unmarked should be treated as
either WIDEFP methods or as strictfp methods. (I guess treating all
unmarked methods as nativefp could be another possibility.) But if
backward compatibility is very important, then unmarked methods should
really be treated as explicitly strictfp methods. If people don't want
to rewrite any code and still have good performance regardless of the
underlying JVM, it should be relatively easy to come up with some tool
that changes all unmarked methods into nativefp methods. However,
according to a subsequent e-mail message, it looks like you have reason
to believe treating all unmarked methods as nativefp methods will not
be a problem. In that case, I would prefer making the treatment of
unmarked methods implementation dependent - either always nativefp or
always strictfp. If my WIDEFP mode, or something similar, is ever
introduced later, then it will be easier to allow unmarked methods to
be treated as WIDEFP methods, which is my personal preference. I am not
very comfortable with unmarked methods being required to be treated as
nativefp methods, because this could cause too many users to puzzle
over seemingly surprising behavior (especially if fused multiply-add is
allowed in nativefp mode). So if treatment of unmarked methods were
implementation defined, for now, people obsessed with performance would
most likely get it without having to change any of their code (because
they'd use implementations that treat unmarked methods as nativefp, or
else use some tool that would cause this to happen), and then perhaps
in the future, this part of the Java spec could be tightened (so it
would no longer be implementation dependent) in favor of the average
user, once people to whom it matters have had a chance to mark their
methods nativefp where appropriate.

> >         - a globally-accessible constant should be made available to indicate
> >           whether implicitly widefp methods are always treated as explicitly
> >           strictfp or widefp methods
> You can use a method in java.lang.reflection.Modifier to get at whether a
> method was declared (or is implicitly) strictfp or widefp.

The goal here is that it should be possible to somehow figure out how a
method is treated, though I don't expect many people will need such a
feature. So if this can be done with java.lang.reflect.Modifier (note:
the package is java.lang.reflect, not java.lang.reflection), that's
fine.
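As an aside, here is a short sketch of what such a check looks like
with the reflection API. (This is only an illustration against modern
Java: the class names below are mine, and on JDKs from 17 onward
compilers no longer emit the ACC_STRICT flag, so the printed value
depends on the toolchain.)

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class FpModeCheck {
    // A method explicitly marked strictfp; an unmarked method would
    // (under the proposal's encoding) lack the bit.
    strictfp static double strictSum(double x, double y) { return x + y; }

    public static void main(String[] args) throws Exception {
        Method m = FpModeCheck.class.getDeclaredMethod(
                "strictSum", double.class, double.class);
        // Modifier.STRICT mirrors the class file's ACC_STRICT flag (0x0800).
        System.out.println(Modifier.isStrict(m.getModifiers()));
    }
}
```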

> >         - another globally-accessible constant should be made available to
> >           indicate whether expressions within widefp methods are evaluated to
> >           double or double extended precision
> I'm not sure how this relates to your other proposals. I think you've
> wanted to say that widefp should always be whatever that JVM
> implementation considers to be wide, in which case the behavior of a
> method gives the behavior of all its expressions. 
> Alternatively, you might be getting at the question of whether the JVM is
> actually using extended precision at all, or just implementing everything
> as strict. There is a provision for determining that in
> java.lang.{Float,Double} where new constants have been added giving the
> parameters of the widefp float-extended and double-extended formats.
> Those parameters will be defaults (which are the parameters of the strict
> formats) if extended formats are not being used.

In my WIDEFP mode, all expressions would be evaluated to double or
double extended, depending on what a particular implementation
supports. In my scheme, comparing MAX_EXPONENT with
WIDEFP_MAX_EXPONENT would allow one to determine whether double
extended is supported. (Incidentally, adding WIDEFP_MAX_EXPONENT to
Java is more appropriate in the context of my WIDEFP mode than in the
current proposal; WIDEFP_MAX_EXPONENT is not very meaningful in the
current proposal - see below.)
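A minimal sketch of the probe I have in mind, using the MAX_EXPONENT
constant that later appeared in java.lang.Double (value 1023) and a
hypothetical WIDEFP_MAX_EXPONENT; the value 16383 below is the exponent
bound of an x87-style 80-bit double extended format and is only an
assumption for illustration:

```java
public class WidefpProbe {
    // Hypothetical: the proposed WIDEFP_MAX_EXPONENT constant. A JVM
    // without double extended would report Double.MAX_EXPONENT (1023)
    // here; 16383 assumes an x87-style 80-bit extended format.
    static final int WIDEFP_MAX_EXPONENT = 16383; // assumed value

    public static void main(String[] args) {
        // Wider exponent range than plain double implies double
        // extended is actually in use for wide evaluation.
        boolean hasDoubleExtended = WIDEFP_MAX_EXPONENT > Double.MAX_EXPONENT;
        System.out.println(hasDoubleExtended); // true under the assumption above
    }
}
```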

> >      b) permit, but not require, fused multiply-add in widefp mode
> Like I said, we think that the current proposal covers fused multiply-add
> in widefp. However, the fact that it works depends on the fact that wide
> doesn't always mean wide. Essentially you'd declare your extended
> precision to be whatever the width of your fused m-a result and only use
> extended precision there.

I think the proposal needs to be clearer about whether fused
multiply-add is allowed. Perhaps it would help if the proposal stated
that the extended formats don't actually have to be realized in
hardware, but are only a means for explaining conceptually how
arithmetic is performed.

At this point, let me express my strong objection to including the
WIDEFP_* constants in the current proposal. Here's why: Given the
current proposal, in order to justify the use of fused multiply-add in
nativefp methods, you'd have to pretend that you have a double extended
format with at least a 15-bit exponent (the minimum size for a double
extended format), and a significand with at least 158 bits - it's
possible to prove that if the significand had fewer bits than this, the
fused multiply-add operation would have to round twice in order to
conform to the current proposal's semantics. (Think what would happen
if a=1+2^(-52), b=1-2^(-52), c=2^53+2, and d = a*b + c.) But it's
preposterous to pretend that such a format, or anything remotely like
it, exists on processors like the PowerPC. This shows how ridiculous it
is to have these constants as part of the current proposal.
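The a, b, c example above can be checked mechanically. A sketch in
modern Java, which (long after this proposal) gained a true
single-rounding fused multiply-add in Math.fma (Java 9):

```java
public class FmaRounding {
    public static void main(String[] args) {
        double a = 1.0 + 0x1p-52; // 1 + 2^-52, exactly representable
        double b = 1.0 - 0x1p-52; // 1 - 2^-52, exactly representable
        double c = 0x1p53 + 2.0;  // 2^53 + 2,  exactly representable

        // Two roundings: a*b = 1 - 2^-104 rounds up to 1.0, then
        // 1.0 + c = 2^53 + 3 is a tie and rounds (to even) up to 2^53 + 4.
        double separate = a * b + c;

        // One rounding: the exact a*b + c = 2^53 + 3 - 2^-104 is just
        // below the tie and rounds down to 2^53 + 2.
        double fused = Math.fma(a, b, c);

        System.out.println(separate - fused); // 2.0
    }
}
```

So a PowerPC-style fused multiply-add really does differ from any
double-rounded emulation here, which is the point of the 158-bit bound.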

(I understand that these constants might make it easier for people who
are trying to test for conformance to the language specs, and that it
would be very appealing to have such constants in order to have a
better idea of what kind of accuracy you're going to get, but the
latter is simply not possible, given the current proposal's failure to
guarantee when, if ever, wider precision will be used - certainly
there's no way to tell what precision is likely to be used when all
operands are 32-bit floats, given the information that would be made
available were the current proposal incorporated into the Java
language.)

I think it would be better if the WIDEFP_* constants reflected only
format(s) in which the underlying JVM is capable of storing values.  If
testers (and others) need something to figure out if fused multiply-add
might be used, there should be separate type-dependent constants (with
appropriate names - if something fairly generic is required,
Contracted_Operations might do) to indicate this.

> >         - interpreted code would probably never use it, whereas compiled code
> >           would probably use it whenever possible
> >         - predictability is probably not as critical a consideration, since no
> >           new formats are involved - values in fp registers do not have more
> >           precision than values stored in memory
> I'm not sure about this. Either we should achieve complete predictability
> or we should not pretend. Making the non-predictable situations more
> subtle is a step backwards. Feel free to educate us if you don't think
> that's so.
> >         - however, whenever necessary, control over when fused multiply-add
> >           must be used and when it must not be used could be achieved by
> >           invoking new methods to be added to java.lang.math, instead of using
> >           infix notation for arithmetic expressions
> In the current round of changes, java.lang.math was strictly off the
> table, but we also think that something has to be done about that. The
> current spec also appears to be hard to implement both correctly and
> efficiently. It is pretty likely that we will consider either loosening
> java.lang.math or introducing a new class that is more loosely defined.

I don't know about others, but I don't see this slight lack of
predictability as a problem, provided, as I noted above, that there's a
way to disable fused multiply-add in a given expression, a way to
guarantee it'll be used (even if that means emulating it in software),
and a way to determine at run time whether fused multiply-add is
implemented efficiently (i.e., in approximately the same amount of time
as a single multiplication followed by a single addition). I don't
think this lack of predictability is any worse than trying to write
software without knowing whether an implementation supports double
extended (because figuring out if double extended is supported could be
too cumbersome in certain cases).

Incidentally, I'm thinking perhaps it would be a good idea in the
current proposal to not allow fused multiply-add except in methods that
are explicitly nativefp. Otherwise, there could be too many surprises.
For example, fdlibm produces nonsense results if fused multiply-add is
used. If people try to run preexisting Java code that is written in
this style, they may get unexpected behavior if methods are not marked
as explicitly strictfp.
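To illustrate the fdlibm point: such libraries rely on exact splitting
tricks like Veltkamp's, which assume every operation rounds exactly
once. A sketch (under Java's strict evaluation the invariant holds; the
comments describe what silent contraction or widening would break):

```java
public class VeltkampSplit {
    public static void main(String[] args) {
        double a = Math.PI;
        double factor = 0x1p27 + 1.0; // 2^27 + 1, the usual splitting constant

        double t  = a * factor;
        double hi = t - (t - a); // upper half of a's significand
        double lo = a - hi;      // the remaining low bits, exactly

        // The invariant the library depends on: the split is exact,
        // and hi has few enough bits that hi * hi is computed exactly.
        System.out.println(hi + lo == a); // true under strict evaluation

        // If a compiler silently contracted these operations into a
        // fused multiply-add, or kept t in an extended register, hi
        // would pick up extra low bits and products like hi * hi would
        // silently lose their exactness guarantee - hence the nonsense
        // results from fdlibm-style code.
    }
}
```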

> >    - What else would it take to make Java a more ideal language for numerical
> >      computing?
> >      a) allow slightly more flexibility in strictfp mode to lessen performance
> >         impact
> >         - allow double rounding in strictfp mode when result is within double's
> >           denormal range (i.e., on Intel processors, it should be sufficient to
> >           set rounding precision to double, then store-load after every fp
> >           operation)
> >         - bit-for-bit identical results could still be very nearly always
> >           achievable, and performance could at least double in some cases,
> >           though performance might not equal that of a widefp method
> We stewed over this but ended up thinking that we had to stay backward
> compatible when strict. We didn't think this small extension would
> satisfy performance needs (given the store-load costs) but would break
> our promises and still wouldn't give bit-for-bit.

I don't feel strongly about this issue. I can certainly appreciate the
importance of the "Write once, run everywhere" slogan, and how that
needs to be more than just words. On the other hand, I don't see the
slight loss in bit-for-bit compatibility as a big deal for all
practical purposes. If functions are added to java.lang.math as
alternatives to basic arithmetic, then those functions could be used to
indicate places where double rounding absolutely must be avoided at all
costs. This issue can be revisited at a later time; keeping this aspect
of the Java language as is won't make it any harder to change in the
future.
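For readers unfamiliar with double rounding, here is a sketch of the
effect, simulated with exact BigDecimal arithmetic and the float/double
pair standing in for double/double extended (the denormal-range case
discussed above is analogous); the value and class name are mine:

```java
import java.math.BigDecimal;

public class DoubleRoundingDemo {
    public static void main(String[] args) {
        // Exact value v = 1 + 2^-24 + 2^-60, just above the float tie
        // point 1 + 2^-24. Each term is an exact double, so the
        // BigDecimal sum is the exact value of v.
        BigDecimal v = BigDecimal.ONE
                .add(new BigDecimal(0x1p-24))
                .add(new BigDecimal(0x1p-60));

        float once  = v.floatValue();   // rounded once: 1 + 2^-23
        double wide = v.doubleValue();  // to the "wide" format: 1 + 2^-24, tail lost
        float twice = (float) wide;     // rounded again: a tie, to even, giving 1.0f

        // The two paths disagree, which is exactly the double-rounding hazard.
        System.out.println(once == twice); // false
    }
}
```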

> >      b) add a data type so wider precision can be referred to by name
> >         - this can be similar to Real Java's doubleN or C's long double data
> >           type; i.e., double extended precision if the underlying arithmetic
> >           engine supports it and performance is close to double arithmetic,
> >           double otherwise
> >         - this would be the format used when evaluating explicitly widefp code
> >         - one possible name for this data type could be "widefp double"
> >         - this will be especially important in the future as wider precision
> >           becomes more commonly supported in hardware
> We considered this but didn't think we could do it at the time, or while
> retaining backward compatibility. Licensees didn't want people to have to
> write new code to take advantage of the increased performance.

Adding a data type like this is more essential in the context of my
WIDEFP mode. It is less essential in the context of the current
proposal's "nativefp" mode, though if the current proposal continues to
use the keyword "widefp," some could argue that it's not important to
add such a data type, because "widefp" mode would seem to already
provide this for local variables, parameters, and return values.

> >      d) allow implementations to provide greater accuracy, e.g., for elementary
> >         functions
> >         - give programmers a choice between fast, accurate, and standard
> >           (i.e., current) versions of the elementary functions
> Do you mean like the '+' operator? I'm not sure how to do that. However,
> this might be how we go about the math library, e.g. keep java.lang.math
> and add java.lang.fastmath.

By elementary functions, I'm referring to the trigonometric and
logarithmic functions, not the basic arithmetic functions.

> >      e) for the sake of completeness, one might consider adding
> >         a nonstandard mode that would permit, e.g., abrupt underflow ("flush to
> >         zero"), some compiler optimizations that are currently forbidden, and
> >         unpredictable (but possibly more efficient) expression evaluation;
> >         however, this kind of feature probably doesn't fit very well with the
> >         rest of the language, although note that the spirit of the current
> >         proposal is kind of along these lines - widefp mode is very loosely
> >         specified, unlike the rest of the Java language
> Well, the Java language and libraries try not to specify behavior that
> can't be guaranteed, e.g. thread scheduling isn't specified precisely
> because forcing implementors to implement new thread packages would have
> killed Java server performance. But these things are costly in terms of
> brain-power and have to be very carefully chosen. So we'd need to see a
> case for (e).

The things mentioned in e) might be acceptable in a nativefp mode, but
not in my WIDEFP mode. However, I am not requesting that these things
be incorporated into the current proposal - though the current proposal
mentions something about allowing abrupt underflow (thanks, J. Darcy,
for noticing this).


SUMMARY AND CONCLUDING REMARKS

Here's how I would like to see the current proposal changed:

1) Choose some other word to use instead of "widefp," one that better
conveys what this mode is all about. The "widefp" keyword is a misnomer
- if a programmer were to use this keyword for the purpose of improving
performance, then certainly on SPARC, for example, he or she would not
want wider precision to be used. So please do not use "widefp" as the
keyword for denoting faster performance. A word such as "nativefp"
would be a far better choice. (Another possibility would be "fastfp,"
but I understand some object to using "fastfp" as a keyword.)

An additional, and I think very significant, reason for using some
other word such as "nativefp" instead of "widefp" is that it would make
it easier politically to later get a proposal approved to allow for
improved accuracy. I can't imagine a way to truly satisfy both
hard-core performance-oriented users and those interested in reasonably
performing, high-quality, robust numerical applications with just one
evaluation mode. Furthermore, as it is, I predict it will be very
difficult to get approval for proposals related to general improvement
of floating-point arithmetic. (Improvement in very specific areas, such
as java.lang.math, may be easier to effect.) Adding a confusing
misnomer at this time would make potential later changes that much more
difficult. (If the current proposal were adopted as is, could you
imagine arguing with non-fp-savvy decision makers that we need to add
even more features to the Java language to allow for controlled use of
wider precision? The reaction to such a proposal would almost surely
be, "Why do we need a proposal to enable the use of wider precision,
when we already have a widefp mode?")

From now on, I will refer to the current proposal's widefp mode as
"nativefp mode."

2) Nevertheless, I think it would be wise to be slightly conservative
at this time as far as how much flexibility is allowed in "nativefp"
mode. In particular, narrowing should not be arbitrarily allowed when
computing intermediate expressions. The reason is that we may never be
successful in getting approval for a mode that is more accuracy
oriented, and in this scenario, this restriction would be very
important. If a third, accuracy-oriented mode were eventually added,
this restriction could help people understand why such a mode is
needed. It could also be that this restriction's impact on performance
might turn out to be relatively minor, in which case it wouldn't be
necessary to lift this restriction in the future. (Interestingly, SPARC
processors would be at a disadvantage if a third mode were not added -
strictfp and nativefp modes would likely be identical, and many (most?)
other processors' nativefp mode, which would evaluate expressions to at
least double precision, would yield significantly better accuracy than
what could be obtained on a SPARC-based JVM.)

3) I think Joe Darcy's suggestion of restricting the float extended
format to be no wider than double extended is a very good one.

4) Abrupt underflow ("flush to zero") should never be allowed,
regardless of the mode. (According to the section "Notes for
Implementors" in the current proposal, abrupt underflow could happen in
nativefp mode.) If a third mode were ever added, then allowing abrupt
underflow in nativefp mode might be reasonable.

5) Clarify whether fused multiply-add is allowed in nativefp methods,
for example by explicitly mentioning fused multiply-add in the
introduction, and saying that an extended format does not have to
actually be realized in hardware.

6) Make the treatment of "unmarked" methods implementation defined as
to whether they are considered implicitly nativefp or implicitly
strictfp.

7) For unmarked methods, the access_flags of the method_info structure
should have ACC_STRICT set and ACC_EXPLICIT clear. The combination of
ACC_STRICT clear and ACC_EXPLICIT set should be reserved for the future
- it could come in handy as a way of identifying a third,
accuracy-oriented mode, should such a mode ever be incorporated into
the Java language.

8) Fused multiply-add should be allowed in explicitly nativefp methods,
but forbidden in "unmarked" methods. (Presumably, JVMs could determine
whether fused multiply-add can be used by examining a method's
ACC_STRICT and ACC_EXPLICIT bits.)
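A sketch of the flag check behind 7) and 8), using ACC_STRICT's actual
class-file value (0x0800, per the JVM spec) and a made-up value for the
proposed ACC_EXPLICIT flag, which was never standardized:

```java
public class AccessFlags {
    // ACC_STRICT is the real JVM class-file flag for strictfp methods.
    static final int ACC_STRICT   = 0x0800;
    // ACC_EXPLICIT is the proposal's flag; 0x1000 is an assumed value.
    static final int ACC_EXPLICIT = 0x1000;

    // Recommendation 8: fused multiply-add only in explicitly nativefp
    // methods, i.e. ACC_STRICT clear AND ACC_EXPLICIT set.
    static boolean mayUseFma(int accessFlags) {
        return (accessFlags & ACC_STRICT) == 0
            && (accessFlags & ACC_EXPLICIT) != 0;
    }

    public static void main(String[] args) {
        System.out.println(mayUseFma(ACC_EXPLICIT)); // true: explicitly nativefp
        System.out.println(mayUseFma(ACC_STRICT));   // false: unmarked, per 7)
        System.out.println(mayUseFma(ACC_STRICT | ACC_EXPLICIT)); // false
    }
}
```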

9) Change the names of the WIDEFP_* constants to start with
"NATIVEFP_", and require that they reflect the characteristics of
floating-point formats that are actually, physically used to store
floating-point values. That is, these constants should not reflect the
characteristics of theoretical floating-point formats that don't exist
in hardware and are never used in practice.

10) Add constants to the java.lang.Float and java.lang.Double classes
to indicate whether the underlying JVM uses fused multiply-add at its
discretion in explicitly nativefp methods. One possible name for these
constants would be CONTRACTED_OPERATIONS, but there may be other
suitable names.

Hopefully I have covered all the major areas that could stand
improvement, given the current goals and timeframes. I hope others will
chime in if I have missed anything, or if my recommendations are not
suitable.




More information about the Numeric-interest mailing list