Comments on Sun's Proposal for Extension of Java Floating Point in JDK 1.2

Joseph D. Darcy darcy@cs.berkeley.edu
Fri Aug 14 13:02:46 PDT 1998


Hello.

> Lindholm's comments on Darcy's comments on Lindholm's comments on
> Figueroa's comments on PEJFPS = "Proposal for Extension of Java(TM)
> Floating Point Semantics, Revision 1"

[problems with using PEJFPS's scheme of (perhaps misleadingly)
indicating extra precision is in use to also indicate that fused mac
is in use]

> True, the white paper is not at all clear about when fused mac can be
> used.  The extended format parameters won't tell you that reliably
> either.  But that is consistent with the fact that we intended a code
> generator to be free to decide when it wants to use fused mac.  Of
> course, like Sam Figueroa proposed, we might provide additional explicit
> support for fused mac as well.

There are three options to support with respect to fused mac: it must
be used, it must not be used, and it may be used if convenient (use it
if you've got it).  The easiest way to support the first option is
with an fmac method call.  Something akin to "fpstrict" would prohibit
fused mac's implicit use.  Using fused mac if convenient could be made
the default (this wouldn't hurt most code and would allow the code to
run faster on fused mac capable machines).
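
To make the first option concrete, here is a minimal sketch of how an
fmac library method could be emulated on machines without the
hardware.  The class and method names are my invention, and the code
leans on the requirement that decimal to double conversion be
correctly rounded:

import java.math.BigDecimal;

public class Fmac {
    // Compute a*b + c with a single rounding, as a hardware fused mac
    // would: form the product and sum exactly in BigDecimal, then round
    // once on the way back to double.  NaN and infinite arguments are
    // not handled (new BigDecimal(double) rejects them), and the sign
    // of an exact zero result is not reproduced.
    public static double fmac(double a, double b, double c) {
        BigDecimal exact =
            new BigDecimal(a).multiply(new BigDecimal(b))
                             .add(new BigDecimal(c));
        // String -> double conversion rounds correctly to nearest.
        return Double.valueOf(exact.toString()).doubleValue();
    }
}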

Including a fused mac method in the Java Math library would allow
other machines to emulate fused mac capable ones.  As I understand it,
Swing allows Java programmers to see how the GUI of their application
would look on, say, a Mac when actually running the application on a
Windows PC.  Analogous emulation capabilities are useful for some
floating point programs and to some programmers.

> It might be good if you'd explain why it's useful to be able to
> recognize conditions for fusion in the source.  If you are writing code
> that aspires to be at all implementation-dependent, then there can be

Implementation-dependent is not the same as arbitrary.  There are
basically three extant styles of IEEE 754 floating point architecture,
exemplified by the x86 (hardware support for double extended), the
PowerPC (hardware fused mac), and SPARC (plain float and double in
hardware, at best software support for quad).  That isn't so many
different styles that providing code tuned for each one is out of the
question, especially since the first two can, perhaps with a few
caveats, fairly easily emulate the SPARC style.  There are certainly
many fewer floating point architectures than (web browser x operating
system x Java version) combinations.

> no guarantee each implementation will support the optimization.  So
> if you really want to be sure you get the optimization, you have to
> know your code generator.

> Would your algorithms look at fp.mac and adapt?

Some people's algorithms certainly would.  The best algorithm for a
given task on a given platform can knowingly take advantage of
resources specific to that platform.  Apple's math library is written
to take advantage of the PowerPC's fused mac and wouldn't work without
it.  (Their compilers include pragmas to disallow fused mac at
line-by-line granularity.)  On the x86, a different algorithm for the
same function could take advantage of the extended precision
registers, etc.

As for why someone would want to know what can or cannot be fused, it
is the same reason someone would want to know the order of evaluation
or other aspects of language semantics.

If the programmer writes 

a*b + c*d

in a compiler using fused mac is this translated into

fmac(a, b, c*d)

or

fmac(c, d, a*b)

or sometimes the former and sometimes the latter, or perhaps no fused
mac at all?

The two ways of evaluating the expression using fused mac give
different results and might not both be wanted.
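
To make the difference concrete, here is a small demonstration using
the Fmac.fmac sketch above, with operands contrived so that the two
fusion choices give different answers:

public class FusionChoice {
    public static void main(String[] args) {
        double a = 1.0 + Math.pow(2, -27);  // a*b is inexact in double
        double b = a;
        double c = 1.0 - Math.pow(2, -26) + Math.pow(2, -52);
        double d = 1.0;                     // c*d is exact

        // Fuse a*b: c*d is rounded first, a*b stays exact in the mac.
        double r1 = Fmac.fmac(a, b, c * d);
        // Fuse c*d: a*b is rounded first, c*d stays exact in the mac.
        double r2 = Fmac.fmac(c, d, a * b);

        System.out.println(r1 == r2);  // false: r1 = 2 + 2^-51, r2 = 2.0
    }
}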

Consider the following code in a compiler using fused mac:

d = a * b;
// some code that doesn't assign to a or b
e = a*b + c;

Does the expression on the right-hand side of e get fused?  The
programmer might reasonably expect it to.  But, it will not be fused
if the compiler "optimized" the code by reusing the common
subexpression a*b (at a meeting I attended to discuss Java numerics,
James Gosling said he has been bitten by such compiler behavior).
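
Again using the Fmac.fmac sketch, a contrived example of the hazard,
assuming the fused interpretation is what the programmer wanted:

public class CseHazard {
    public static void main(String[] args) {
        double a = 1.0 + Math.pow(2, -27);
        double b = a;
        double c = -(1.0 + Math.pow(2, -26));

        double d = a * b;                   // rounded product: 1 + 2^-26
        double fused  = Fmac.fmac(a, b, c); // what "e = a*b + c" fuses to
        double reused = d + c;              // what reusing d produces

        System.out.println(fused);   // 2^-54, the exact residue
        System.out.println(reused);  // 0.0: the residue is lost
    }
}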

In short, the reasons to define where fused mac can or cannot be used
are the same reasons to have a well-defined language specification, so
programmers can rely on details and write predictable code, even if
that predictable code is platform dependent.  Why wouldn't you want
the programmer to know what the program's semantics are?

[SPARC as one IEEE 754 compliant processor]

> FWIW: Please don't take anything I say in this discussion as toeing a
> company line on SPARC or denigrating any other processor.  In fact, it
> should be clear that a major motivation for the current proposal is to
> improve performance on non-SPARC processors, presumably at some cost to
> SPARC.

There would be no cost to the SPARC.  The SPARC currently implements
"strictfp" and a legal compilation of "widefp" is "strictfp."  As I
read it, PEJFPS's widefp on a processor like the SPARC does not
sanction changing the order of expression evaluation in ways that
would change the answer.  For example, PEJFPS would not allow SPARC
to rewrite an expression using the associative law to keep pipelines
full, etc.  If anything, PEJFPS is an attempt to bestow on the x86 the
"architecture neutrality" Java has claimed since its inception.

Even in its "write once, run anywhere" zeal, Java has always made some
allowances both to improve performance and accommodate processor
limitations.  As you point out elsewhere, the thread scheduling policy
is not fully specified to allow better server performance.  In the
JVM, loads and stores of 64-bit quantities do not have to be atomic;
this is described as "a pragmatic concession to current hardware
practice" ([JVM] section 8.4).  Potentially, this lack of atomicity
could allow two threads to each be the last one to write the two
different halves of a double or long variable.  If some small
concessions can be made in these areas, why not for double rounding on
underflow on the x86?
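
For instance, the following sort of program is permitted to misbehave
under that concession.  Whether a torn value is ever observed depends
on the VM and the hardware; on a machine whose 64-bit accesses happen
to be atomic it will spin forever:

public class WordTearing {
    static long shared;  // deliberately not volatile, not synchronized

    public static void main(String[] args) {
        new Thread(new Runnable() {
            public void run() { for (;;) shared = 0L; }   // halves 0x00000000
        }).start();
        new Thread(new Runnable() {
            public void run() { for (;;) shared = -1L; }  // halves 0xffffffff
        }).start();
        for (;;) {
            long seen = shared;  // may mix halves from the two writers
            if (seen != 0L && seen != -1L) {
                System.out.println("torn value: 0x" + Long.toHexString(seen));
                System.exit(0);
            }
        }
    }
}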

[Java is no longer Oak.  Therefore, Oak constraints should not be used
to justify Java's current floating point.]

> Untrue.  Oak became Java only after we began making it available to
> people in roughly its current form and concept.  It took off so
> quickly that we immediately desired to provide compatibility across
> releases.

Yes.  And since taking off, Java has accreted API after API.
Many of the APIs and capabilities added since Java 1.0 address needs
not present in the embedded set-top-box market Oak was intended for.
Jar files, 3D graphics, JavaBeans, and weak pointers wouldn't be
necessary to control a set-top box.  Ironically, to target Oak's
original market niche, "EmbeddedJava" must strip out many of full
Java's APIs.  Therefore, Java has been developing largely according to
the needs of desktop programmers.  IEEE 754 sticky flags and rounding
modes could be supported with a small API and some concomitant
limitations on the optimizations Java-to-bytecode compilers and
bytecode-to-native JIT compilers can perform.
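
For concreteness, a sketch of the shape such an API might take.  The
class name, constants, and methods are all my invention, and the
bodies are stubbed out since they would need VM support:

public final class IEEE754 {
    private IEEE754() {}

    // Sticky exception flags.
    public static final int INEXACT        = 0x01;
    public static final int UNDERFLOW      = 0x02;
    public static final int OVERFLOW       = 0x04;
    public static final int DIVIDE_BY_ZERO = 0x08;
    public static final int INVALID        = 0x10;

    // Rounding directions.
    public static final int TO_NEAREST      = 0;
    public static final int TOWARD_ZERO     = 1;
    public static final int TOWARD_POSITIVE = 2;
    public static final int TOWARD_NEGATIVE = 3;

    public static int  getFlags()           { throw new UnsupportedOperationException(); }
    public static void clearFlags(int mask) { throw new UnsupportedOperationException(); }
    public static int  getRounding()        { throw new UnsupportedOperationException(); }
    public static void setRounding(int dir) { throw new UnsupportedOperationException(); }
}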

>  We had our one opportunity to introduce a discontiguous
> change, but thereafter we've had the same problems with evolving the
> system as everyone else and have to live with the implications of the
> early decisions.

Java does include the "deprecated" tag which (rightly) acknowledges
that sometimes old APIs can and should eventually be replaced by newer
ones.

[just what is widefp supposed to mean?]

> I'd asked this of Sam too: is your concern that in widefp methods the
> spec does not mandate that everything stays wide, or is it that on
> processors that don't implement IEEE 754 extended formats, that widefp
> methods are likely to use non-extended formats, or something else?

Reading the proposal, it is clear that PEJFPS is primarily
concerned with allowing something with the syntax of Java to run fast
on the x86.  Given your other comments about compiler writers
insisting on being able to spill 64-bit instead of 80-bit values, it seems
reasonable to presume that some of the licensees with the most
influence over drafting PEJFPS were x86 compiler writers (either at
Intel or elsewhere).

My concern is that implementing the desires of certain x86 compiler
writers would be detrimental to the larger Java community.  Of course,
the x86 is important to Java (probably many more cycles of Java
execute on the x86 than SPARC and PowerPC combined).  Therefore,
having good Java compilers for the x86 is important for Java
programmers and users too.

Sun, as Java's originator and steward, has a responsibility to try to
evolve Java to meet the needs of Java's users.  Many other widely used
languages (C, Fortran, C++) implement "widefp"-style numerics, either
by default or with suitable compiler flags.  And those numerics
continue to cause hard-to-find bugs that squander programmers' time
and contribute an air of mystery and frustration to floating point
computation.  See the recent comp.compilers thread "inlining +
optimization = nuisance bugs" for a contemporary example of precisely
the type of floating point discrepancies PEJFPS would introduce into
Java, including already-written Java code.
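
Here is the flavor of that bug transliterated to Java.  As the
language stands today this program must print true; under PEJFPS-style
widefp semantics it need not, since one call could be inlined and its
intermediate kept in an 80-bit register while the other is returned
through a 64-bit slot:

public class NuisanceBug {
    static double sumOfSquares(double x, double y) {
        // x*x underflows to 0 in 64-bit arithmetic but not in an
        // 80-bit register, whose exponent range is much wider.
        return x * x + y * y;
    }

    public static void main(String[] args) {
        double tiny = 1.0e-200;
        double once  = sumOfSquares(tiny, tiny);
        double again = sumOfSquares(tiny, tiny);
        System.out.println(once == again);  // today: always true
    }
}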

My original critique of PEJFPS quotes from the preface of "The
Java(TM) Language Specification."  To paraphrase, "ignoring all
non-deterministic parts of the language, the language is totally
deterministic."  This statement is vacuously true of any language,
including Java.  The difference between Java and C in this regard is
that after removing the non-deterministic parts, there would be a
non-empty subset of Java left.  I think this is the key feature of
Java: a disciplined programmer can write a portable, predictable
program.

Exact reproducibility isn't achieved or achievable in practice, but
predictability is.  Compiler writers focused on improving benchmark
scores are prone to excess.  SPEC95 added rules to outlaw certain
compiler abuses.  PEJFPS removes Java's predictability in an apparent
effort to appease compiler writers who only want to generate the fastest
code possible, at the expense of Java's distinguishing promises of
reproducibility.

> Said another way, the intention is that regardless of whether you are
> currently running on a processor that implements extended formats,
> you'd always write wide code anticipating the use of extended formats.

To avoid re-introducing historic mistakes into Java's floating point,
no access should be granted to extended precision without including a
language type (or class) that can map to that format.
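
A sketch of the kind of class I mean.  All names are my invention and
the bodies are stubbed; a real version would map to the x86's 80-bit
format in hardware and to software emulation elsewhere:

final class DoubleExtended {
    private long  significand;      // explicit 64-bit significand
    private short signAndExponent;  // sign bit plus 15-bit biased exponent

    public DoubleExtended(double d)             { throw new UnsupportedOperationException(); }
    public DoubleExtended add(DoubleExtended y) { throw new UnsupportedOperationException(); }
    public DoubleExtended mul(DoubleExtended y) { throw new UnsupportedOperationException(); }
    public double doubleValue()                 { throw new UnsupportedOperationException(); }
}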

See my original critique of PEJFPS for the technical details of the
confusing perversions allowed by the proposal, such as float variables
being wider than double variables.  Even Fortran outlawed that.

> I can't differentiate complaints of lack of reproducibility from double
> rounding on underflow from complaints of lack of reproducibility from
> other causes.  Certainly our testers have complained about lack of

If that is true, your conformance tests are not at all useful.  It is
not hard to write a program that detects if double rounding on
underflow has occured, see [Gol] (I assume you are familiar with this
document since the code on page 30 of PEJFPS appears to be adapted
from [Gol]).  Similarly, it is possible to write a program that
detects if transcendental functions return values differing from the
FDLIBM algorithms.  Test programs can also detect if binary <->
decimal conversion is not correctly rounded (binary to decimal
conversion was fixed in Java 1.1).  The Java conformance suites I have
seen have many different programs that test very specific parts of the
language specification.  Surely such programs can give some indication
as to what is wrong.  After all, perhaps the test program is in error.
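
Constructing operands for the double/extended case takes some care
(again, see [Gol]), but the flavor of such a test is easy to show with
the analogous float/double double rounding, where a suitable constant
is simple to exhibit.  The value 1 + 2^-24 + 2^-53 rounds up to
1 + 2^-23 when converted to float in one step, but rounds (to even)
down to 1 + 2^-24 as a double and then (to even again) down to 1.0f:

public class DoubleRoundingFlavor {
    public static void main(String[] args) {
        // The exact decimal value of 1 + 2^-24 + 2^-53.
        String s = "1.00000005960464488641292746251565404236316680908203125";

        float direct    = Float.valueOf(s).floatValue();            // one rounding
        float viaDouble = (float) Double.valueOf(s).doubleValue();  // two roundings

        System.out.println(direct);              // 1.0000001 (1 + 2^-23)
        System.out.println(viaDouble);           // 1.0
        System.out.println(direct == viaDouble); // false: double rounding detected
    }
}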

> them is a possibility.  Balancing reproducibility with good performance
> (JITs want to inline these things) is tough.  Suggestions very welcome.

It seems there is a need for two different math libraries: a (perhaps
slower) one standardized on all machines and a platform-specific one
optimized for a given architecture.
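
A sketch of the shape this could take.  The names are my invention;
the portable class would ship a fully specified algorithm (stubbed out
here), while the fast one delegates to whatever the platform provides:

// Specified to the last bit, e.g. pinned to the FDLIBM algorithms,
// so results are identical on every platform.
final class PortableMath {
    private PortableMath() {}
    public static double sin(double x) {
        throw new UnsupportedOperationException("FDLIBM algorithm goes here");
    }
}

// Free to use the fastest platform method (an inlined fsin on the x86,
// say), possibly differing from PortableMath in the last bit or so.
final class FastMath {
    private FastMath() {}
    public static double sin(double x) {
        return Math.sin(x);  // delegate to the platform's library
    }
}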

> Agreed, and therein lies some of the problem.  API changes are very
> visible and thus very easy to understand.  That's not so of more subtle
> changes like introduction of double rounding.

I disagree: introducing certain APIs (e.g. reflection and weak
references) forces subtle changes to the VM implementation.  These
changes might be invisible to the user, but certainly not to the VM
implementor.  Since the very small subnormal numbers affected by
double rounding on underflow don't occur often in practice, they are
not likely to be noticed.  Plus, if subnormal numbers do start
showing up on, say, a SPARC, the most likely thing the user will
notice is that suddenly the program seems to stop making progress.  As
I alluded to in a previous message, on the UltraSPARC a single
operation on subnormal values takes upwards of 10,000 cycles, 3,333
times more latency than a regular multiply or add.
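
A crude way to observe the effect follows; the loop shape and counts
are arbitrary, but on hardware that traps to system software for
subnormal operands the second timing can be dramatically larger:

public class SubnormalTiming {
    // Time a multiply-add loop; the running value converges to 2*t,
    // so when t is subnormal every operation sees subnormal operands.
    static long time(double t) {
        double s = t;
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++)
            s = s * 0.5 + t;
        long elapsed = System.currentTimeMillis() - start;
        if (s == -1.0) System.out.println(s);  // defeat dead-code elimination
        return elapsed;
    }

    public static void main(String[] args) {
        long normal    = time(1.0);                       // stays near 2.0
        long subnormal = time(Double.MIN_VALUE * 1024.0); // stays subnormal
        System.out.println("normal:    " + normal + " ms");
        System.out.println("subnormal: " + subnormal + " ms");
    }
}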

-Joe Darcy
darcy@cs.berkeley.edu

References
----------

[Gol] Roger A. Golliver, "First-implementation artifacts in Java(TM)"

[JVM] Tim Lindholm and Frank Yellin, "The Java(TM) Virtual Machine
Specification," Addison-Wesley, 1997.




