A proposal for RealJava

Fri Jul 4 18:14:17 PDT 1997

A Proposal for RealJava
========================================================
========================================================

"Write once, run with speed and accuracy -- everywhere."

by Jerome Coonen
Draft 1.0, July 4, 1997
(email version best viewed with fixed-pitch font)

Table of Contents
=================

Introduction
RealJava Is
The Bottom Line
Floating Point Architectures
A Model for Portable Programming
  The type hierarchy
  A fast, accurate loop
  Floating literals
  A method call
  Handling floating point state
RealJava Features
  Types
  Expression evaluation
  Conversions and promotions
  Contraction operators
  Literals
  Floating point environment
  Rounding modes
  Exception flags
The RealJava Virtual Machine
  Virtual Machine operations
  RealJVM idioms
Expression Evaluation Examples
  Method invocation
Changes to the package java.lang
RealJava and IEEE 754
Java Expression Evaluation and Optimization
RealJava FAQ

Introduction
============

Designers of the "100% Pure" Java (tm) language have attempted
to guarantee numerical robustness by constraining the language
so that all numerical computations would produce identical
results across all Java platforms.  This constraint, however,
comes at a high price on platforms supporting wide intermediate
registers or extra-precise operations such as "fused multiply-
add." On these platforms, 95% or more of the total, users could
expect to get better answers faster than permitted by Pure Java's
four-function float/double model.

RealJava extends Pure Java to support an alternative approach --
a programming model giving portable access to the features
of all Java platforms.  RealJava derives from the belief that
users want accurate answers delivered at the highest speed,
rather than less accurate results delivered more slowly, whether
or not they match results on less capable machines.

RealJava also brings Java into compliance with IEEE standard
754 for binary floating point arithmetic.

This document presents the essential features of RealJava first,
and then continues with further details and examples of the
programming model.  You are invited to address comments to the
author at jeromeabe.com.

RealJava Is
===========

0) dedicated to the proposition that loops of the form

      double s, x[], y[];
      for (int i=0; i<x.length; i++)
          s += x[i]*y[i];

   or its float counterpart dominate numerical computation.
   They must be fast and accurate.

1) a (nearly 100% pure) superset of Pure Java: code restricted
   to the features of Pure Java will execute according to the
   expression evaluation rules of Pure Java.

2) intellectually economical: as a consequence of (1),
   programmers may restrict themselves to the float/double
   model of Pure Java, sacrificing only the extra accuracy
   and speed of some platforms.

3) bytewise economical: RealJava requires about a dozen new
   bytecodes in the virtual machine.

4) easy to implement: by definition, RealJava adds functionality
   related to the most "natural" way to compute on a platform.

5) "best possible": RealJava is sufficiently expressive to be
   compiled to a platform's fastest, most accurate code for a
   given algorithm.

6) based on prior art: RealJava's key features are inspired by
   numerical extensions proposed for C9x, which extensions have
   been debated and refined over the past eight years.

7) standard-conforming: RealJava adds features required by
   IEEE 754 but lacking in Pure Java.

8) cognizant of Pure Java's constraints: RealJava adds features
   without the use of a C-style preprocessor mechanism to provide
   choices to programmers.

9) not exclusive of other extensions, such as some operator
   overloading to support numerical types like interval or complex.

The Bottom Line
===============

The essential motivation for RealJava is the observation that
most processors do their best calculation outside the four-function,
float/double model of older architectures and programming languages.
The widely used Intel architecture (iA) uses 80-bit registers for
intermediate results and several other architectures support a
fused multiply-add instruction, in which two values are multiplied
and added to a third with just one rounding error.

RealJava introduces the type longDouble.  It will be 80-bit
extended on iA, quad (or wider) on some platforms, and
double on platforms with no wider type.

RealJava introduces the two types floatN and doubleN, which
are pseudonyms for float, double, or longDouble, depending
on the platform and the style of expression evaluation.

RealJava offers optional "natural evaluation," in addition to
the default, Pure Java's strict evaluation.  When evaluation
is strict, floatN is synonymous with float and doubleN is
synonymous with double.  When evaluation is natural, floatN
refers to a platform's fastest type with at least the range
and precision of float and similarly for doubleN.

When evaluation is natural, the Java compiler keeps its
intermediate float expressions in the type floatN and keeps
its intermediate double expressions in the type doubleN.
FloatN and doubleN provide a portable mechanism for requesting
a platform's fastest type with sufficient range and precision
for a given calculation.

A programmer elects natural evaluation within a class by
defining a class variable recognized by the compiler:

    private static final boolean NaturalEvaluation = true;

RealJava supports use of fused multiply-add.  Programmers need
simply define the class variable

    private static final boolean AllowContractions = true;

RealJava supports access to the floating point environment,
as required by IEEE 754.  RealJava extends java.lang.Math
with methods that read and write the rounding modes and sticky
error flags.

RealJava adds a dozen bytecodes to the virtual machine:

   Codes and description
   --------------------------------------------------------------------
   dup4, dup4_x1, dup4_x2 -- duplicate a 4-word stack item
   fcmpq, dcmpq -- comparisons for equality (never set invalid flag)
   f2f, d2d -- nop conversions that can be modified by prefix bytecodes
   natsrc -- the source operand(s) of the next non-prefix bytecode
             have the corresponding natural type (and so does the
             destination, in the case of arithmetic operations)
   natdst -- the destination operand of the next non-prefix
             bytecode has the corresponding natural type
   longsrc -- the double source operand(s) of the next non-prefix
              bytecode have type longDouble
   longdst -- the double destination operand of the next non-prefix
              bytecode has type longDouble
   contract -- the next two operations can be fused, if possible

The use of prefix codes means that most "natural" operations
require two bytecodes in the JVM, but this is expected to have
little impact on the overall size of applications or on JVM
performance.

Floating Point Architectures
============================

In the almost two decades since the first discussions of the IEEE
floating point standard, three distinct architectures have
established themselves in the market.  By far the most numerous
are the extended-based iA processors such as Pentium, and the
Motorola 68000 family.

Extended-based machines support the standard 32-bit float and
64-bit double formats, but do their computation, as suggested
by the standard, using extra precision (11 bits, for a total
of 64 bits) and range (4 bits, for a total of 15).  As required
by the standard, iA machines support a mode to round
intermediate results to the precision of float or double.
In order to limit the exponent to the range of float or double,
iA machines must store their intermediate results to memory
and then reload them.  Even then, there can be a small extra
rounding error, avoidable only by very subtle conversion code
(that is almost certainly not worth the cost of execution).

Double-based machines support both float and double formats,
but are most naturally used to deliver double results from all
intermediate expressions.  The PowerPC is such an architecture.
It stores all values in 64-bit double format.  Although it
supports float variants of the arithmetic operations, the speed
of double computation on some implementations leads most C and
some Fortran compiler vendors to simply compute in double.

Finally, there are the orthogonal machines like Sparc,
MIPS r4000, and HP PA-RISC, which support float and double
formats and operations only on pairs of numbers in the same
format.  They add two floats to deliver a float, multiply
two doubles to produce a double, but never mix formats in an
operation other than explicit conversion.

Other features distinguish prevalent architectures today.
All iA processors have core elementary functions in hardware.
The speed and accuracy of their sin() and exp() cannot be
beaten in software.  Many newer architectures support a
fused multiply-add, delivering expressions like a*b + c with
just one rounding error.  This operation, which offers momentary
extra precision but lies outside the scope of the IEEE standard,
finds use in a variety of numerical computations.
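
The difference between one rounding and two can be seen in standard
Java, where Math.fma() (Java 9 and later) computes a*b + c with a
single rounding.  The following is only a sketch; the class and method
names are hypothetical and are not part of RealJava.

```java
// Sketch: one rounding (fused) versus two roundings (multiply, then add).
final class FusedMultiplyAddDemo {
    // Strict evaluation: the product is rounded to double, then the sum.
    static double separate(double a, double b, double c) {
        return a * b + c;
    }

    // Contracted evaluation: a*b + c is formed exactly and rounded once.
    static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1.0p-27;  // 1 + 2^-27
        double b = 1.0 - 0x1.0p-27;  // 1 - 2^-27
        double c = -1.0;
        // The exact product is 1 - 2^-54, a tie that rounds to 1.0 in
        // double, so the separate form loses the low-order term.
        System.out.println(separate(a, b, c));  // zero
        System.out.println(fused(a, b, c));     // -2^-54
    }
}
```

With these operands the separately rounded form returns zero, while
the fused form recovers the term -2^-54 that the intermediate rounding
discarded.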

A Model for Portable Programming
================================

Given the diversity of prevalent floating point architectures, a
programmer must give some care to code that would perform well
across all standard platforms.  A program that declares longDouble
format variables (a la iA) cannot perform well on PowerPC or Sparc,
which would require software to support the extra range and
precision.  On the other hand, the requirement that all
intermediate results be exactly float or double -- as Pure Java
demands -- exacts a significant performance penalty on extended-
based machines.  That this penalty is unnecessary is the topic
of this section.

The key to cross-platform portability is to distinguish
between values required to be in float or double format and
values that -- for highest performance, increased accuracy,
or both -- are best stored in the format natural to the host
architecture.

Explicit typing to float or double is necessary when

* storage is an issue, such as with large vectors or
  arrays

* values will be passed across the network, between
  diverse architectures

* exact precision is critical, as in algorithms that
  simulate higher precision by splitting a wide value
  across several floating point values.

A machine's natural "evaluation type" is more suitable

* for unnamed intermediate expressions

* for accumulators in loops

* for vectors of intermediate results in compound matrix
  computations

The type hierarchy
------------------

RealJava introduces the types doubleN and floatN to name
a platform's natural evaluation type for double and float.
A value of type doubleN has at least the range and
precision of the IEEE double format, and similarly for
floatN.

The programmer might visualize the type hierarchy as

    longDouble
           |   \
           |    doubleN  (fast, at least as wide as double)
           |   /   |
        double     |
           |       |
           |    floatN   (fast, at least as wide as float)
           |   /
         float

The types float and double are the IEEE 32-bit and 64-bit
types.  Beyond that, all that's guaranteed across all platforms
is that any given type is at least as wide as the types linked
to it from below in the hierarchy.

When expression evaluation is strict, doubleN is synonymous
with double and floatN is synonymous with float.  When the
programmer requests natural expression evaluation, doubleN
is the platform's fast type at least as wide as double, and
floatN is the platform's fast type at least as wide as float.

When evaluation is natural, the binding of floatN and
doubleN is platform dependent.  The binding occurs when
the bytecodes are executed or when the bytecodes (or source)
are compiled to machine code.

This table shows some mappings for types floatN, doubleN,
and longDouble for various current processors.

  Processor  |   floatN    doubleN   longDouble
-------------+------------------------------------
iA (Pentium) |  extended   extended   extended
     PowerPC |   float      double   double-double
   early PPC |   double     double   double-double
     PA-RISC |   float      double      quad
       Sparc |   float      double      quad
         ARM |  extended   extended   extended
        MIPS |   float      double     double
68K with FPU |  extended   extended   extended

The quad types are provided in software by some vendors.
The double-double type on PPC is an idiosyncratic type,
implemented in software, comprising two double values whose
mathematical (as opposed to floating point) sum is the
number's value.

A fast, accurate loop
---------------------

Here is a simple code sample in RealJava:

// Use native doubleN type and fused mul-add.
private static final boolean NaturalEvaluation = true;
private static final boolean AllowContractions = true;

doubleN s = 0.0;
double[] x, y;  // loop generates error if lengths unequal
for (int i=0; i<Math.max(x.length, y.length); i++)
  s += x[i]*y[i];

On Sparc or MIPS, this is simply a double accumulation
of the inner product.  On PowerPC and newer PA-RISC, the
loop uses a fused multiply-add with a double accumulator.
On iA, ARM, and 68K, the product and sum enjoy extended
precision and range.

These uses of contraction operators and extended intermediate
values always increase performance over, and almost always yield
more accurate results than, the strict double computation
using double multiply and add operations.

In this form, the code can be compiled to a fast, accurate loop
on any platform.
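
The benefit of a wider accumulator can be imitated in Pure Java by
keeping float data in arrays and accumulating in double, which is how
floatN would behave on a platform where it maps to double.  This is a
sketch under that assumption; the class and method names are
hypothetical.

```java
// Sketch: float data accumulated narrowly (float) and widely (double).
final class WideAccumulatorDemo {
    // Strict float model: product and running sum are rounded to float.
    static float dotNarrow(float[] x, float[] y) {
        float s = 0.0f;
        for (int i = 0; i < Math.max(x.length, y.length); i++)
            s += x[i] * y[i];
        return s;
    }

    // Wider accumulator: product and running sum are carried in double,
    // standing in for a floatN that is wider than float.
    static double dotWide(float[] x, float[] y) {
        double s = 0.0;
        for (int i = 0; i < Math.max(x.length, y.length); i++)
            s += (double) x[i] * y[i];
        return s;
    }

    public static void main(String[] args) {
        float[] x = {1e8f, 1f, -1e8f};
        float[] y = {1f, 1f, 1f};
        System.out.println(dotNarrow(x, y));  // the 1 is absorbed by 1e8
        System.out.println(dotWide(x, y));    // the 1 survives
    }
}
```

The arrays stay in the narrow format, so storage is unchanged; only
the accumulator is widened.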

Floating literals
-----------------

In RealJava, the default type of a floating literal is doubleN.
Pure Java defines a suffix notation to specify a literal's type:
f -- float and d -- double.  RealJava adds the definitions fN --
natural float, dN -- natural double, and ld -- longDouble.

Here are some circumstances in which programmers might use
explicit type suffixes:

* Programs using only float or floatN values and known to be
  compiled to machine code might run faster if all literals are
  typed as f or fN.

* Programs depending on the exact precision and range of float
  or double will require that their literals, like their variables,
  be typed appropriately.  (These will run with strict evaluation,
  anyway, in which the default type of a literal is double.)

* Programs using longDouble values might require longDouble
  literals, too.

When evaluation is natural, literals of type floatN and doubleN
are not bound until execution or compilation to machine code.
In a program compiled to bytecodes, literals cannot be converted
until runtime.  The following two RealJava statements yield
identical values:

private static final boolean NaturalEvaluation = true;
doubleN w1, w2, z;
w1 = z + 8.12;  // will be converted at runtime
w2 = z + DoubleN.valueOf("8.12").doubleNValue();

Pure Java (like proposed C9x) requires that decimal values be
converted to the nearest representable number of the target type,
with ties going to the value with a 0 trailing bit.  Although the
conversion might have to be made at runtime, it does not have to
happen inside a loop in which a constant appears.  In RealJava,
like C9x, any side-effects from the conversion of literals are
invisible to the program.  In the example above, the method
valueOf() would set the inexact flag during the conversion of
8.12.  Consider the simple example:

private static final boolean NaturalEvaluation = true;
double[] t;
for (int i=0; i<t.length; i++)
  t[i] = 1.8*t[i] + 32.0;

The compiler can lift the conversions of 1.8 and 32.0 from the
loop, replacing their occurrences with anonymous temporaries.
Here's the effective form of the reorganized code:

doubleN compilerTemp1 = 1.8, compilerTemp2 = 32.0;
for (int i=0; i<t.length; i++)
  t[i] = compilerTemp1*t[i] + compilerTemp2;

Without changing the behavior of the program, the compiler
has avoided repeated conversions of the decimal strings.
When evaluation is strict and the values are known to be
of type double, their conversion can happen at compile time.
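
Two points from this section can be checked in Pure Java: a decimal
string converted once at runtime yields the same value as the
corresponding literal, and the value a literal denotes depends on the
type it is bound to.  The sketch below uses only double and float;
the class and method names are hypothetical.

```java
// Sketch: hoisted runtime conversion of literals, and type-dependent
// literal values.
final class LiteralBindingDemo {
    // The decimal strings are converted once, outside the loop, in the
    // way a compiler could hoist the conversions of 1.8 and 32.0.
    static double[] toFahrenheit(double[] t) {
        final double scale = Double.parseDouble("1.8");
        final double offset = Double.parseDouble("32.0");
        double[] out = new double[t.length];
        for (int i = 0; i < t.length; i++)
            out[i] = scale * t[i] + offset;
        return out;
    }

    // 8.12 is not exactly representable, so its nearest float and its
    // nearest double are different numbers.
    static boolean floatAndDoubleBindingsDiffer() {
        return (double) 8.12f != 8.12;
    }
}
```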

A method call
-------------

Here is another example:

private static final boolean NaturalEvaluation = true;
double y, omega, t, c;
y = Math.sin(omega*t + c);

In RealJava, library methods like the transcendental functions are
defined for type doubleN in java.lang.Math.  In this example,
the expression is evaluated in the type doubleN.

An implementation in which doubleN is wider than double may choose
to deliver results like Math.sin(x) accurate to the precision
of double -- 53 significant bits -- and no more, trading accuracy
for performance.

Handling floating point state
-----------------------------

Some computations can be programmed more clearly and expeditiously
when the programmer can control the direction of rounding or can
test whether specific exceptions have occurred.  The IEEE floating
point standard requires these features of any conforming environment.
RealJava specifies a platform-independent Java binding for these
features that can be captured from one RealJava program and then
passed to another, possibly executing on a different platform.

Here is an example of testing for overflow or invalid after the
execution of a loop.

private static final boolean NaturalEvaluation = true;
doubleN s = 0.0;
double[] x, y;
Math.clearExcept(Math.OVERFLOW | Math.INVALID);  // clear flags for loop
for (int i=0; i<Math.max(x.length, y.length); i++)
  s += x[i]*y[i];
if (Math.testExcept(Math.OVERFLOW | Math.INVALID) != 0) {
  // respond to exception after the loop has completed
}

This loop works regardless of the implementation of exceptions
in the underlying hardware.  All applets, applications, and other
threads begin execution with rounding to nearest and with all
exception flags off.

A more interesting example is a method for general use, such as the
hyperbolic cosine, cosh().  The author of a numerical method must
decide to what extent the method will honor the floating point state.
With respect to the rounding mode, a method may

1. run in whatever mode is currently set, possibly expecting the caller
   to have pre-set a documented mode (probably rounding to nearest), or

2. save and restore the rounding mode of the caller across the method,
   computing using rounding to nearest, or

3. honor the rounding mode of the caller, delivering results rounded
   according to the mode of the caller.

Unless they are specifically documented to change the rounding mode,
methods should return to the caller with rounding set as it was when
the method was invoked.

With respect to the error flags, a method may

1. freely manipulate the flags, or

2. leave set any flags already set on entry, setting any other flags
   relevant to the delivered results.

As the next examples show, RealJava provides methods to support the
handling of error flags.

Here is a naive implementation of hyperbolic cosine that illustrates
handling of the environment.  Mathematically, cosh(x) = (e^x + e^-x)/2,
which is at least 1.

private static final boolean NaturalEvaluation = true;
public doubleN naiveCosh(doubleN x) {
  doubleN expOfX, coshOfX;
  expOfX = Math.exp(x);
  coshOfX = (expOfX + (1.0 / expOfX)) / 2.0;
  return coshOfX;
}

Although naiveCosh() may overflow, any underflow that arises in the
computation is spurious.  A better implementation of cosh() saves
the caller's flags at the start, clears any spurious underflow
indication, and then merges the exceptions from naiveCosh(x) with
the caller's flags:

public doubleN naiveCosh(doubleN x) {
  doubleN expOfX, coshOfX;
  int envSave = Math.holdEnv();
  expOfX = Math.exp(x);
  coshOfX = (expOfX + (1.0 / expOfX)) / 2.0;
  Math.clearExcept(Math.UNDERFLOW);
  Math.updateEnv(envSave);
  return coshOfX;
}

To defend against rounding modes other than rounding to nearest, the
method might include the statement

  Math.setRound(Math.TONEAREST);

after the call to holdEnv().  For some purposes, it might be valuable
to have a hyperbolic cosine that honors the directed roundings, too.
The heart of such an implementation could take the form

public doubleN directedCosh(doubleN x) {
  doubleN hiExpOfX, loExpOfX, coshOfX;

  int envSave = Math.holdEnv();  // round components to nearest
  Math.setRound(Math.TONEAREST);

  hiExpOfX = computeLeadingBitsOfCosh(x);  // designed to be exact
  loExpOfX = computeCorrectionBitsOfCosh(x);  // a small correction

  Math.setRound(envSave & Math.ROUNDMASK); // restore for last round
  coshOfX = hiExpOfX + loExpOfX;

  Math.clearExcept(Math.UNDERFLOW); // clear spurious flags and update
  Math.updateEnv(envSave);
  return coshOfX;
}
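
The spurious underflow in naiveCosh() can be observed in Pure Java
even though Pure Java has no exception flags.  The sketch below, with
hypothetical class and method names, checks whether the reciprocal
1/exp(x) falls into the subnormal range, where an IEEE implementation
would signal underflow, while the final result is an ordinary finite
number.

```java
// Sketch: the intermediate 1/exp(x) can underflow although cosh(x)
// itself is nowhere near the underflow threshold.
final class NaiveCoshDemo {
    static double naiveCosh(double x) {
        double expOfX = Math.exp(x);
        return (expOfX + (1.0 / expOfX)) / 2.0;
    }

    // True when 1/exp(x) is nonzero but below the smallest normal
    // double, the situation in which underflow would be signaled.
    static boolean reciprocalUnderflows(double x) {
        double r = 1.0 / Math.exp(x);
        return r != 0.0 && Math.abs(r) < Double.MIN_NORMAL;
    }

    public static void main(String[] args) {
        double x = 709.0;  // exp(x) is large but still finite
        System.out.println(reciprocalUnderflows(x));
        System.out.println(naiveCosh(x));
    }
}
```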

For completeness, here's an example of how NOT to use exception flags,
except when there is no alternative:

double[] x, y;
Math.clearExcept(Math.OVERFLOW);
for (int i=0; i<Math.max(x.length, y.length); i++) {
  x[i] *= y[i];
  if (Math.testExcept(Math.OVERFLOW) != 0) {
    x[i] = Double.MAX_VALUE / 128.0;
    Math.clearExcept(Math.OVERFLOW);  // flag is sticky; reset it
  }
}

In this case, a response is required on an element-by-element basis,
but the test in the loop will slow some platforms down considerably.
Presuming that the data is such that most loops will run without
overflow, a better strategy would be to test for overflow after the
loop has completed (just one test) and then respond with a second
fix-up pass through the vector x[].

RealJava Features
=================

Types
-----

RealJava adds the type longDouble, which occupies four words.  Depending
on the platform, longDouble maps to 80-bit extended, quad, some other
wide type, or simply double.  The longDouble type is platform-specific.

RealJava adds types doubleN and floatN, whose definition depends on
the platform and which of two kinds of expression evaluation the
programmer has chosen.  In the default, strict evaluation, doubleN
and floatN are synonymous with double and float, respectively.

When evaluation is natural, floatN refers to the platform's (fast)
evaluation type at least as wide as float, and doubleN refers to
the platform's evaluation type at least as wide as double.  Regardless
of how floatN and doubleN map, when evaluation is natural they
occupy 4 words.

Expression evaluation
---------------------

RealJava supports two forms of expression evaluation, under
programmer control.

Strict evaluation, the default, forces pure float/double evaluation,
regardless of (or even in spite of) the underlying hardware
architecture.  The types floatN and doubleN are synonymous with float
and double, respectively.  This is Pure Java evaluation.

Natural evaluation uses a platform's evaluation types for
intermediate expressions.  For example, the natural evaluation type
on iA is extended and on PowerPC is double.

Strict evaluation is the default in RealJava; it is the only form
of expression evaluation in Pure Java.  Programmers enable natural
evaluation within a class with the statement

private static final boolean NaturalEvaluation = true;

Conversions and promotions
--------------------------

RealJava extends Pure Java's rules for numeric conversions and
promotions for natural expression evaluation.  Referring again to
the floating type hierarchy, Pure Java's promotion rules apply to
types floatN, doubleN, and longDouble.

    longDouble
           |   \
           |    doubleN
           |   /   |
        double     |
           |       |
           |    floatN
           |   /
         float

RealJava introduces the notion of "implicit narrowing," whereby
a doubleN value can be narrowed to double or a floatN to type
float during assignment or method invocation.  Pure Java permits
no narrowing without a cast, but such narrowing is fundamental to
computation using a platform's intermediate evaluation types with
extended range and precision.  During method invocation, implicit
narrowing of doubleN and floatN occurs only when no match can be
found for the natural types.

A special situation arises when code in a class using strict
evaluation calls a method in a class supporting natural evaluation.
Suppose the class HypeLib defines the method

public static doubleN cosh(doubleN x);

When invoked by code of the form

// Evaluation is strict
double y, x;
y = HypeLib.cosh(x);

the method cosh() requires that its argument be promoted to
doubleN and its result narrowed to double.
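
For comparison, here is the same calling pattern in Pure Java, with
float standing in for the caller's narrow type and double standing in
for doubleN.  This is only an analogy under that assumption: the
argument is promoted implicitly, but the result needs the explicit
cast that RealJava's implicit narrowing would make unnecessary.  The
class StrictCaller is hypothetical.

```java
// Sketch: promotion is implicit in Pure Java, narrowing is not.
final class HypeLib {
    // Stand-in for a library method defined on the wider type.
    static double cosh(double x) {
        return Math.cosh(x);
    }
}

final class StrictCaller {
    static float coshOfFloat(float x) {
        // x is promoted from float to double without a cast; the
        // double result must be narrowed to float explicitly.
        return (float) HypeLib.cosh(x);
    }
}
```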

Contraction operators
---------------------

RealJava supports use of hardware contractions, such as fused
multiply-add.  By default, contractions are not used.  Programmers
enable contractions within a class with the statement

private static final boolean AllowContractions = true;

Literals
--------

In RealJava, an untyped literal value has the type doubleN.  The
suffixes f, d, fN, dN, and ld (in upper or lower case) may be used
to specify the type of a value.  Literals are converted to the
nearest target value, using rounding to nearest.  Note that the
RealJava type rules mean that untyped literals have type double
when evaluation is strict.

Floating point environment
--------------------------

RealJava extends the class java.lang.Math to include methods that
permit a programmer to save and restore the floating point environment
exposed by RealJava.  The floating point environment is an integer value
encoding the rounding modes and exception flags.  RealJava defines a
platform-independent format for this state value, so it can be passed
between platforms across a net.

The default environment, with which all applets, applications, and
threads begin execution, has rounding to nearest and all flags off.
Each thread executes with its own floating point environment as part
of its thread state.  RealJava provides no explicit methods in support
of multi-threaded programming, though applications wishing to collect
exception flags from several threads may do so with conventional
synchronized, shared variables.

Programmers writing methods cognizant of exception flags and possibly
rounding modes will use holdEnv() at the start of a method to save
the caller's environment and clear all the exception flags.  After
computing a result and clearing all spurious exception flags (such
as undeserved underflow), a method calls updateEnv() to restore the
caller's rounding mode and merge (that is, logically OR) the caller's
exception flags with those set by the method.

  // Constants and methods in java.lang.Math
  public static final int DEFAULTENV = 0x00000000;
  public static int getEnv();
  public static void setEnv(int env);
  public static int holdEnv();
  public static void updateEnv(int env);

Rounding modes
--------------

RealJava extends java.lang.Math to include methods to test and alter
the rounding mode, as specified by the IEEE standard.  The methods
getRound() and setRound() get and set the rounding mode as an integer
value, whose possible values are given by the constants below.  The
ROUNDMASK value permits a program to extract the rounding mode from an
environment value or to insert one back in.

The rounding modes apply to the floating point operations +, -, *, /,
to specified conversion methods like roundedIntValue() and
roundedLongValue(), and to methods like rint(), which rounds to an
integral value within a floating format.  Other mathematical methods
will differ with respect to the extent to which they honor the IEEE
rounding modes.  Some methods deliver results rounded according to the
mode in which they are called; others save the caller's mode and run
rounding to nearest; still others simply run in whatever mode they're
called.  Check the documentation.

The arrangement and values of the rounding mode bits are inspired by the
most widely used architecture, iA.

  // Constants and methods in java.lang.Math
  public static final int TONEAREST  = 0x00000000;
  public static final int UPWARD     = 0x08000000;
  public static final int DOWNWARD   = 0x04000000;
  public static final int TOWARDZERO = 0x0C000000;
  public static final int ROUNDMASK  = 0x0C000000;
  public static int getRound();
  public static void setRound(int round);

Exception flags
---------------

RealJava extends java.lang.Math to include methods to test and alter
the five exception flags defined and required by the IEEE standard.
The exceptions are overflow, underflow, inexact, divide by zero,
and invalid.  The arrangement of the flag bits corresponds to the
arrangement on the most widely used architecture, iA.

  // Constants and methods in java.lang.Math
  public static final int INEXACT   = 0x00000020;
  public static final int DIVBYZERO = 0x00000004;
  public static final int UNDERFLOW = 0x00000010;
  public static final int OVERFLOW  = 0x00000008;
  public static final int INVALID   = 0x00000001;
  public static final int FLAGMASK  = 0x0000003D;
  public static void clearExcept(int excepts);
  public static int testExcept(int excepts);
  public static void raiseExcept(int excepts);

The RealJava Virtual Machine
============================

Java is not one language but two -- the source language so
reminiscent of C/C++ and a bytecoded intermediate language with
the feel of Forth.  The intermediate language provides a compact
representation of a Java program, enabling code to be passed among
diverse platforms over a network.  A user may "execute" the bytecodes
with an interpreter (the Java Virtual Machine) or the user may
execute native code compiled from the source code or the bytecodes.
In the latter case, compilation may occur on the fly with the request
for execution, in which case one speaks of a Just In Time (JIT)
compiler.

With regard to numerical computation, the bytecode representation
of Java has many of the properties of the source language.  In
particular, the bytecodes require all the platform independence of the
source language.

Virtual Machine operations
--------------------------

This section lists the changes to the Virtual Machine to support
RealJava.  The required operations are

* add, sub, mul, div, rem, neg, and comparisons on types floatN,
  doubleN, and longDouble

* conversions among floatN, doubleN, and longDouble, and conversions
  between those and the Pure Java types

* local variable load and store of types floatN, doubleN, and longDouble

* method return of floatN, doubleN, and longDouble values

* stack duplicate of a 4-word entity (for floatN, doubleN, and
  longDouble values)

* conversions between any pairs of integer and floating types

* comparisons signaling invalid when their operands are unordered
  (as required by the IEEE standard)

Because of its restriction to strict evaluation, the Pure Java Virtual
Machine knows the sizes of its floating types.  The JVM can then push
and pop floating values to and from the operand stack using word
operations, that is, without regard to the numerical values.  RealJava
extends this notion by storing all longDouble values and, when
expression evaluation is natural, all floatN and doubleN values, in
4 words.

Here is the list given earlier of the new bytecodes.  Some are prefix
codes, and some situations call for two prefix bytecodes.

 Count Codes and description
   3   dup4, dup4_x1, dup4_x2 -- duplicate a 4-word stack item
   2   fcmpq, dcmpq -- comparisons for equality (never set invalid flag)
   2   f2f, d2d -- nop conversions that can be modified by prefix
                   bytecodes
   1   natsrc -- the source operand(s) of the next non-prefix bytecode
                 have the corresponding natural type (and so does the
                 destination, in the case of arithmetic operations)
   1   natdst -- the destination operand of the next non-prefix
                 bytecode has the corresponding natural type
   1   longsrc -- the double source operand(s) of the next non-prefix
                  bytecode have type longDouble
   1   longdst -- the double destination operand of the next non-prefix
                  bytecode has type longDouble
   1   contract -- the next two operations can be fused, if possible
  --
  12

By convention, in operations such as addition, whose operands have a
common type, the "src" prefix is used.  The prefix natsrc applied to a
load operation affects both operands; and natdst applied to a store
affects both operands.

RealJVM idioms
--------------

It's helpful to have simple names for the common idioms.  Here is a
representative sample, using comma-separated sequences of bytecodes:

dn2i = natsrc, d2i -- convert doubleN to int
l2dn = natdst, l2d -- convert long to doubleN
d2dn = natdst, d2d -- convert double to doubleN
dn2d = natsrc, d2d -- convert doubleN to double
fn2dn = natsrc, natdst, f2d -- convert floatN to doubleN

dload x -- load a double value
dloadn x = natdst, dload x -- load a double value and convert to doubleN
dnload x = natsrc, dload x -- load a doubleN value, without conversion
(The fourth case, loading a doubleN and converting to double, arises far
less frequently and requires four bytes: dnload x, dn2d.)

dstore x -- store a double value
dnstored x = natsrc, dstore x -- convert doubleN to double and store
dnstore x = natdst, dstore x -- store a doubleN, without conversion
(The fourth case, converting a double to doubleN and storing, arises far
less frequently and requires four bytes: d2dn, dnstore x.)

dnadd = natsrc, dadd -- add two doubleNs to get a doubleN
dncmpq = natsrc, dcmpq -- compare two doubleNs for equality

ld2i = longsrc, d2i -- convert longDouble to int
l2ld = longdst, l2d -- convert long to longDouble
ld2d = longsrc, d2d -- convert longDouble to double
d2ld = longdst, d2d -- convert double to longDouble
ld2f = longsrc, d2f -- convert longDouble to float
dn2ld = natsrc, longdst, d2d -- convert doubleN to longDouble
ld2dn = longsrc, natdst, d2d -- convert longDouble to doubleN
ldload = longsrc, dload -- load longDouble
ldstore = longsrc, dstore -- store longDouble
ldsub = longsrc, dsub -- subtract longDoubles to give longDouble
ldcmpg = longsrc, dcmpg -- compare longDoubles
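
The idiom names above are shorthand for prefix sequences followed by a
base bytecode.  As a rough illustration only, the following Java sketch
expands a few of them.  The class RealJvmIdioms and its methods are
hypothetical helpers, not part of the proposal, and the sketch counts
opcode bytes only, ignoring operand bytes such as local-variable
indices.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the proposal: expands idiom names
// into the prefix/base bytecode sequences listed in the text.
final class RealJvmIdioms {
    private static final Map<String, List<String>> IDIOMS = new LinkedHashMap<>();

    static {
        IDIOMS.put("dn2i",  List.of("natsrc", "d2i"));
        IDIOMS.put("l2dn",  List.of("natdst", "l2d"));
        IDIOMS.put("d2dn",  List.of("natdst", "d2d"));
        IDIOMS.put("dn2d",  List.of("natsrc", "d2d"));
        IDIOMS.put("dnadd", List.of("natsrc", "dadd"));
        IDIOMS.put("ld2d",  List.of("longsrc", "d2d"));
        IDIOMS.put("dn2ld", List.of("natsrc", "longdst", "d2d"));
        IDIOMS.put("ld2dn", List.of("longsrc", "natdst", "d2d"));
    }

    // Returns the bytecode sequence for an idiom.  A name that is not a
    // known idiom is treated as a plain bytecode and returned unchanged.
    static List<String> expand(String name) {
        return IDIOMS.getOrDefault(name, List.of(name));
    }

    // Assumes one byte per prefix and one byte per base opcode.
    static int opcodeBytes(String name) {
        return expand(name).size();
    }

    public static void main(String[] args) {
        for (String name : List.of("dn2i", "dn2ld", "dadd")) {
            System.out.println(name + " = " + String.join(", ", expand(name))
                    + " (" + opcodeBytes(name) + " opcode bytes)");
        }
    }
}
```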

The comparison operators deserve special mention.  The IEEE standard
requires that the invalid flag be raised when unordered operands
are compared using <, <=, >=, or >, and that it not be raised when
unordered operands are compared with the relations == and !=.
For this reason, there need be only one comparison bytecode that
doesn't signal invalid for unordered operands.  By (arbitrary)
convention, unordered operands appear greater than with this
operator, but this doesn't affect code generation.
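
The truth values of unordered comparisons can be observed in standard
Java today, as the sketch below shows.  Standard Java exposes no
invalid flag, so only the results are visible here; the flag behavior
described above is specific to RealJava.  The class and method names
are illustrative.

```java
// Standard Java sketch: results of comparisons with a NaN operand.
final class UnorderedCompare {
    // True when x and y are unordered, that is, at least one is NaN.
    static boolean unordered(double x, double y) {
        return Double.isNaN(x) || Double.isNaN(y);
    }

    static boolean less(double x, double y)      { return x < y; }
    static boolean greaterEq(double x, double y) { return x >= y; }
    static boolean equal(double x, double y)     { return x == y; }
    static boolean notEqual(double x, double y)  { return x != y; }

    public static void main(String[] args) {
        double nan = Double.NaN;
        // The ordered relations and == are all false for unordered
        // operands; only != is true.
        System.out.println("unordered: " + unordered(nan, 1.0));
        System.out.println("nan <  1 : " + less(nan, 1.0));
        System.out.println("nan >= 1 : " + greaterEq(nan, 1.0));
        System.out.println("nan == nan: " + equal(nan, nan));
        System.out.println("nan != nan: " + notEqual(nan, nan));
    }
}
```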

Expression Evaluation Examples
==============================

Expression evaluation is completely determined by the bytecodes emitted
by the RealJava compiler.  It's at the source level that programmers
determine which expression evaluation will be used.  Here is a list of
typical numerical expressions, with byte counts greater than 1 given.

1) double a, b, x, y;
   y = a*x + b;

  Natural:
     dloadn b   // (2 bytes) simple fp register load
     dloadn a   // (2)
     dloadn x   // (2)
     (contract) // (permit fused mul-add, if enabled by programmer)
     dnmul      // (2) native multiply on any platform
     dnadd      // (2) native add
     dnstored y // (2) store, after implied narrowing of doubleN

  Strict:
     dload b
     dload a
     dload x
     (contract) // (permit fused mul-add, if enabled by programmer)
     dmul       // coerce result to double if product is wide
     dadd       // coerce result to double...
     dstore y

2) doubleN s;
   double a, b;
   s = s + a*b;

  Natural:
     dnload s   // (2) load doubleN value
     dloadn a   // (2) fp register load of double value
     dloadn b   // (2)
     (contract)
     dnmul      // (2)
     dnadd      // (2)
     dnstore s  // (2) store doubleN, without conversion

  Strict:
     Same as example (1), with s, a, and b all of type double.

3) double z;
   float e, f;
   z = z + e*f;

  Natural:
     dloadn z   // (2) load double to fp register
     floadn e   // (2) load float to fp register
     floadn f   // (2)
     (contract) // (if the fn2dn is a nop, the mul-add can fuse)
     fnmul      // (2)
     fn2dn      // (3)
     dnadd      // (2)
     dnstored z // (2) store, with implied narrowing

  Strict:
     dload z
     fload e
     fload f
     fmul
     f2d         // this conversion inhibits fusing the mul and add
     dadd
     dstore z
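
To see why fusing a multiply and add changes results, here is a small
sketch in standard Java.  It uses Math.fma (available since Java 9) as
a stand-in for a contracted operation; the contract bytecode itself is
specific to RealJava.  The operand values are chosen for illustration
so that the exact product 1 - 2^-60 rounds to 1.0 in double.

```java
// Standard Java sketch: fused versus unfused evaluation of a*x + b.
final class FusedMulAdd {
    // Unfused: the product is rounded to double before the add.
    static double unfused(double a, double x, double b) {
        return a * x + b;
    }

    // Fused: one rounding, applied to the exact value of a*x + b.
    static double fused(double a, double x, double b) {
        return Math.fma(a, x, b);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1p-30;   // exactly representable
        double x = 1.0 - 0x1p-30;   // exactly representable
        double b = -1.0;
        // Exact a*x is 1 - 2^-60, which rounds to 1.0 as a double, so
        // the unfused form loses the low-order term entirely.
        System.out.println("unfused is zero: " + (unfused(a, x, b) == 0.0));
        System.out.println("fused is -2^-60: " + (fused(a, x, b) == -0x1p-60));
    }
}
```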

Method invocation
-----------------

Java supports overloaded methods, with resolution of types at
compile time, and type conversion if necessary.  Java specifies
the choice of overloaded methods according to the types of the
operands.  Pure Java specifies widening, when necessary to match a
method's signature.  RealJava adds implicit narrowing of doubleN
to double and floatN to float, when necessary.

Here are some typical examples of method invocation.  In the first
case, both doubleN and double forms of the method are defined.

doubleN compound(doubleN r, doubleN n); // return (1+r)^n
double  compound(double  r, double  n);
int n;
double p, s, t;
p = compound(s * t, n);

  Natural:
     dloadn s   // (2)
     dloadn t   // (2)
     dnmul      // (2)
     iload n
     i2dn       // (2)
     <invoke doubleN method compound>
     dnstored p // (2)

  Strict:
     dload s
     dload t
     dmul
     iload n
     i2d
     <invoke double compound>
     dstore p

The compiler narrows doubleN expressions to double, if
necessary.

double compound(double r, double n); // return (1+r)^n
int n;
double p, s, t;
p = compound(s * t, n);

  Natural:
     dloadn s   // (2)
     dloadn t   // (2)
     dnmul      // (2)
     dn2d       // (2) implicit narrowing during natural evaluation
     iload n
     i2d
     <invoke double method compound>
     dstore p

  Strict:
     Same as previous example

And here's what happens if the only definition is doubleN:

doubleN compound(doubleN r, doubleN n); // return (1+r)^n
int n;
double p, s, t;
p = compound(s * t, n);

  Natural:
     Same as the first compound() example, with double and 
     doubleN forms.

  Strict:
     dload s
     dload t
     dmul
     d2dn       // (2)
     iload n
     i2dn       // (2)
     <invoke doubleN compound>
     dnstored p // (2)

When a fast float version of a library method is desired, just
define it with float or floatN arguments.  When evaluation is
strict, float expressions are promoted to double before floatN
in order to match a method's signature.

doubleN annuity(doubleN r, doubleN n); // return (1 - (1+r)^-n)/r
double  annuity(double  r, double  n);
floatN  annuity(floatN  r, floatN  n);
int n;
float a, b, c;
a = annuity(b / c, n);

  Natural:
    floadn b   // (2)
    floadn c   // (2)
    fndiv      // (2)
    iload n
    i2fn       // (2)
    <invoke floatN annuity>
    fnstoref a // (2)

  Strict:
    fload b
    fload c
    fdiv
    f2d        // preferred over f2fn during strict evaluation
    iload n
    i2d
    <invoke double annuity>
    d2f
    fstore a
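
The compile-time choice among overloads can be tried out in standard
Java using just float and double, as sketched below.  The floatN and
doubleN types do not exist in standard Java, so this shows only the
Pure Java widening rules that RealJava builds on.  The method bodies
return a label rather than computing an annuity.

```java
// Standard Java sketch: overload resolution by operand type.
final class OverloadDemo {
    static String annuity(double r, double n) { return "double"; }
    static String annuity(float r, float n)   { return "float"; }

    // float / float is a float expression; the int argument widens to
    // float, so the float overload is the most specific match.
    static String withFloats(float b, float c, int n) {
        return annuity(b / c, n);
    }

    // double * double is a double expression; only the double overload
    // applies, and the int argument widens to double.
    static String withDoubles(double s, double t, int n) {
        return annuity(s * t, n);
    }

    public static void main(String[] args) {
        System.out.println(withFloats(1.0f, 8.0f, 12));
        System.out.println(withDoubles(1.0, 8.0, 12));
    }
}
```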

Changes to the package java.lang
================================

Here is a summary of changes to java.lang that reflect the new floating
types:

Add classes for types floatN, doubleN, and longDouble.
Add methods to all floating classes to support conversions to
  and from new floating types.
Add methods to both integral classes to support conversions to
  and from new floating types.
Add signBit, nextAfter and copySign methods to all floating classes.
Add support for the floating point state to java.lang.Math.

java.lang.Float

  Add methods to support conversions to and from new floating types,
  rounded (rather than merely truncated) conversions to integral types,
  and methods from the IEEE 754 appendix.  Organize the class so that
  references to the natural and long double types occur after the
  float/double definitions.

  // class methods
  public static int signBit(float value);
  public static float copySign(float magVal, float signVal);
  public static float nextAfter(float srcVal, float targVal);
  // instance methods
  public int roundedIntValue();
  public long roundedLongValue();
  public int signBit();
  public float copySign(float signVal);
  public float nextAfter(float targVal);

  // advanced methods
  // constructors
  public Float(floatN value);
  public Float(doubleN value);
  public Float(longDouble value);
  // instance methods
  public floatN floatNValue();
  public doubleN doubleNValue();
  public longDouble longDoubleValue();
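
For readers who want to experiment, the behavior of several of these
methods can be approximated in standard Java.  The sketch below is not
the proposed implementation.  It assumes that signBit returns 1 when
the sign bit is set, and that a "rounded" conversion means round to
nearest with ties to even; the proposal may intend rounding according
to the current rounding mode.  Math.copySign, Math.nextAfter and
Math.rint are existing library methods.

```java
// Standard Java approximations of the proposed Float helpers.
final class FloatHelpers {
    // 1 if the sign bit is set (including -0.0f), else 0.
    static int signBit(float value) {
        return Float.floatToRawIntBits(value) >>> 31;
    }

    // Assumed meaning of a rounded conversion: nearest, ties to even.
    // Math.rint promotes the float to double, which is exact.
    static int roundedIntValue(float value) {
        return (int) Math.rint(value);
    }

    // A plain cast truncates toward zero.
    static int truncatedIntValue(float value) {
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println("signBit(-0.0f) = " + signBit(-0.0f));
        System.out.println("truncated 2.7f = " + truncatedIntValue(2.7f));
        System.out.println("rounded   2.7f = " + roundedIntValue(2.7f));
        System.out.println("copySign(3, -0) = " + Math.copySign(3.0f, -0.0f));
        System.out.println("nextAfter(1, 2) > 1: "
                + (Math.nextAfter(1.0f, 2.0) > 1.0f));
    }
}
```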

java.lang.Double

  Add methods corresponding to those added to java.lang.Float.

java.lang.DoubleN

  This class is new in RealJava.

public final class DoubleN extends Number {
  // constants
  public static final doubleN MIN_VALUE = <platform-dependent value>;
  public static final doubleN MAX_VALUE = <platform-dependent value>;
  public static final doubleN NEGATIVE_INFINITY = -1.0dN / 0.0dN;
  public static final doubleN POSITIVE_INFINITY = 1.0dN / 0.0dN;
  public static final doubleN NaN = 0.0dN / 0.0dN;
  // constructors
  public DoubleN(doubleN value); // and promoted float, floatN, double
  public DoubleN(longDouble value);
  public DoubleN(String s) throws NumberFormatException;
  // class methods
  public static String toString(doubleN dn);
  public static DoubleN valueOf(String s)
    throws NullPointerException, NumberFormatException;
  public static int[] doubleNToIntBits(doubleN value);
  public static doubleN intBitsToDoubleN(int[] bits);
  public static boolean isNaN(doubleN dn);
  public static boolean isInfinite(doubleN dn);
  public static int signBit(doubleN value);
  public static doubleN copySign(doubleN magVal, doubleN signVal);
  public static doubleN nextAfter(doubleN srcVal, doubleN targVal);
  // instance methods
  public String toString();
  public boolean equals(Object obj);
  public int hashCode();
  public int intValue();
  public int roundedIntValue();
  public long longValue();
  public long roundedLongValue();
  public float floatValue();
  public double doubleValue();
  public floatN floatNValue();
  public longDouble longDoubleValue();
  public boolean isNaN();
  public boolean isInfinite();
  public int signBit();
  public doubleN copySign(doubleN signVal);
  public doubleN nextAfter(doubleN targVal);
}

java.lang.FloatN

  This new class corresponds to java.lang.DoubleN.

java.lang.LongDouble

  This new class corresponds to java.lang.DoubleN.

java.lang.Math

  In RealJava, the methods in Math are defined with doubleN and
  floatN arguments, rather than the double and float of Pure Java.
  This provides fast, accurate library functions on all platforms,
  whether evaluation is strict or natural.

  RealJava adds a number of constants and methods to handle floating
  point state.

  public static final int DEFAULTENV = 0x00000000;
  public static final int TONEAREST  = 0x00000000;
  public static final int UPWARD     = 0x08000000;
  public static final int DOWNWARD   = 0x04000000;
  public static final int TOWARDZERO = 0x0C000000;
  public static final int ROUNDMASK  = 0x0C000000;
  public static final int INEXACT    = 0x00000020;
  public static final int DIVBYZERO  = 0x00000004;
  public static final int UNDERFLOW  = 0x00000010;
  public static final int OVERFLOW   = 0x00000008;
  public static final int INVALID    = 0x00000001;
  public static final int FLAGMASK   = 0x0000003D;
  public static int getEnv();
  public static void setEnv(int env);
  public static int holdEnv();
  public static void updateEnv(int env);
  public static int getRound();
  public static void setRound(int round);
  public static void clearExcept(int excepts);
  public static int testExcept(int excepts);
  public static void raiseExcept(int excepts);

RealJava and IEEE 754
=====================

RealJava claims full conformance to the IEEE standard for binary
floating point arithmetic, in the sense that it complies with every
"shall" directive in the standard.  For IEEE 754 aficionados, here
is a list of subtle conformance issues:

* The standard specifies optional traps associated with the
  exceptions overflow, underflow, invalid, divide by zero,
  and inexact.  RealJava does not support traps, primarily
  because the cost of doing so portably would outweigh the
  apparent benefit.

* The standard requires that binary-decimal conversion be performed
  with worst-case extra error 0.47 unit in the last place.  RealJava,
  like Pure Java, requires correct rounding for all binary-decimal
  conversions.  This removes all ambiguities in conversions at
  negligible added cost over the "traditional" algorithms with
  their slightly looser error bound.

* The standard requires that comparisons involving the predicates
  <, <=, >=, and > raise the invalid exception when one or both
  of the operands is NaN.  RealJava supports this requirement,
  through extensions to Pure Java's comparison operators.

* The standard requires that results be rounded under program
  control.  RealJava supports the four types of rounding, using
  a dynamic mode setting.

* The standard requires, on machines whose natural evaluation type
  is double or extended, that it be possible to achieve results
  rounded according to what RealJava calls strict evaluation.
  RealJava requires strict evaluation as the default, though it
  admits natural evaluation for applications requiring higher
  performance (with generally increased accuracy).

* One of the best-known features of the standard is gradual
  underflow, whereby tiny results are "subnormalized" with the
  format's minimum exponent, rather than being simply "flushed"
  to zero.  RealJava and Pure Java require that the floating
  point engine implement gradual underflow.

* The standard specifies a class of signaling NaNs that trigger an
  invalid exception when they arise in any arithmetic operation.
  RealJava, by specifying a portable mechanism for the invalid
  exception flag, supports the invalid exception raised by the
  underlying floating point engine.

  The standard doesn't specify how to distinguish quiet NaNs from
  signaling NaNs, so the distinction varies from platform to
  platform.  Programmers should note that quiet NaNs may become
  signaling NaNs if passed between different platforms.  When a
  signaling NaN is used in an arithmetic operation, the invalid
  flag will be raised and the floating point result, if any,
  will be one of the platform's quiet NaNs.
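
Gradual underflow, mentioned in the list above, is easy to observe in
standard Java.  The sketch below picks two distinct values near the
bottom of the normal range whose difference is a subnormal number; a
flush-to-zero engine would return zero instead.  The class and method
names are illustrative.

```java
// Standard Java sketch: a tiny difference is subnormal, not zero.
final class GradualUnderflow {
    static double difference(double x, double y) {
        return x - y;
    }

    // True when v is nonzero but smaller than the smallest normal double.
    static boolean isSubnormal(double v) {
        return v != 0.0 && Math.abs(v) < Double.MIN_NORMAL;
    }

    public static void main(String[] args) {
        double y = Double.MIN_NORMAL;        // 2^-1022
        double x = 1.5 * Double.MIN_NORMAL;  // exactly representable
        double d = difference(x, y);         // exactly 2^-1023
        System.out.println("x != y:         " + (x != y));
        System.out.println("x - y != 0:     " + (d != 0.0));
        System.out.println("x - y subnormal: " + isSubnormal(d));
    }
}
```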

Java Expression Evaluation and Optimization
===========================================

The Pure Java specification places strict demands on expression
evaluation, in the interest of attempting to achieve identical
results for all computations across all platforms.  A pleasant
side-effect is that the quality of numerical computation improves
with tighter specifications.  A well-known example, now past, is
the historic practice in C compilers of ignoring parentheses in
expressions such as (x + y) + z.  While addition is an associative
operation in mathematics, and even for 2's-complement integers,
floating point addition is not generally associative.

This section, derived from C9x and related work, mentions a list
of optimization pitfalls related to floating point computation.
All variables may be of any floating type.

* Constant expressions (i.e. those involving literal values) are
  evaluated at runtime.  Side effects from expressions like
  0.0 / 0.0 arise at runtime, even though the conversions may be
  performed at compile time.

* Don't replace (x == x) with true. The relation is not true of NaNs.

* Don't replace (x != x) with false. It is true of NaNs.

* Don't replace (x - x) with 0.0. The result is x if it's a NaN and
  invalid (a NaN) if x is infinite; otherwise, in RealJava, which
  supports IEEE rounding modes, the sign of the zero result depends
  on the rounding direction.

* Don't replace (0.0 * x) with 0.0. The sign of the result depends
  on x when x is finite, and the value depends on x when x is NaN
  or infinite.

* Don't replace (x + y) - y with x.  Addition is not associative.

* Don't replace x + x*y with x*(1.0 + y).  The distributive law
  may fail in floating point computations, too.

* Don't simplify expressions like x + 0.0 and x - 0.0, and don't
  change -x to 0.0 - x.  In these cases, when x itself is zero the
  result sign will depend on the rounding direction.  Similarly,
  don't change x - y to -(y - x).  If that result is zero, the
  two expressions will have opposite sign. The expression
  x - y is the same as x + (-y).

* Don't change if (x < y) {block A} else {block B} to
  if (x >= y) {block B} else {block A}.  If x and y are unordered
  because one or both is NaN, block B should be executed and
  the invalid flag should be set.

* In expressions involving constants, the freedom to optimize may
  depend on whether the constant is exactly representable as a
  floating point value.  The expressions x / 5.0 and x * 0.2 are
  different because 0.2 must be rounded to a binary floating point
  value.  The expressions x / 2.0 and x * 0.5 are identical, as are
  1.0 * x, x, and x / 1.0.
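
Several of the pitfalls above can be checked directly in standard Java
under the default round-to-nearest mode.  The sketch below is a
spot-check with hand-picked operands, not a proof; effects that depend
on the rounding direction or on the invalid flag cannot be shown in
standard Java.  Each expression sits in its own method so that the
operands are not compile-time constants.

```java
// Standard Java spot-checks of "simplifications" that change results.
final class FoldingPitfalls {
    static boolean selfEqual(double x)      { return x == x; }
    static double selfDifference(double x)  { return x - x; }
    static double zeroTimes(double x)       { return 0.0 * x; }
    static double addThenSubtract(double x, double y) { return (x + y) - y; }
    static double negate(double x)          { return -x; }
    static double zeroMinus(double x)       { return 0.0 - x; }
    static double divideByFive(double x)    { return x / 5.0; }
    static double timesFifth(double x)      { return x * 0.2; }

    static boolean isNegativeZero(double v) {
        return Double.doubleToRawLongBits(v) == Double.doubleToRawLongBits(-0.0);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println("NaN == NaN:        " + selfEqual(Double.NaN));
        System.out.println("inf - inf is NaN:  " + Double.isNaN(selfDifference(inf)));
        System.out.println("0.0 * -3.0 is -0:  " + isNegativeZero(zeroTimes(-3.0)));
        System.out.println("0.0 * inf is NaN:  " + Double.isNaN(zeroTimes(inf)));
        System.out.println("(1 + 2^60) - 2^60 == 1: "
                + (addThenSubtract(1.0, 0x1p60) == 1.0));
        System.out.println("-x is -0 for x = 0:     " + isNegativeZero(negate(0.0)));
        System.out.println("0.0 - x is -0 for x = 0: " + isNegativeZero(zeroMinus(0.0)));
        System.out.println("3/5 == 3*0.2:      "
                + (divideByFive(3.0) == timesFifth(3.0)));
    }
}
```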

RealJava FAQ
============

1.  Why the hack using magic boolean variables to enable features?

    An essential part of the purity of Java is its lack of a
    preprocessor.  Programmers already use private static booleans
    for conditional compilation.  RealJava goes a step further.
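
The existing idiom referred to here looks roughly like the following
in standard Java.  The flag name TRACE and the method are illustrative
only; this sketch does not show the RealJava feature flags themselves.

```java
// Standard Java sketch of the static-boolean idiom.
final class ConditionalCompilation {
    // A compile-time constant.  When it is false, a compiler may omit
    // the guarded statement from the generated code.
    private static final boolean TRACE = false;

    static double scale(double x) {
        if (TRACE) {
            System.out.println("scale(" + x + ")");
        }
        return 2.0 * x;
    }

    public static void main(String[] args) {
        System.out.println(scale(1.5) == 3.0);
    }
}
```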

2.  What if I just use float and double variables?

    You get 100% Pure Java expression evaluation.  RealJava permits
    different implementations of functions like sin() and exp() on
    different platforms, so you won't be guaranteed bit-identical
    results across all platforms, but you'll get pure float and double
    expression evaluation.

3.  Isn't it ugly to have the java.lang numerical classes grow as
    the square of the number of types?

    Yes, the Java language specification went the usual route of
    linearizing what is essentially 2D information about mixing and
    converting types.  There are compact, alternative representations.
    As it is, the classes can be organized to partition off the floatN,
    doubleN and longDouble methods.

4.  Does natural evaluation always yield better results on iA?

    No.  Although the results are "almost always" more accurate
    than their pure float/double counterparts, it's easy to
    construct examples where the extra information becomes
    misinformation which can in turn be promoted to have high
    relative error.

    Then there are the calculations designed for a specific
    precision.  These techniques can be used to simulate higher
    precision using just float or double values.
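
One widely known technique of this kind, usually called TwoSum, is
sketched below in standard Java.  It is offered as an example of such
a calculation, not as something taken from this proposal.  It relies
on every operation being rounded to double; with wider intermediate
values the recovered error term may no longer be exact.

```java
// Standard Java sketch of the TwoSum error-free transformation.
final class TwoSum {
    // Returns {s, e} where s is the rounded sum a + b and e is the
    // rounding error, so that s + e equals a + b exactly (barring
    // overflow), provided each operation rounds to double.
    static double[] twoSum(double a, double b) {
        double s = a + b;
        double bVirtual = s - a;
        double aVirtual = s - bVirtual;
        double bRoundoff = b - bVirtual;
        double aRoundoff = a - aVirtual;
        return new double[] { s, aRoundoff + bRoundoff };
    }

    public static void main(String[] args) {
        // 2^-60 is lost when added to 1.0, but reappears as the error.
        double[] r = twoSum(1.0, 0x1p-60);
        System.out.println("sum is 1.0:     " + (r[0] == 1.0));
        System.out.println("error is 2^-60: " + (r[1] == 0x1p-60));
    }
}
```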

5.  What is the performance degradation on iA for a double inner
    product loop running with strict versus natural evaluation?

    Some tests have shown a factor of 10 or more slow-down due
    to the stores and loads required to coerce intermediate
    results to double on iA.

6.  What are the performance implications of the IEEE exception flags?

    Pure Java already constrains the order of execution to such
    an extent that inserting a flag test into a sequence should
    be possible.  The performance will depend on a platform's
    ability to interrogate the flags in hardware, which can often
    take far more time than a simple arithmetic operation because
    of the interface to the underlying system.

7.  Won't converting decimal constants at runtime degrade
    performance?

    No.  Each constant that appears must be converted just once,
    even if it appears in a loop.

8.  Doesn't the use of extended intermediate values precipitate
    the same portability problems as have 16, 32, and 64-bit widths
    for pointers and signed and unsigned integers?

    No.  The notion of "at least as much range and precision" is
    well defined and generally a good thing in numerical computing;
    because natural evaluation must be requested, it can be
    avoided when necessary.  Differences in the high-order bits
    of integral values, especially pointers, expose fundamental
    incompatibilities.