double rounding in x86

Samuel A. Figueroa uunet!SLINKY.CS.NYU.EDU!figueroa
Wed Aug 30 12:57:12 PDT 1995


It looks like I need to clarify some things I said in a previous message.
In my haste, I was not as clear as I should have been.  I'd also like to make
some additional comments relevant to the discussion Tom Lynch started.

In a previous message, Tom Lynch (Tom.Lynch@amd.com) writes:
  >> Nowhere does the standard require that values in registers be stored bit-
  >> for-bit to memory.  The accuracy and rounding requirements only apply to
  >> the result of an operation....
  >> Furthermore, double rounding does NOT occur - results of operations are
  >> rounded once before being placed in extended precision registers.  The
  >> IEEE Standard does not require results to be rounded taking into account
  >> the narrower exponent range for single and double precision formats if the
  >> destination is an extended precision format, regardless of how the
  >> precision mode is set. 
  >An interesting nuance, but it would seem that a "result" is something
  >which could be used in further calculation.  Hence, the "result"
  >should be writable to memory.  If a result can be a fleeting entity
  >which only provides a hint as to the operand provided for further
  >calculation, then one could think up conforming systems which truly
  >produce bizarre final results.

It looks like the IEEE Standard doesn't fulfill your wishes, as perhaps it
doesn't for a lot of people.  What you seem to want is some way to take a
piece of code written in a high-level language, and be able to determine
the exact semantics (and possibly the exact result, in cases where the values
of the operands are known) of a floating-point expression by following rules
which you might feel are the most reasonable ones.  I have encountered other
expressions of this sentiment before, and I believe it is not without merit.
However, no amount of wishing is going to change the language used in the
IEEE Standard, and in particular, the scope of the IEEE Standard clearly does
not extend to how the Standard relates to high-level languages.  Certainly,
the notion of what a result is differs from what you expressed above.
(Unfortunately, I left my copy of the IEEE Standard at home.)  Also, on the
x86 architecture, it *is* possible to store the result of an operation to
memory: simply store the result bit-for-bit to an 80-bit memory location.
So on the x86 architecture, the result of an operation is not "a fleeting
entity."

  >Perhaps one could fix the problem by saying "x86 floating-point
  >*stack* arithmetic is IEEE std754 compliant" hence, arithmetic where
  >intermediates go to memory would be non-compliant because of the
  >double rounding.  Such a definition would be very unfortunate, as it
  >would be difficult for a compiler/programmer to guarantee that all
  >calculations would be done on the stack when IEEE 754 arithmetic is
  >desired.

Again, you seem to imply that the IEEE Standard determines the semantics of
high-level language constructs, which by itself it clearly does not.  Given
language standards' typical silence on the semantics of
floating-point arithmetic, a more relevant standard would be the Language
Independent Arithmetic Standard (LIAS), which several in this forum have
criticized.  In any case, figuring out how to map high-level language
constructs to the features an IEEE-conforming arithmetic engine provides
is not an easy task, given the variability the IEEE Standard allows.
Perhaps this is in part responsible for your dissatisfaction (notice the
word "unfortunate" in the paragraph above), and that of others (see Vaughan
Pratt's recent message).

  >> In the case of the x86, results of arithmetic operations
  >> are stored in *extended precision* registers, and NOT to double precision
  >> registers.  When those values are stored to memory, if the destination
  >> format is double precision, those values *cannot* be stored bit-for-bit -
  >> they must be converted from extended precision to double precision.  It
  >> cannot be otherwise.

It cannot be otherwise because you can't take an 80-bit quantity and store it
bit-for-bit in a 64-bit location.  Even if the original 80-bit quantity is
exactly representable as a double precision quantity, you still can't store
the 80-bit quantity bit-for-bit in a 64-bit location because the width of the
exponents is different.
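
To make the double rounding at issue concrete (this example is an editorial
addition, not part of the original exchange): rounding an exact result first
to the x87's 64-bit extended significand and then to double precision's 53
bits can give a different answer than rounding once to 53 bits.  A minimal
sketch using exact rational arithmetic:

```python
# Editorial illustration: simulating double rounding exactly.
from fractions import Fraction
from math import floor

def round_to(x: Fraction, p: int) -> Fraction:
    """Round positive x to p significant bits, round-to-nearest-even,
    with an unbounded exponent range (as in an x87 register for values
    far from underflow or overflow)."""
    e = 0
    while x >= 2:
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    # x is now in [1, 2); make the significand an integer in [2^(p-1), 2^p)
    m = x * 2 ** (p - 1)
    n = floor(m)
    r = m - n
    if r > Fraction(1, 2) or (r == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return Fraction(n, 2 ** (p - 1)) * Fraction(2) ** e

# An exact value just above the halfway point between two adjacent doubles:
v = 1 + Fraction(1, 2**53) + Fraction(1, 2**65)

once  = round_to(v, 53)                # straight to double: rounds up
twice = round_to(round_to(v, 64), 53)  # via 64-bit extended, then double

# once  == 1 + 2^-52, but twice == 1: the first rounding discarded the
# 2^-65 "sticky" bit, so the tie-break in the second rounding went the
# other way (round-half-even chose the even neighbor, 1).
```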

  >hmm 
  >  1) In the case under discussion, it is not true that the destination
  >     format is x86 style double extended as you imply here.  Actually it is
  >     another type of extended, one with a larger exponent range, but the
  >     same precision as a double.  This is the result of setting the
  >     precision control register to double.

I would be very surprised if any implementation of the x86 architecture had
any user-visible registers with the format you describe.  The setting of the
precision mode does not control the format of a result, only how results of
operations are rounded.
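
The distinction matters most for tiny results.  As another editorial
illustration (again in exact rational arithmetic, with function names of my
own choosing): with the precision mode set to double, a result keeps a
53-bit significand, but the register's wide exponent range keeps it normal
far below double precision's underflow threshold, so the first rounding is
exact and no tininess is seen; only the store to a 64-bit memory location
forces the value onto the subnormal grid, rounding it a second time.

```python
# Editorial sketch: precision mode narrows the significand only; the
# exponent range is narrowed only when the value is stored to memory.
from fractions import Fraction
from math import floor

def round_sig(x: Fraction, p: int) -> Fraction:
    """Round positive x to p significant bits (round-to-nearest-even),
    unbounded exponent range -- roughly what an x87 register does with
    precision control set to double (p = 53) for in-range values."""
    e = 0
    while x >= 2:
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    m = x * 2 ** (p - 1)
    n = floor(m)
    r = m - n
    if r > Fraction(1, 2) or (r == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return Fraction(n, 2 ** (p - 1)) * Fraction(2) ** e

def store_as_double(x: Fraction) -> Fraction:
    """Round positive x to an IEEE double in memory: 53 significant bits
    for normal values, but below 2^-1022 the representable values are
    multiples of 2^-1074 (subnormals)."""
    if x >= Fraction(2) ** -1022:
        return round_sig(x, 53)
    m = x * 2 ** 1074
    n = floor(m)
    r = m - n
    if r > Fraction(1, 2) or (r == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return Fraction(n, 2 ** 1074)

# A tiny value with a full 53-bit significand:
v = (1 + Fraction(1, 2**52)) * Fraction(2) ** -1040

in_register = round_sig(v, 53)    # exact: wide exponent range, no underflow
in_memory   = store_as_double(v)  # subnormal grid drops the low bit

# in_register == v, but in_memory == 2^-1040: the value was rounded a
# second time at the store, after tininess went undetected in the register.
```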

  >  2) It could be otherwise, the rounding precision control field could also
  >     limit the exponent range. Unfortunately, a change of this style may
  >     cause gross behavioral changes in existing code.

OK.  It can be otherwise, but you are interpreting the sentence "It cannot be
otherwise." differently than I intended.

  >> Finally, remember the IEEE Standard does NOT guarantee results will
  >> be bit-for-bit identical on different standard-conforming arithmetic
  >> engines - some variation is built into the Standard (look, for example,
  >> at the rules for how underflow may be detected).
  >ok how about this: If two conformant platforms use the same mechanism
  > for underflow detection, then the same bit strings result from the same
  > calculations.
  >Hence, one needs to know only two things for their IEEE 754 code and
  >databases to be platform independent: 
  >  Compliance Requirements:
  >     1. standard compliance
  >     2. underflow detection method (1 of 2 possibilities)

Unfortunately, this is not enough.  See below.

  >If I could identify items which are in the "spirit" of the standard, they
  >would be these:
  >   ...
  >   3. fundamental operations yield the same answers on compliant platforms.

I don't think this is in the spirit of the Standard.  Otherwise, the Standard
would not have given implementors so much leeway.  It's not just that
underflow detection can vary from one implementation to the next.  Consider
the implications of not specifying the exact layouts for single extended and
double extended formats.  In addition, although many implementations provide
access to an operation which adds two single precision floating-point numbers
to produce a correctly rounded single precision sum, I don't see any place in
the standard that says that this operation *must* be provided.  (You can
replace the word "single" with the word "double" in the previous sentence
if you like.)  This means that all of a sudden it is not so easy to know
exactly what operations you can count on a standard-conforming engine to
provide.

  >Also, if #3 is not a criterion, and the x86 intermediate format is
  >ok, then can I not introduce just about every other method of doing
  >arithmetic based on the following:
  >       1) I can supply a function which shows that in some form
  >          the required "result" is there if I look at it right,
  >          its just that I don't use it.
  >       2) I can provide the IEEE required format by conversion,
  >          it just won't be the same bit string as anyone else gets.

I'm not sure exactly what you have in mind here.  Perhaps an example of a
weird method of doing arithmetic would help.

  >We could add another item to the "compliance requirements", of some
  >kind, I'll let you suggest what that would be.

I'd like to work on this.  Give me some time.

  >However, the x86
  >architecture isn't going to let you signal that the case occurred.  For
  >example, in the case under consideration, when the first rounding
  >is done, the number is not recognized as being tiny because of the
  >expanded exponent range.  Hence the user cannot know that the "error
  >free" part of #3 should apply to a result.

There are ways around this on the x86 architecture.  On an Intel Pentium,
they slow down floating-point performance by a factor of at least 4 for single
precision, and at least 10 for double precision.

  >> I'd be interested in your reaction to this.
  >hope it was interesting :-)

I also hope my reaction was interesting :-)

- Sam Figueroa (figueroa@cs.nyu.edu)


