64 bit integers -- the real problem
David Boundy
uunet!apollo.com!boundy
Thu Jan 2 06:36:17 PST 1992
In the discussion about 64 bit integers, a most important point has gone
undiscussed. Recall that a data structure is a data type and operations on
that type. An assumed part of that definition is that those operations be
"useful!" Most of the proposals floated have failed this "usefulness" test.
Short form: Whatever else NCEG does or does not do w.r.t. 64 bit integer
types, NCEG really should propose a reasonable model for integer arithmetic
semantics to repair the brain damage in the ANSI C standard.
Long form:
I start with two theses. First, a programming language is "good" to the
degree that it makes it easy to write correct programs. A good programming
language does not have features that seduce all but the wariest programmers
into writing programs that don't work. Programs that have an obvious naive
interpretation should be interpreted in the naive and obvious way.
Second, the most useful behavior for computer integer arithmetic is to model
pure algebraic integers as closely as possible. Just as mathematicians
arrived at our current notions of integers over several millennia because they
usefully model the real world, a computer arithmetic usefully solves real
problems to the degree that it models mathematical integers. But, of course,
the essential problem in defining computer arithmetic systems is that
computers are constrained to model infinite number systems within a costly,
finite, physical implementation.
A model for correct evaluation of integer expressions.
A C compiler stands on solid ground when it exploits a programmer's assertion
that the values that will be stored in an explicitly-typed lvalue won't exceed
the bounds of its type. Once a programmer has, for instance, declared a
signed 16-bit variable, he promises not to set it to a value outside of
[ -32768, 32767 ]. And, of course, most processors physically prevent it.
It just happens that 2's complement implementations make modular arithmetic
behavior most convenient for the overflow cases. Almost always when the
difference between unbounded integers and modular integers is exposed, it is
at best an irritating behavior; it should not be the central model presented
to the programmer, only the consistent fallback when no non-irritating
alternative exists. The goal should be to do the arithmetic as if on the
unbounded integers of mathematics, and lose bits only at the points where the
programmer has provided explicit size assertions, guaranteeing that no useful
bits will be discarded.
*** CENTRAL PARAGRAPH *** HEREINAFTER REFERRED TO AS "***" ***
We thus come to a utopian characterization of "correct" integer arithmetic:
a correct rendering of an integer assignment statement is indistinguishable
from one that extends all right-hand side (RHS) operands to infinite width
( either zero-extending or sign-extending as appropriate ), evaluates the
RHS in infinite width, then deposits the low-order bits into the left-hand
side (LHS) variable. ( "assignment" includes evaluation of the actual
parameters of a function call, evaluating an expression that is the operand
of a type cast, or any other action that computes a value into a cell of a
programmer-defined type. )
Interesting result of *** -- utopia isn't expensive in most cases.
To "correctly" evaluate assignments whose RHSs use only the right-to-left
operators -- add, subtract, multiply, and, or, xor, left-shift, negate, and
not -- it is necessary and sufficient to extend all operands to at
least the width of the LHS, do the computation in that width, and deposit
the low-order bits into the LHS. The high-order bits may be ignored or
manipulated with arbitrary malice!
Division ( and its cousins mod and right-shift ) is essentially the entire
problem -- either the language design center must be a scheme that makes
them work, or it must explicitly decide where they break. Division ( I
use "divide" to cover divide, mod, and >> ) has an irritating property: the value of any
bit of either the dividend or divisor can affect the value of any bit of the
result. Thus, most current architectures -- and the C language -- implement
both a signed divide ( which implicitly sign-extends both operands to infinite
width before dividing them ), and an unsigned divide ( which implicitly
zero-extends both operands ).
So, now let's take a look at the ingredients we need to mix together to
cook a useful integer data structure. First, the data types. Most C compilers
implement the following kinds of integers, though one or more of these classes
may be empty on a particular implementation:
* A "native" integer, "int". Typically this is the width of the
integer registers in the smallest member of the target
architecture's family.
* "Narrow" integers. "short" and "char" in most implementations,
these are integer formats that are narrower than the "native"
format. If a narrow operand is consumed by a divide, either the
architecture contains instructions operating on these narrower
operands, or the operands need to be extended to the width of
native integer.
* Some C implementations include "convenient wide" integers --
"long" or "long long" that is wider than "int". Typically this
is the width of the integer registers in the largest member of
an architecture family, or the width of the operands of an
instruction that operates on multiple-register operands.
* "Bignums". These are much wider than the machine's registers.
Bignums may either be unbounded LISP-like bignums, or may be
statically-sized multiple-word operands. Static-size bignums
can be implemented in multiple registers. On register-poor
machines, or for unbounded values, bignums must be implemented
completely in memory using add-with-carry instructions, long
multiplication and division, etc.
[ I absolutely don't care what the types are called. If we do the following
correctly, and the program uses ANSI prototypes instead of allowing types to
default, essentially everything else can be solved with typedefs -- enough
just works that I think essentially all my customers will be happy. Without
the following, no nomenclature for types will be portable. ]
Now the interesting part of our data structure, the operations:
Unfortunately, language designers and implementors are forced to trade off
the correctness of mathematical integers against the cost of multi-instruction
sequences. There are four interesting points on the correctness vs. execution
time spectrum:
* Scheme 1. Fully correct arithmetic at unbounded cost.
The most correct integer arithmetic scheme is an unbounded bignum-based
implementation where the width of operands is determined dynamically.
This implements full mathematically correct unbounded-value integer
arithmetic. This is appropriate in some languages, not in C.
* Scheme 2. Fully correct arithmetic at modest cost.
Scheme 1 ignores the hints given by the programmer in the declarations:
the programmer promises to the compiler that the values of variables are
bounded as specified by their types. In scheme two, the compiler exploits
this bounds information to do all arithmetic in the minimum width required
to fully meet the correctness criterion of ***, potentially using
statically-sized bignums for intermediate values.
Thus, a RHS not containing divides can be evaluated in the width of the
LHS ( a property shared by schemes 2, 3, and 4 ). If a RHS does contain
a divide, all RHS intermediate results that eventually are used by the
divide must be computed to enough bits to contain the widest possible
range of those results, even if this requires doing this intermediate
arithmetic in an "inconveniently wide" static bignum format.
* Scheme 3: A reasonable approximation to fully correct arithmetic at
almost no cost over incorrect arithmetic.
In the third scheme, all intermediates are the width of the widest atom
in the statement, or the native integer format if that is wider. Thus,
width information must be synthesized bottom-up and inherited top-down.
Thus, if the LHS is the only wide integer, all RHS atoms are promoted to
the width of the LHS before the arithmetic is performed.
* Scheme 4: Really Stupid arithmetic.
Though several language standards discuss multiple representations of
integers, most do not specify explicit promotion rules, specifying only
general arithmetic rules that lead implementors to something close to
scheme 3. ANSI C, however, specifies explicit promotion rules that
emphasize execution speed over any pretense of correctness: the width
of the intermediates is only a bottom-up property -- the width of the
widest constituent operand, or "int" format if that is wider. Thus,
this assignment
signed_64_bit_int_a = signed_32_bit_int_a * signed_32_bit_int_b;
if the native integer is 32 bits, will be evaluated in 32 bits, then
sign-extended to the width of the LHS. Thus, the high-order bits of the
LHS are all but guaranteed to be incorrect.
Under this scheme, all programmers using integers wider than "int" must
examine their programs *very carefully* *by hand* to ensure that the
correct explicit widening casts have been used.
The reason that the masses haven't risen up and burned down most vendors' R&D
facilities is that most of us have implementations where "int" is the widest
format; thus the problem doesn't arise. But the fact that we haven't seen it
heretofore doesn't mean it isn't a problem; ask PC programmers. Several of the proposals
that have been floated in this forum would expose it.
The cost differential between scheme 3 and scheme 4:
* All arithmetic is done in a width "convenient" for the hardware
( assuming that the language implementation does not define a
primitive data type that the underlying hardware finds totally
"inconvenient" ).
* The only time wider-than-native arithmetic is required is when
the programmer has specified a wider-than-native operand or
result. The program is explicitly telling the compiler that a
wider-than-native ( though still "convenient" ) operation is
required to do the arithmetic correctly.
* Assuming that "convenient wide" operations execute at the same
speed as "native width" instructions, the only additional cost
of scheme 3 over scheme 4 is when the LHS is wider than native,
and then each native-or-narrower operand must be pre-extended to
wide format, rather than the entire RHS being post-extended.
Summation: There are two possibilities:
1) If NCEG is unwilling to unbreak the ANSI standard, then NCEG should
recommend that all implementations define "int" to be the widest
integer datatype in the machine. Congratulations, Cray.
2) My preference: NCEG should recommend something like scheme three to
replace ANSI's scheme four.
-boundless