fast integer division

Fri Jun 22 12:38:18 PDT 1990

>  [dgh]
>  ... in converting between base 10**4 and base 2**16, the long
>  unsigned dividend can be almost as large as (2**16)*(10**4), and so
>  none of the methods ... suggested are applicable.

So switch to base 2**3 <grin>.

>  Conditionals are bad news, so putting in a switch statement in my
>  code or a binary search in Tim's may be an ephemeral advantage.  You
>  can make multiplies faster with a bigger hardware array but
>  conditional branches are just getting (relatively) slower.

Ya, that's the problem in writing "efficient" portable code -- so much
depends on the architecture!  The KSR machine is one of the few I know
of where your multiply-and-shift-plus-switch stab runs faster than my
three-tests stab (although it would run faster still if it computed the
remainder by multiplying and subtracting).  Strangely enough, the
fastest way to solve the full problem on a Cray would be to use
floating-point arithmetic (while it's not obvious, for dividends less
than (2**16)*(10**4) there's a fast test-free branch-free way to fiddle
things so that Cray's unreliable floating divide nevertheless delivers a
perfect integer quotient on the first try).  Etc.

>  Earl Killian pointed out that there's an interesting discussion of
>  this stuff in section 4.4 in volume 2 of Knuth, especially exercise
>  9.

I can't find a real citation just now, but especially look at "Integer
Multiplication & Division on the HP Precision Architecture", an article
that appeared "somewhere" in the last couple years.  This develops some
interesting methods for integer division on a machine that can't
directly divide or multiply worth beans (no offense to any HP people
reading this <grin>).  Given a fixed integer divisor y, and a bound on
the possible dividends, they precompute integer constants (depending on
y and the dividend bound) r, b and m such that

floor(x/y) is (r*x+b) >> m

for all x within the bounds (warning:  their notation probably differs).

E.g., the method you sent out is a special case of this with y=10000,
bound on dividends around 2^16, r=839, b=0, and m=19 (and that 839 =
ceiling(2^19/(10000/16)) is a good starting clue as to why it works).

Alas, finding appropriate r, b, and m can be tricky, and m sometimes
turns out to be so large as to require faking triple-precision
operations.  A related method KSR uses avoids most of this complexity at
the cost of one test; however, we have the biggest multiplier array in
the industry <grin>, so it may be less suitable for other machines:

Problem:  Given fixed unsigned integers y and n, 2 <= y < 2^n, find a
cheap way to compute floor(x/y) for all integer x s.t. 0 <= x < 2^n.

A solution:

1) Precompute the integer r = floor(2^n/y).

2) The algorithm is then

   qhat := floor(x*r/2^n)
   ASSERT qhat IN {floor(x/y), floor(x/y)-1}

   IF x-qhat*y >= y THEN
      qhat := qhat + 1
   ENDIF
   ASSERT floor(x/y) = qhat

Possible subtleties:

1) Note that in the precomputation of r, the dividend (2^n) is an n+1
   bit number; n-bit division doesn't suffice (at least not directly)
   for this step.

2) floor(x*r/2^n) is simply the upper n bits of the 2n-bit product x*r;
   depending on n and your machine, you may be able to get that with one
   instruction or may need to fake it as a multiple-precision operation.

3) The ordering of the operations in "x-qhat*y >= y" is guaranteed to
   allow all the arithmetic to be done using unsigned n-bit arithmetic;
   if you rearrange it, and intend to use n-bit arithmetic, make sure
   you don't introduce an opportunity for n-bit overflow (e.g., do *not*
   rewrite it as "IF x >= y*(qhat+1)").

Notes:

1) For some special values of y it's possible to do better; e.g., if y
   is a power of 2 it's obviously possible to do better, but there are
   cases beyond that ...

2) If you want the remainder too, replace the IF block by

   rem := x - qhat*y
   IF rem >= y THEN
      qhat := qhat + 1
      rem  := rem - y
   ENDIF
   ASSERT floor(x/y) = qhat
   ASSERT 0 <= rem < y
   ASSERT x = qhat*y + rem

Probably little here that can't be deduced easily as an extension of
Knuth's proofs (see David's reference above), so I'll skip the proofs.

don't-have-time-to-type-'em-in-anyway<smile>-ly y'rs  - tim

Tim Peters   Kendall Square Research Corp
timaksr.com,  ksr!timaharvard.harvard.edu