Using integer multiplication to avoid integer division

Thu Jun 21 11:35:40 PDT 1990

>  ...
>  unsigned short
>  quorem10000(x, pr)
>  	unsigned long   x;
>  	unsigned short *pr;
>
>  /* quorem gets x/10000 ; *pr gets x%10000              */
>
>  {
>  	long unsigned   t;
>
>  	t = ((x >> 4) * 839) >> 19;
>  	switch (t) {
>  	case 0:
>  		*pr = x;
>  		break;
>  	case 1:
>  		*pr = x - 10000;
>  		break;
>  	case 2:
>  		*pr = x - 20000;
>  		break;
>  	case 3:
>  		*pr = x - 30000;
>  		break;
>  	case 4:
>  		*pr = x - 40000;
>  		break;
>  	case 5:
>  		*pr = x - 50000;
>  		break;
>  	case 6:
>  		*pr = x - 60000;
>  		break;
>  	}
>  	return t;
>  }
>  ...

I assume there's an undocumented assumption that x < 2^16 on input,
since that's a "natural" bound, the switch statement implies it, and
the method itself doesn't always work for inputs much larger than that
even if the switch statement is extended.

If true, you might like one of the following more-transparent routines
instead (which work for inputs 0 <= x < 80000):

unsigned short
alternative1( x, pr )
    unsigned long x;
    unsigned short *pr;
{
    unsigned short q = 0;
    if ( x >= 40000 ) {
	q = 4;
	x -= 40000;
    }
    if ( x >= 20000 ) {
	q += 2;
	x -= 20000;
    }
    if ( x >= 10000 ) {
	q += 1;
	x -= 10000;
    }
    *pr = x;
    return q;
}

unsigned short
alternative2( x, pr )
    unsigned long x;
    unsigned short *pr;
{
    unsigned short q = 0;
    long t;

    /*
     *  warning:  believe the following subtractions are non-standard
     *  because they may assign a "too large" unsigned long value to a
     *  signed long
     */

    t = x - 40000;
    if ( t >= 0 ) {
	q = 4;
	x = t;
    }
    t = x - 20000;
    if ( t >= 0 ) {
	q += 2;
	x = t;
    }
    t = x - 10000;
    if ( t >= 0 ) {
	q += 1;
	x = t;
    }
    *pr = x;
    return q;
}

They're exactly the same "binary search" algorithm, just written in
ways that are likely to work faster under different kinds of
optimizers and architectures.

On whatever the heck I'm running on today (some SPARC system with Sun
4 OS & cc (at least I think so -- cc's -v option doesn't work as
advertised under this thing)) alternative1 runs about 16% faster than
quorem10000, alternative2 runs about 9% faster.  On your system you'll
likely get different results, although I expect that on just about all
systems one of the alternatives will run faster.

Where the timing claims came from:  The division routines were
compiled with -O.  A separately compiled driver called a division
routine with each argument in 0..65535.  The driver was called 100
times.  Gross timings were then obtained from averaging the output of
the Korn shell's "time" command on two runs.

more-fun-getting-something-to-work-fast-than-haggling-over-whether-
   it-should-be-done-at-all<grin>-ly y'rs  - tim

Tim Peters   Kendall Square Research Corp
timaksr.com,  ksr!timaharvard.harvard.edu