optimizations I didn't need: a tutorial of a wasted day

Thu Jun 28 21:07:16 PDT 1990

The function 

STATIC double
read_2_double(data)
        long            data;
/*
 * returns FPA register defined in the reg_2 field of data from command
 * format.
 */
{
        long           *fpa_pointer;
        int             reg_1;
        double          reg_1_op, reg_2_op;

        reg_1 = data & 0x1f;
        reg_1_op = read_double(reg_1);
        fpa_pointer = (long *) (FPABASE + 0x884);
        *fpa_pointer = data;    /* reg_1 gets reg_2 contents */
        reg_2_op = read_double(reg_1);
        write_double(reg_1, reg_1_op);
        return (reg_2_op);
}

doesn't look like a very good candidate for allocating floating-point
variables to registers: no floating-point operations occur.  Yet exactly
this optimization turns out to cause a bug in the libc FPA
recomputation code, a bug that has existed since SunOS 4.0 with both the
bundled and unbundled compilers.  The source code is in
lib/libc/crt/m68k/fpa_recompute.c, which can be compiled with

cc -S -f68881 -DF68881 fpa_recompute.c -On

It works correctly at -O1 and below, and incorrectly at -O2 and above.  The
reason is that at -O2 reg_1_op and reg_2_op get allocated to the
floating-point registers fp7 and fp6:

_read_2_double+8:                       fmovemx fp7/fp6,a6@(-0x18)
_read_2_double+0xe:                     movl    a6@(8),d7
_read_2_double+0x12:                    moveq   #0x1f,d1
_read_2_double+0x14:                    andl    d1,d7
_read_2_double+0x16:                    movl    d7,sp@-
_read_2_double+0x18:                    jsr     _read_double:l
_read_2_double+0x1e:                    addqw   #4,sp
_read_2_double+0x20:                    movl    d1,sp@-
_read_2_double+0x22:                    movl    d0,sp@-
_read_2_double+0x24:                    fmoved  sp@+,fp7
_read_2_double+0x28:                    movl    #-0x1ffff77c,a5
_read_2_double+0x2e:                    movl    a6@(8),a5@
_read_2_double+0x32:                    movl    d7,sp@-
_read_2_double+0x34:                    jsr     _read_double:l
_read_2_double+0x3a:                    addqw   #4,sp
_read_2_double+0x3c:                    movl    d1,sp@-
_read_2_double+0x3e:                    movl    d0,sp@-
_read_2_double+0x40:                    fmoved  sp@+,fp6
_read_2_double+0x44:                    fmoved  fp7,sp@-
_read_2_double+0x48:                    movl    d7,sp@-
_read_2_double+0x4a:                    jsr     _write_double:l
_read_2_double+0x50:                    lea     sp@(0xc),a7
_read_2_double+0x54:                    fmoved  fp6,sp@-
_read_2_double+0x58:                    fmovemx a6@(-0x18),fp7/fp6

Saving and restoring fp7 and fp6 for this purpose costs extra cycles, of
course, and adds little value.  In this case it does worse: as it happens,
reg_1_op does not contain a normal double value.  Its upper word holds a
single-precision inf (0x7f800000) and its lower word holds 0; read as a
double, that bit pattern is the finite value 2^1017, far beyond the
single-precision range.  Loading it into an fp register while
single-precision rounding mode is in effect, which it is here, therefore
causes an overflow.
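The arithmetic behind that overflow can be checked on any machine with IEEE doubles.  The sketch below is not from the libc source; double_from_words and round_to_single are hypothetical helpers.  It assembles the two 32-bit words into a double and rounds the result to single precision.  A C conversion to float stands in, loosely, for an fmoved load with the 68881 rounding precision set to single; under IEEE 754 both round an out-of-range value to infinity and signal overflow, though the sketch only observes the infinite result.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a double from its two 32-bit halves.
   "upper" holds the sign, the 11-bit exponent and the high mantissa
   bits; on the big-endian 68020 it is the word at the lower address. */
static double double_from_words(uint32_t upper, uint32_t lower)
{
    uint64_t bits = ((uint64_t) upper << 32) | lower;
    double   d;

    memcpy(&d, &bits, sizeof d);   /* reinterpret the bits, no conversion */
    return d;
}

/* Hypothetical helper: round a double to single precision, roughly
   what loading it into an fp register does when the rounding
   precision is single.  Values above FLT_MAX round to infinity
   (and, under IEEE 754 / C Annex F, raise the overflow flag). */
static float round_to_single(double d)
{
    volatile double in = d;        /* keep the compiler from folding this */
    return (float) in;
}
```

As the upper word of a double, 0x7f800000 has exponent field 0x7f8 = 2040, so double_from_words(0x7f800000u, 0u) is 2^(2040 - 1023) = 0x1p1017, which exceeds FLT_MAX; round_to_single of it is therefore infinite, while round_to_single(double_from_words(0x3ff00000u, 0u)) is simply 1.0f.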

It seems that using semantics-laden floating-point registers for
gratuitous loads and stores of floating-point variables is a bad idea on
Sun-3 and Sun-386: in this case it raises unexpected exceptions that
thwart IEEE conformance in a very obscure way, under circumstances so
obscure that nobody who encountered them seems to have recognized what
was happening.
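One way to keep such values out of the floating-point unit altogether is to carry the saved register contents as opaque 64-bit bit patterns rather than as doubles.  The sketch below is a hypothetical restructuring, not the libc fix: the FPA register file is simulated by an array, and read_bits, write_bits, fpa_copy_reg2_to_reg1 and the placement of reg_2 in bits 5-9 of data are all assumptions standing in for read_double, write_double and the store to FPABASE + 0x884.

```c
#include <stdint.h>

/* Hypothetical stand-in for the FPA register file; the real code
   reaches the hardware through read_double, write_double and a
   store to FPABASE + 0x884. */
static uint64_t fpa_reg[32];

static uint64_t read_bits(int reg)             { return fpa_reg[reg]; }
static void     write_bits(int reg, uint64_t v) { fpa_reg[reg] = v; }

/* Hypothetical model of the command store: reg_1 is in bits 0-4, as
   in the original; reg_2 is ASSUMED to be in bits 5-9. */
static void fpa_copy_reg2_to_reg1(long data)
{
    fpa_reg[data & 0x1f] = fpa_reg[(data >> 5) & 0x1f];
}

/* Same save/copy/read/restore sequence as read_2_double, but the
   register contents travel as 64-bit integers, so the compiler has
   no reason to route them through a 68881 or 80387 register. */
static uint64_t read_2_bits(long data)
{
    int      reg_1 = (int) (data & 0x1f);
    uint64_t saved = read_bits(reg_1);   /* reg_1's old contents */
    uint64_t result;

    fpa_copy_reg2_to_reg1(data);         /* reg_1 gets reg_2 contents */
    result = read_bits(reg_1);
    write_bits(reg_1, saved);            /* restore reg_1 */
    return result;
}
```

Because saved and result are integers, no rounding mode or exception enable can touch them, even when saved holds a pattern like 0x7f800000 followed by 0; a caller that genuinely needs a double can reinterpret the bits at the point where it does arithmetic.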

You could argue that this is a poor optimization in any case, even on a
Sun-4.  But there it is merely wasteful: on a RISC, moving data to and
from floating-point registers never generates floating-point exceptions;
more generally, moving and arithmetic are completely orthogonal.

This was originally going to be a bug report against the optimizer, but 
it seems more valuable as a RISC tutorial.