optimizations I didn't need: a tutorial of a wasted day
David Hough
sun!Eng!dgh
Thu Jun 28 21:07:16 PDT 1990
The function
        STATIC double
        read_2_double(data)
                long            data;
        /*
         * returns FPA register defined in the reg_2 field of data from command
         * format.
         */
        {
                long           *fpa_pointer;
                int             reg_1;
                double          reg_1_op, reg_2_op;

                reg_1 = data & 0x1f;
                reg_1_op = read_double(reg_1);
                fpa_pointer = (long *) (FPABASE + 0x884);
                *fpa_pointer = data;    /* reg_1 gets reg_2 contents */
                reg_2_op = read_double(reg_1);
                write_double(reg_1, reg_1_op);
                return (reg_2_op);
        }
doesn't look like a very good candidate for allocating floating-point
variables to registers; no floating-point operations occur. However,
just this optimization turns out to have caused a bug in libc FPA
recomputation that has existed since SunOS 4.0 in both bundled and
unbundled compilers. The source code is in lib/libc/crt/m68k/fpa_recompute.c,
which can be compiled with

        cc -S -f68881 -DF68881 fpa_recompute.c -On

It works correctly at -O1 or lower and incorrectly at -O2 or higher. The
reason is that at -O2 reg_1_op gets allocated to a floating-point register
(fp7 in the code below):
        _read_2_double+8:       fmovemx fp7/fp6,a6@(-0x18)
        _read_2_double+0xe:     movl    a6@(8),d7
        _read_2_double+0x12:    moveq   #0x1f,d1
        _read_2_double+0x14:    andl    d1,d7
        _read_2_double+0x16:    movl    d7,sp@-
        _read_2_double+0x18:    jsr     _read_double:l
        _read_2_double+0x1e:    addqw   #4,sp
        _read_2_double+0x20:    movl    d1,sp@-
        _read_2_double+0x22:    movl    d0,sp@-
        _read_2_double+0x24:    fmoved  sp@+,fp7
        _read_2_double+0x28:    movl    #-0x1ffff77c,a5
        _read_2_double+0x2e:    movl    a6@(8),a5@
        _read_2_double+0x32:    movl    d7,sp@-
        _read_2_double+0x34:    jsr     _read_double:l
        _read_2_double+0x3a:    addqw   #4,sp
        _read_2_double+0x3c:    movl    d1,sp@-
        _read_2_double+0x3e:    movl    d0,sp@-
        _read_2_double+0x40:    fmoved  sp@+,fp6
        _read_2_double+0x44:    fmoved  fp7,sp@-
        _read_2_double+0x48:    movl    d7,sp@-
        _read_2_double+0x4a:    jsr     _write_double:l
        _read_2_double+0x50:    lea     sp@(0xc),a7
        _read_2_double+0x54:    fmoved  fp6,sp@-
        _read_2_double+0x58:    fmovemx a6@(-0x18),fp7/fp6
Saving and restoring fp7 and fp6 for this purpose costs extra cycles, of
course, and adds little value. It is worse than useless in this case
because, as it happens, reg_1_op does not contain a meaningful double
value: it contains a single-precision inf in its upper word and a 0 in
its lower word. Taken together as a double, those two words cause an
overflow when loaded into an fp register in the single-precision rounding
mode that happens to be in effect here.
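
To make the bit pattern concrete, here is a minimal sketch in C. It
assumes the inf in question is the usual IEEE single-precision +inf,
0x7f800000 (the sign is not stated above), and the helper names are
hypothetical, not taken from fpa_recompute.c. Read as a double, the two
words decode to 2^1017, which is finite but far outside the
single-precision range, so rounding it to single precision has to
overflow.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build a double from two 32-bit words, most
 * significant word first, the order in which the 68k holds them.
 * Assumes the host uses IEEE 754 doubles with the same byte order
 * for integers and floating point.
 */
static double
double_from_words(uint32_t upper, uint32_t lower)
{
        uint64_t bits = ((uint64_t) upper << 32) | lower;
        double d;

        memcpy(&d, &bits, sizeof d);    /* copy the bits, no conversion */
        return (d);
}

/*
 * Hypothetical helper: nonzero if d is a finite positive double that
 * is too large to round to a finite single-precision value.
 */
static int
overflows_single(double d)
{
        return (d <= DBL_MAX && d > FLT_MAX);
}
```

With these, double_from_words(0x7f800000, 0) compares equal to 0x1p1017
and overflows_single() returns nonzero for it. The bits are harmless
while they sit in memory or in integer registers; the trouble starts
only when a load rounds them to single precision.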
It seems that using semantic-laden floating-point registers for gratuitous
loads and stores of floating-point variables is a bad idea on Sun-3 and
Sun-386. In this case it causes unexpected exceptions which thwart IEEE
conformance in a very obscure way, under circumstances so obscure that
nobody seems to have encountered them and recognized what was happening.
You could argue that this is a poor optimization in any case, even on a Sun-4.
On a RISC, moving data to and from floating-point registers never
generates floating-point exceptions; more generally, moves and
arithmetic are completely orthogonal.
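
One way to get the same orthogonality in C source is to carry the saved
register image as integer bits, so that no floating-point load is ever
involved. The following is only a sketch under stated assumptions: the
array fpa_regs and the helpers read_bits, write_bits and copy_reg are
hypothetical stand-ins for the real FPA access routines, and reg_2 is
passed as a separate argument rather than decoded from the command word.

```c
#include <stdint.h>

/* Hypothetical stand-in for the FPA register file: raw 64-bit images. */
static uint64_t fpa_regs[32];

/* Hypothetical raw accessors; the 0x1f mask matches the original code. */
static uint64_t
read_bits(int reg)
{
        return (fpa_regs[reg & 0x1f]);
}

static void
write_bits(int reg, uint64_t bits)
{
        fpa_regs[reg & 0x1f] = bits;
}

/* Stand-in for the command store that copies reg_2 into reg_1. */
static void
copy_reg(int dst, int src)
{
        fpa_regs[dst & 0x1f] = fpa_regs[src & 0x1f];
}

/*
 * Same save/copy/read/restore sequence as read_2_double, but the
 * values stay integers, so a compiler has no reason to route them
 * through a floating-point register.
 */
static uint64_t
read_2_bits(int reg_1, int reg_2)
{
        uint64_t saved, result;

        saved = read_bits(reg_1);       /* save reg_1 as raw bits */
        copy_reg(reg_1, reg_2);         /* reg_1 gets reg_2 contents */
        result = read_bits(reg_1);
        write_bits(reg_1, saved);       /* restore reg_1 bit for bit */
        return (result);
}
```

Whatever pattern reg_1 held beforehand, including 0x7f80000000000000,
comes back unchanged, because it is never interpreted as a number.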
This was originally going to be a bug report against the optimizer, but
it seems more valuable as a RISC tutorial.