measure of dispersion for geometric mean: geometric variance?

Thu Nov 8 10:18:55 PST 1990

I mentioned a couple of days ago that I'd be just as happy with a SPECmark
and a measure of dispersion.  I don't know what the best statistical
measure of dispersion is for use with a geometric mean.  (Geometric mean
is the appropriate one for SPECmark type benchmarks because the
conclusions are unchanged whether you express data in terms of elapsed
times or in terms of rates (mflops or mips); whereas arithmetic means
of times correspond to harmonic means of rates, and vice versa).
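
The invariance claim above can be checked numerically.  The following is
a minimal sketch, not anything SPEC prescribes: the elapsed times are
made-up illustrative values, and the helper names am, gm and hm are
assumptions introduced here.

```python
import math

def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean, computed through logs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def hm(xs):
    """Harmonic mean."""
    return len(xs) / sum(1.0 / x for x in xs)

# Made-up elapsed times (seconds) for machines A and B on three programs.
times_a = [10.0, 40.0, 5.0]
times_b = [20.0, 10.0, 8.0]
# The corresponding rates (runs per second).
rates_a = [1.0 / t for t in times_a]
rates_b = [1.0 / t for t in times_b]

# Geometric mean: the A-versus-B ratio is the same from times or rates.
print(gm(times_a) / gm(times_b), gm(rates_b) / gm(rates_a))

# Arithmetic mean of times is the reciprocal of the harmonic mean of rates.
print(am(times_a), 1.0 / hm(rates_a))

# Arithmetic means can disagree about which machine is faster:
print("A slower by mean time:", am(times_a) > am(times_b))
print("A faster by mean rate:", am(rates_a) > am(rates_b))
```

With these particular numbers the arithmetic mean of times and the
arithmetic mean of rates rank the two machines in opposite order, while
the geometric mean gives one answer either way.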

The graph of SPECratios for individual programs has great didactic
value and it adapts itself to excellent colorful T-shirts which you may
observe around Sun building 12 these days.  But such graphs are tedious
to incorporate into short ASCII messages.

Anyway, if you have a bunch of ratios ri, then their arithmetic mean is
	am(r) = sum(ri)/n
and their usual variance is
	av(r) = sum((ri-am(r))**2)/n
Notice that the geometric mean
	gm(r) = prod(ri) ** (1/n)
can be expressed as
	gm(r) = exp(am(log(r))) = exp( sum(log(ri))/n )
which suggests
	gv(r) = exp(av(log(r))) = exp( sum( (log(ri)-am(log(r)))**2 )/n )
	      = exp( sum( log(ri/gm(r))**2 )/n )
	      = { prod( (ri/gm(r)) ** log(ri/gm(r)) ) } ** (1/n)

The formula for gm(r) can be written without explicit logs, but the
one for gv(r) cannot.
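
As a sanity check on the algebra, here is a minimal sketch that
evaluates gv(r) both through logs and through the product form.  The
function names follow the notation in this message; the sample ratios
are made up for illustration.

```python
import math

def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def av(xs):
    """Population variance (divide by n, as in the formula above)."""
    m = am(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def gm(rs):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return math.exp(am([math.log(r) for r in rs]))

def gv(rs):
    """Proposed 'geometric variance': exp of the variance of the logs."""
    return math.exp(av([math.log(r) for r in rs]))

def gv_product_form(rs):
    """Same quantity via { prod (ri/gm)**log(ri/gm) } ** (1/n)."""
    g = gm(rs)
    prod = 1.0
    for r in rs:
        prod *= (r / g) ** math.log(r / g)
    return prod ** (1.0 / len(rs))

ratios = [8.5, 12.0, 15.3, 9.7, 21.0]   # made-up SPECratio-like values
print(gm(ratios), gv(ratios), gv_product_form(ratios))
```

Two properties fall out of the definition: gv(r) is 1 when all the
ratios are equal (and larger otherwise), and it is unchanged when every
ratio is multiplied by the same constant.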

So the question for you statisticians is:  what is the appropriate
analog of variance for geometric mean?  Is there a useful theory
corresponding to that for standard arithmetic variance, such as
"at least (1 - k**-2) of observations lie within k geometric standard
deviations of the geometric mean"?  I don't recall the question
arising, or an answer, in my long-forgotten studies of statistics.
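
One partial answer follows from applying Chebyshev's inequality to the
logs of the ratios rather than to the ratios themselves.  The sketch
below assumes the geometric standard deviation is taken as
exp(sqrt(av(log(r)))), i.e. the exponential of the standard deviation of
the logs; "within k geometric standard deviations" then means within a
factor of gsd**k of gm.  The function names and the sample ratios are
made up for illustration.

```python
import math

def log_mean_and_sd(rs):
    """Mean and population standard deviation of log(ri)."""
    logs = [math.log(r) for r in rs]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(var)

def band(rs, k):
    """Interval [gm / gsd**k, gm * gsd**k]."""
    mean, sd = log_mean_and_sd(rs)
    return math.exp(mean - k * sd), math.exp(mean + k * sd)

def fraction_within(rs, k):
    """Fraction of ratios within a factor gsd**k of the geometric mean.

    The comparison is done on the logs, where the band is symmetric.
    """
    mean, sd = log_mean_and_sd(rs)
    inside = sum(1 for r in rs if abs(math.log(r) - mean) <= k * sd)
    return inside / len(rs)

ratios = [0.8, 1.1, 1.3, 2.0, 2.4, 3.1, 9.5]   # made-up values
for k in (1.5, 2.0, 3.0):
    lo, hi = band(ratios, k)
    # Chebyshev guarantees the fraction is at least 1 - k**-2.
    print(k, (lo, hi), fraction_within(ratios, k), 1.0 - k ** -2)
```

The bound holds for any set of positive ratios, but like the ordinary
Chebyshev bound it is loose; tighter statements would need an assumption
about the distribution of the logs.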

Lots of statistics suggests lots of data, which suggests there should be lots
of separate benchmarks in SPEC, but I don't particularly endorse that
point of view.  Now that SPEC membership encompasses a diversity
of systems, it would be appropriate to ask for each proposed addition
to SPEC-3 (I presume that SPEC-2 is agreed on by now): does its
performance across all hardware/compilers/options show a high degree
of correlation with some linear combination of n existing SPEC programs?
If so, which of the n programs should the prospect replace?  Is it really
better than the program proposed for replacement?  (There are lots of
aspects of "better" besides statistical diversity.)
I think this kind of inquiry is called cluster analysis but I don't
know anything about it.  Naively I'd start with an SVD and try to make
sense of the singular vectors but better methods have probably been
worked out.
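
For what the naive SVD starting point might look like, here is a minimal
sketch under stated assumptions: rows are systems, columns are
benchmarks, entries are logs of SPECratios, and each column is centered
first.  It finds only the leading singular triple, by power iteration,
so that it needs nothing beyond the standard library; a real analysis
would use a full SVD routine.  All function names and all numbers are
hypothetical.  (An SVD of column-centered data is essentially principal
component analysis, which may or may not be the best tool here.)

```python
import math

def center_columns(a):
    """Subtract each column's mean, so columns describe variation only."""
    n_rows = len(a)
    means = [sum(row[j] for row in a) / n_rows for j in range(len(a[0]))]
    return [[x - m for x, m in zip(row, means)] for row in a]

def leading_singular_triple(a, iters=200):
    """Power iteration on A^T A; returns (sigma, u, v).

    Caveat: if the uniform start vector happens to be orthogonal to the
    leading right singular vector, this returns a poor answer.
    """
    n_rows, n_cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(x * vj for x, vj in zip(row, v)) for row in a]      # A v
        w = [sum(a[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]                                 # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(x * vj for x, vj in zip(row, v)) for row in a]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v

def explained_share(a):
    """Share of the total sum of squares captured by the leading triple."""
    total = sum(x * x for row in a for x in row)
    if total == 0.0:
        return 0.0
    sigma, _, _ = leading_singular_triple(a)
    return sigma * sigma / total

# Hypothetical SPECratios: 4 systems (rows) by 3 benchmarks (columns).
ratios = [[10.0, 12.0, 11.0],
          [20.0, 25.0, 22.0],
          [15.0, 14.0, 30.0],
          [30.0, 33.0, 31.0]]
logs = center_columns([[math.log(x) for x in row] for row in ratios])
sigma, u, v = leading_singular_triple(logs)
print("benchmark weights:", v)
print("share explained:", explained_share(logs))
```

A share close to 1 would say that the benchmarks mostly move together
across systems; the weights in v show which benchmarks dominate that
common direction, and the remaining singular vectors show where a
benchmark contributes something the others do not.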