Notes from BLAS-3 lecture by Van Loan

Sun Jul 22 19:11:55 PDT 1990

>  Some of you will remember the Strassen theory for multiplying
>  matrices in less than n**3 time, based on divide and conquer and
>  doing 2x2 with 7 instead of 8 multiplications.  This has been
>  generally regarded as only theoretically interesting since the
>  asymptotic superiority only shows up for very large dimensions and
>  the numerical stability of the process seems questionable.  According
>  to Van Loan, David Bailey coded up Strassen-style algorithms for
>  Crays and found them to be faster than the usual ones for problems of
>  interesting size and of overall comparable numerical stability.

People were getting good speedups via Strassen's trick on Crays more
than two years ago.  Last I saw, the theoreticians had pushed the
asymptotic complexity down to O(N**2.51) (with different chunkings than
Strassen used; his method is O(N**2.81)), but haven't heard of anyone
actually implementing the fancier methods for practical work ...

>  Van Loan mentioned that a German investigator had written a matrix
>  multiplication thought to be optimal for some system - on the order of
>  100,000 lines of assembler code.  Given product lifetimes of two years
>  or so these days I have to question the value of that kind of effort.
>  It seems more fruitful to put that kind of effort into better compilers
>  and better languages that allow you to portably specify what the
>  compilers need to know to generate efficient code.

If the effort allowed them to run a problem they couldn't have run
without it, the value of the effort is the value of running the problem
-- which may be arbitrarily high.  It's good to question the economics,
the problem is that all efforts at crafting better compilers have fallen
way short of the mark for many nasty architectures, and all efforts at
designing better languages have left Fortran still the language of
choice <grin>.

>  What I currently favor is a requirement to provide identical results
>  on all IEEE 754 implementations by default, with the expectation that
>  each implementation will also provide an optional faster mode; the
>  relative speedup will vary quite a bit among systems and programs.

Doesn't this imply that NCEG has to mandate "the result of every
operation must be rounded to 754 double before continuing"?  I.e., seems
you have to disallow any/all "extended precision" gimmicks by default,
else results won't be identical by default.  Fine by me, just don't
recall the NCEG proposal saying that.

Would also help to specify syntax to get at the "optional faster mode",
else the proposal creates new portability problems (people can believe
this or not as they choose, but I'm certain that 99% of the customers
I'll see will always want the "fast mode", and I don't want to tell them
they have to junk up their program with KSR-specific pragmas to get at
it).

>  But in time hardware designers will catch on and figure out ways to
>  get the standard results sufficiently fast that there won't be any
>  need to have the fast mode be any different.

I agree the hardware folks can speed up anything they really believe
they have to speed up <grin>, but I think too that "the standard
results" are a moving target.  E.g., every msg I've seen on this list
that addressed the issue was clearly in favor of using (non-standardized
and non-portable) extended-precision gimmicks, even in violation of
language stds, and if that's what people really want then today's
"standard results" are tomorrow's progress-inhibiting dinosaurs.  And I
wouldn't be at all surprised to see the IBM RIOS's one-rounding
multiply-and-add triad become, in time, a strong candidate for a
"standard behavior" that will make the extended-precision gimmicks look
like progress-inhibiting dinosaurs.  There's no end in sight.  Today's
"std results" are an arbitrary point frozen in time -- so probably a
mistake to make them the default.

although-it-would-be-handy-to-have-an-optional-"deliver-std-results-as-
   defined-by-the-1990-incarnation-of-IEEE-754"-mode-ly y'rs  - tim

Tim Peters   Kendall Square Research Corp
timaksr.com,  ksr!timaharvard.harvard.edu