Anti-decay operator

Dave Becker uunet!juniper.cray.com!becker
Mon Dec 14 13:57:51 PST 1992


Ron,

I have some comments on your proposal to add first class arrays
to C, but first I thought I should give you some information on what
the NCEG array syntax subgroup is doing.  The base document is the
C* reference manual.  Primary emphasis has been on coming up with group 
consensus on a data-parallel dialect of C.  I have been one of the few
dissenting voices in the group.  I must say that I agree with what you
said early in your proposal:

> First let me clarify that what I have gathered from the old NCEG postings
> I have read is that some NCEG people may want to add all sorts of whiz-bang,
> grandiose, and inefficient array manipulation features to C, along the
> lines of (for example) Fortran do-lists and Ada array slicing, but I'm
> not in the least bit interested in things like that.  I feel such things
> are far too baroque for inclusion into an otherwise clean, simple, and
> efficient language like C.  I'd like to just take what's there and tweak
> it a bit while trying hard to maintain compatibility with the existing
> syntactic and semantic rules of ANSI C.  In my experience, big and
> complex language changes often fail because they are just too damn
> difficult to specify, too difficult to fully implement, or too difficult
> to integrate into the language without causing nasty incompatibilities.
> (C++ is a perfect example of all three of these pitfalls.)

I must say that your proposal is the cleanest attempt I have
seen thus far for making arrays first class (or more first
class) in C.  The problem is that once first class arrays are
added to C, you are forced to add a lot of other things.  Here
is a summary of what always seems to happen when first class
arrays are added to C:

   1) New syntax is added to create "first class" arrays.
      This is required because traditional C style arrays
      are still needed.
   2) Operator overloading or creation of new operators/intrinsics
      are needed for operands of "first class" array type.
   3) Shape conformability is a concern and requires more syntax.
   4) Function interface becomes very complicated.
   5) Control flow complexities are introduced.
   6) Non-elemental operations require either additional syntax,
      new operators, or intrinsics.

I would now like to walk through your proposal and show how all six
of the points above occur in it.

> So that's what I decided would be the best thing.  Add a new postfix
> operator `[]' which, when applied to a preceding operand which (nominally)
> has some array type, simply prevents the type from decaying into a
> pointer type.  This is simple to understand, explain, and implement,
> and it doesn't conflict in any way with any of the existing syntax or
> semantics of either C or C++.  (Note that the rules for this new operator
> would require that the operand be of some array type.)

So here is point #1 - adding new syntax for getting "first class" arrays.
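For readers following along in standard C, the decay that the proposed
`[]' operator would suppress is easy to demonstrate with sizeof (the
function names below are mine, purely for illustration):

```c
#include <stddef.h>

/* In ANSI C an array parameter silently decays to a pointer, so the
   length information is gone inside the function; sizeof shows it. */
static size_t size_inside(int vec[10])
{
    return sizeof vec;          /* size of a pointer, not of 10 ints */
}

static size_t size_outside(void)
{
    int vec[10];
    return sizeof vec;          /* size of the whole array */
}
```

The `[]' operator would, in effect, make the "inside" case behave like
the "outside" one.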

> First, what do we do about comparison operators?  What does the expression:
> 
> 			left[] <= right[]
> 
> really mean?

> 
> Naturally, some users would prefer to have comparison operators work one
> way, and other users would prefer to have them work the other way.  As long
> as we only have one set of comparison operators, we cannot accommodate both.
> 

Here enters point #2 - operator overloading/new operators.  I believe you
have just hit the tip of the iceberg with this one.  Basically you have
to go through all of the C operators and decide what each of them means
with one or two first class array operands.  What does "x += vec[]",
"!vec[]", or "vec[] = 2" mean?  Just as your analysis of comparison
operators demonstrates, there are no simple rules you can apply here.
In some cases you want both vector and scalar results for the same
operator.  So either new operators/intrinsics are introduced or the
rules get very complicated.
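To make the ambiguity concrete, here are two of the competing readings
spelled out as the plain C loops a compiler would have to pick between
(the helper names are mine, not part of either proposal):

```c
/* One plausible reading of "x += vec[]": reduce the vector into x. */
static int sum_reduce(int x, const int vec[], int n)
{
    for (int i = 0; i < n; i++)
        x += vec[i];
    return x;
}

/* One plausible reading of "vec[] = 2": broadcast a scalar elementwise. */
static void broadcast(int vec[], int n, int value)
{
    for (int i = 0; i < n; i++)
        vec[i] = value;
}
```

Both readings are defensible, which is exactly why the extension has to
legislate one of them (or add distinct operators for each).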

> The answer here seems obvious (to me anyway).  If we have some binary operator
> for which one operand is an array of known length and the other is an array
> of some unknown length, we "coerce" the type of the array of unknown length
> so that its type is the same as the other operand (which has a known length).
> (This rule would also apply to the second and third operands of the ?:
> operator.)
> 
> But wait!  What about unary operators applied to array of unknown length?
> Also, what about binary operators where both operands are of some unknown
> length?

> 	void vector_add (int vec1[], int vec2[], unsigned len)
> 	{
> 		vec1[] += (int[len]) vec2[];
> 	}
> 
> To make this work, some special "type compatibility" rules would have to
> apply to values which have some "dynamic array type".  Specifically, we
> would want to stipulate that (for all applicable binary operations) a
> value of some dynamic array type is *only* compatible with a value of
> some *incomplete* array type which has the same element type.

Welcome to point #3, shape compatibility.  Once first class arrays are
added, you have to ask: what is the behavior of operations whose operands
are not shape compatible?  Although you can define some default shape
compatibility promotion rules, at some point you need to add syntax
similar to your cast technique.
Another option is to add intrinsics like the Fortran-90 reshaping intrinsics.
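Short of new syntax or intrinsics, the only conformability rule available
in today's C is a runtime check written by hand.  A minimal sketch,
assuming lengths are carried separately as in the vector_add example
(the name vec_add is mine):

```c
#include <assert.h>

/* Elementwise add with an explicit conformability check - the kind of
   rule a first-class array extension would have to build in somewhere,
   either statically in the type system or dynamically like this. */
static void vec_add(int *dst, const int *a, const int *b,
                    unsigned na, unsigned nb)
{
    assert(na == nb);           /* shapes must conform */
    for (unsigned i = 0; i < na; i++)
        dst[i] = a[i] + b[i];
}
```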

> Well, the more astute readers will have noticed that I glossed over a rather
> important point in my examples above.  Specifically, given a function like:
> 
> 	void vector_add (int vec1[], int vec2[], unsigned len)
> 	{
> 		/* ... */
> 	}
> 
> the ANSI C standard says that this function is really equivalent to:
> 
> 	void vector_add (int *vec1, int *vec2, unsigned len)
> 	{
> 		/* ... */
> 	}
> 
> etc.

It is becoming more apparent to me that issue #4 (function interface
complexity) is one of the more difficult issues to solve when adding
first class arrays to C.  It is really a trade-off between ease of
programming and execution efficiency.  Take for example "sin(vec[])".
What does this mean?  Does the behavior change depending upon whether a
prototype of sin is present which expects a vector?  Can a user write a
function that will accept an argument of any shape?  How can the compiler
generate efficient code for a function that takes an argument of any shape?
And what is of intense interest for me recently - how does the complexity of
distributed memory impact all this?  There is no way around it - either
function interface has to change significantly or the user's hands are
tied in what they can express.

Issue #5 (control flow complexity) was not covered in your discussion of
your proposal.  What is the behavior of "if (vec1[] <= vec2[])" or, better
yet, what is the behavior of "while (vec1[] <= vec2[])"?  I am assuming
that "<=" results in a vector of values.  It has been very interesting for
me to see how the different dialects of C* have answered this question in
so many different ways.
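The usual resolutions are "all" and "any" reductions.  Spelled out as
standard C helpers (names mine), the two give different answers for the
same operands, which is why every dialect has to pick one:

```c
#include <stdbool.h>

/* Two ways a vector comparison could collapse to the scalar that a C
   "if" or "while" needs: true if ALL elements compare, or if ANY does. */
static bool all_le(const int *a, const int *b, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        if (!(a[i] <= b[i]))
            return false;
    return true;
}

static bool any_le(const int *a, const int *b, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        if (a[i] <= b[i])
            return true;
    return false;
}
```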

> The idea might also extend nicely from vectors to matrices, if we think
> of those as undergoing (in ANSI C) multiple stages of "type decay" (as I
> believe they do).  Then we could write:
> 
> 		int sink[100][100];
> 		int source[100][100];
> 
> 		void matrix_multiply ()
> 		{
> 			sink[][] *= source[][];
> 		}
> 
> (Note that this is definitely *not* the kind of matrix multiply that those
> in the numerical computing community would like to have.)

Last but not least, here is a reference to point #6 - non-elemental operations
are difficult.  With your array syntax proposal, a user cannot write a
general matrix multiply or a transpose.  This is true with any "first class
array proposal" that I have seen.  A linear algebra programmer is forced to
use "for" loops in all cases except for a few trivial operations.  An
alternative is to do what Fortran-90 did and define a boat-load of intrinsics
for common linear algebra operations.  Or like C* - left subscripting can be
added to do non-elemental operations.  With any of these solutions, things
get really ugly once you attempt to express operations on arrays beyond rank 2.
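For reference, here is what those forced "for" loops look like in
standard C for the two operations mentioned, at a fixed rank-2 shape
(names and the fixed size N are mine):

```c
#define N 3

/* The loops a linear-algebra programmer is forced to write today:
   transpose and matrix multiply over plain 2-D arrays. */
static void transpose(int dst[N][N], int src[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            dst[i][j] = src[j][i];
}

static void matmul(int c[N][N], int a[N][N], int b[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0;
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}
```

Note that nothing in an elemental "first class array" extension lets you
express the index transposition these loops perform.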

If you are wondering how matrix multiply and transpose are specified with
Cray Research's iterators:

	int a[100][100], b[100][100], c[100][100];
	void test() {
	   iter I = 100, J = 100, K = 100;

	   a[I][J] = b[J][I];	/* transpose */
	   a[I][J] += b[I][K] * c[K][J]; /* matrix multiply */
	}

So in conclusion, I believe your proposal will end up being as large a
proposal as C* (or at least of the same order of magnitude).  I believe that
any attempt to add first class arrays to C will result in an unacceptably
large change to the language.  Early in '92, a group of us at Cray Research
attempted to add first class arrays to C and found that the extension was
becoming huge.  The extension was getting so large that we really couldn't
call the language "C" any more.  That is when we started looking at iterators.

I am of course very biased towards iterators, but I believe the iterator
proposal is small enough that it has a good chance of making it into C.
We are finalizing revision #3 of the proposal this week.  Please let me
know if you would like a copy.  It will be a part of the next NCEG mailing.

Dave Becker (becker@cray.com)



More information about the Numeric-interest mailing list