Another proposal for dealing with arrays in C
Bob Knighten
uunet!SSD.intel.com!knighten
Tue Mar 24 09:50:53 PST 1992
There are a number of efforts underway to define variants of C. This usually
means doing something more about arrays. Here is one proposal from a Russian
group for an extension to C that was sent to Walt Rudd because of his role in
the X3H5 effort.
Dear Prof. Walter G. Rudd,
We are participants of implementation of programming
languages (the Assembly language, Fortran 77, Pascal and C)
for one of the first Russian supercomputers - "Electronica
SSBIS". The work was started in 1985. All compilers are
already working on simulator of the supercomputer on
instrumental computers, but hardware yet is not ready.
Answering the request of our software engineers (they were
developing the Operating system and Scientific subroutine
packages) we have developed the extension of C language for
our supercomputer. One of our aims in developing the extension
was to achieve the software portability among different
supercomputers.
We send you the paper describing the main features of our
C language extension and hope that it may be useful for your
working group X3H5.
We are interested in discussion on C language extensions
for computers with parallel architecture and should be very
grateful to you for our further contacts.
Sincerely Yours,
Sergey S. Gaissaryan
Alexey L. Lastovetsky
Our address is:
Dr. S.S.Gaissaryan
37 Vavilova street
Cybernetics Problems' Institute
Russian Academy of Sciences
Moscow, 117312,
Russia
e-mail kuzaivann.delta.msk.su.
Fax 7(095)9382136.
EXTENSION OF C FOR SUPERCOMPUTERS
S.S.Gaissaryan, A.L.Lastovetsky
1. Introduction.
In this paper we consider the language aspects of the
problem of software transfer within the class of
supercomputers having shared storage.
The problem of program transfer from one computing
environment to another consists in keeping of the functional
semantics and the operational properties of this program.
The first step toward solution of the transfer problem
is the development of#_ #.transfer language#_ #.for desired class of
computers.
The transfer language is a programming language which
satisfies the following demands:
- the system of language notions includes all basic
notions of assembly languages for the corresponding class of
computers;
- language's semantics must be defined with maximal
possible degree of completeness, allowed by considered class
of computers;
- the facilities to localize the dependence of program
semantics on the architecture must be provided;
- the language description must specify clearly what
exactly in the language does not depend on the implementation,
what and how exactly depends on it, and what is undefined.
The transfer language effectiveness is ensured on the one
hand by including all basic notions of assembler languages for
the corresponding class of computers in the system of language
notions, and on the other hand by describing these notions
such that its semantics is fuzzy enough to provide the
effective mapping in the system of notions of every computer
of this class.
The desired degree of language portability is ensured,
first of all, by the highest possible degree of completeness
of its functional semantics which affects its effectiveness.
For the class which consists of computers of traditional
von Neumann architecture the C language as specified by [1] is
used as transfer language.
In this paper we describe the C[] language proposed as
the transfer language for the supercomputers having shared
storage. The s[] language is the exact extension of C. It
means that any correct C-program has the same semantics in C[]
and C languages.
In C[] the conceptions of array and pointer are extended
in comparison with C. The concept of vector is introduced. The
new [], [:] operators are included in C[]. Once more
storage-class cache is added. The standard libraries are also
extended. Besides, it is allowable in C[] to describe a member
of a structure as an array of bit-fields.
2. Array.
In the C language an array comprises "a contiguously
allocated set of members of any one type of object"[1,p.21].
In C[] language an array comprises a sequentially
allocated (with a constant positive, negative or zero step)
set of members of any one type of object. A negative step
specifies the allocation of members of the array in the
reverse (from right to left) order. The zero step specifies
the allocation of all members of the array in the same element
of storage.
Thus in the C[] language an array has at least 3
attributes, namely: a type of members, a number of members and
an allocation step.
In the C[] language the array declarators syntax differs
from standard one (see [1,p.56]) in following way. The rule
<direct-declarator>:
<direct-declarator>[<constant-expression>(opt)]
is replaced with the rules
<direct-declarator>:
<direct-declarator>[<constant-expression>(opt)
<step>(opt)]
<step>:
:<constant-expression>
If <step> is not specified then it is equal to 1.
3. Pointer.
In the C language a pointer has only one attribute,
namely: the type of objects pointed to.
This attribute is necessary, firstly, to interpretate
correctly values of objects in case of an access to objects by
means of the pointer, and secondly, to interpretate correctly
the address + and - operators.
In the C language the address + and - operators are
correct only if the pointer operands and the pointer results
point to members of the same array object. The same rule is
valid for the C[] language. Therefore, to support the correct
interpretation of the address operators once more attribute of
the pointer is introduced in the C[] language. This attribute
is step.
In the C language, "when an expression that has integral
type is added to or substructed from a pointer, the integral
value is first multiplied by the size of the object pointed
to"[1,p.41]. In the C[] language, the similar multiplier is
equal to the product of the pointer step and the size of the
object pointed to. In the C language, "when two pointers to
members of the same array object are substracted, the
difference is divided by the size of a member"[1,p.41]. In the
C[] language, the similar divisor is equal to the product of
the pointer step and the size of a member.
In the C[] language the pointer declarators syntax
differs from standard one (see [1,p.56]) in following way. The
rules
<pointer>:
*<type-specifier-list>(opt)
*<type-specifier-list>(opt)<pointer>
is replaced with the rules
<pointer>:
*<step>(opt)<type-specifier-list>(opt)
*<step>(opt)<type-specifier-list>(opt)<pointer>
If <step> is not specified then it is equal to 1.
Example. The declarations
int a[]={0,1,2,3,4};
int *:2 p1=(void*)a, *:-1 p2=(void*)&a[4];
result the address expressions (R1+1) and (p2+2) point to the
same member of array, namely, a[2]. Indeed, the offset from p1
defined by the expression (p1+1) is equal to (2*sizeof(int))*1
bytes, and the offset from p2 defined by the expression (p2+2)
is equal to ((-1)*sizeof(int))*2 bytes.
4. Access to member of array.
In the C[] language the access to e2-th member of an
array object e1 is fulfiled with the help of the expression
e1[e2] which is identical to (*(e1+(e2))). Here, e2 is an
integral expression, e1 is an lvalue that has the type array
of type, and this lvalue is converted to an expression that
has the type pointer to type and that points to the initial
member of the array object (the attribute step of this pointer
is identical to the attribute step of the array object).
5. Vector as value of array object. Access to value of
array.
In the C[] language a value of an array object is defined
as an ordered sequence of values of its members and is called
vector. The i-th member of an vector is a value of the i-th
member of the corresponding array object.
It follows from this definition that an array object can
not be a member of a vector (since any array object is not a
value of an object of some type), but one vector can be a
member of another vector (since a vector is a value of an
array object).
Note, that in contrast to the C language in C[] language
the concepts type of object and type of value of object are
different. In particular, a vector is a value of array object
but it has not the attribute step. This attribute specifies an
allocation of an array value in an array object. A type of a
vector is defined by only two attributes, namely: type of
members and number of members. Therefore, two array objects
that have a different types may have a values of the same
type.
No access to a member of a vector is defined.
Example. The value of the array defined by the
declaration
int a[3][2];
is the vector consisting from three vectors, each from which
consists from two integers. So, the execution of the iteration
statement
for(i=0; i<3; i++)
for(j=0; j<2; j++)
a[i][j]=i+j;
causes the value of the array a to be equal to the vector
{ {0,1}, {1,2}, {2,3} }. The type of this vector is named by
the type name int[3][2].
In the C language, except when used as an operand where
an lvalue is permitted (in particular, as an operand of the
sizeof operator), or when a string literal is used to
initialize an array of chars, an expression that has type
array of type is converted to an expression that has type
pointer to type and that points to the initial member of the
array object.
To support the access to an array as the whole the
postfix [] operator is introduced. The operand of this
operator shall have type array of type. The [] operator blocks
(forbids) the convertion of the operand to a pointer. Such
lvalues that designate an array as the whole are called a
blocked lvalues. A rules of the treatment with a blocked
lvalues are fully similar to a rules of the treatment with a
lvalues that have a scalar type. In particular, if a is an
array then the expression &a[] is equal to the lvalue a.
A modifiable blocked lvalue#_ #.is a blocked lvalue that does
not have a type declared with the const type specifier.
The unary & operator deblocks a blocked lvalue.
Example. If the arrays a, b and c are declared as
int a[8], b[8:2], c[8:-1];
then the expression a[]=b[]+c[] assigns the sum of vectors
being a values of the arrays b and c to the array a. Note that
here the expression a[]+b[] has type vector from 8 ints.
6. Access to subarrays.
Any set of objects belonging to some array is called its
subarray iff this set itself forms an array. By definition an
object belongs to an array if either it is a member of this
array or it belongs to a member of this array. Besides, any
subarray is also an object belonging to the array.
In principle, the introduced facilities are sufficient to
access to a subarray. For example, if the array object a is
defined by the declaration
int a[5][5];
then the blocked lvalue
(*(int(*)[5:6])a)[] (1)
designates the array of five ints (with the step 6) that
contains the main diagonal of the matrix a, and the blocked
lvalues
(*(int(*)[4:6])(a[0]+1))[] (2)
and
(*(int(*)[4:6])(&a[0][1]))[] (3)
designates the array of four ints (with the step 6) that
contains the diagonal of the matrix a which is placed under
the main diagonal.
The more compact notation comes out if a variables of
type pointer to array are used. So, if the pointer objects p1
and p2 are defined by the declaration
int (*p1)[5:6]=(void*)a, (*p2)[4:6]=(void*)(a[0]+1);
then the expression (*p1)[] can be used instead of (1) and the
expression (*p2)[] can be used instead of (2) and (3).
In the C language the array subscripting operator is
surplus. This operator provides a comfortable abbreviation
e1[e2] for the expression (*(e1+(e2))) where the expression e1
shall have type pointer to type and the expression e2 shall
have integral type. Similarly, the ternary [:] operator is
introduced in the C[] language. This operator provides an
abbreviation for the blocked lvalues like (1)-(3). Namely, if
e1 is a lvalue by type t and e2, e3 are an integral constant
expressions then the expression e1[e2:e3] is equal to the
expression (*(T(*)[e2:e3])(&(e1)))[] being within the scope of
the type definition of the typedef t T kind. The expression e3
is optional (the expression e1[e2:] is equal to the expression
e1[e2:1]).
For example, the expression (1) can be expressed as
a[0][0][5:6], and the expressions (2) and (3) can be expressed
as a[0][1][4:6].
In common case, both second and third operands of the [:]
operator can be any integral expressions (not only constant
ones). It allows to specify dynamically the required subarray.
7. Segments of arrays.
Not every regular set of objects belonging to an array is
a subarray.
To make the access to the similar sets of objects
belonging to an array easy, new derived types segments of
arrays#_ #.are introduced in the C[] language and the [:] operator
is extended.
A segment of an array may comprise a sequential members
of the array allocated with a constant step (a unit of
stepsize is a member of the array). Besides, if a members of
the array are also an arrays then a segment of the array may
comprise a sequential segments (of the same type) of the array
members allocated with a constant step. In this last case, a
unit of stepsize may be not only a segment of a member of the
array but a member of a segment of a member of the array if a
member of a segment of a member of the array is also an array
and so on.
In common case, in the expression e1[e2:e3] the third
operand that specified a step is described as follows
<segment_step>:
<integral_expression>(opt)
:<segment_step>(opt).
The expression e1 is either an lvalue whose type is not array
or a blocked lvalue of type array or an lvalue of type segment
of array. If e3 is an integral expression then a unit of step
is equal to the size of the object designated by e1. If e3 is
:e4 where e4 is an integral expression then a unit of step is
a member of the object designated by e1. If e3 is ::e4 then a
unit of step is a member of a member of the object designated
by e1 (naturally, in this case e1 should be an lvalue of the
corresponding type).
If e3 is :e4 then the expression e1[e2:e3] is an lvalue
of type segment of array and designates the corresponding
segment. If e1 is an lvalue of type segment of array then the
expression e1[e2:e3] is also an lvalue of type segment of
array and designates the corresponding segment. In all rest
cases the expression e1[e2:e3] is a blocked lvalue of type
array.
An lvalue of type segment of array is never converted to
a pointer that points to the initial member of the segment and
always designates the segment as the whole.
There are some constraints in a usage of lvalues that
have type segment of array. In particular, the & unary
operator is not applicable to such lvalues.
If e1 is an lvalue of type segment of array and e2 is an
integral expression then the expression e1[e2] designates the
e2-th member of the segment e1 (counting from zero). In
difference from the array subscripting operator the segment
subscripting operator is not expressed by the * and + address
operators that is in this case the expression e1[e2] is not
identical to the expression (*(e1+(e2))).
8. Vector construction operators.
If e1,...,eN are expressions of type t then the
expression {e1,...,eN} is of type vector of N members of type
t.
If vector or scalar expressions e1, e3 are of type t and
e2 is integral positive expression then the expression
e1[e2#e3] is of type vector of e2 members of type t. The first
member of this vector is e1, the second one is ((e1)+(e3)),
the third one is (((e1)+(e3))+(e3)) etc., the sign + denoting
the vector (memberwise) addition in vector case. The operand
e3 may be omitted in e1[e2#e3] expression. In that case the
vector consists of e2 identical members each of which is equal
to e1. If the length of the vector e3 is less than the length
of the vector e1 then missing right members are considered
zeroes.
Example. The constant expression 1[5#2] denotes the
vector {1,3,5,7,9}, the constant expression 1[5#1][2#1[5#]]
denotes the vector { {1,2,3,4,5}, {2,3,4,5,6} }, and the
constant expression 10[4#-2][3#{1,2}] denotes the vector {
{10,8,6,4}, {11,10,6,4}, {12,12,6,4} }.
If scalar operands of the { } operator are of different
arithmetic types the usual arithmetic conversions are
performed. If they are of pointer type all of them shall have
the same type.
9. Lvector.
In addition to notions of lvalue, blocked lvalue and
segment of array which are used for designation of objects and
are allowed as left operand of assignment operators the notion
of lvector is introduced.
Just as lvalue is an expression designating some object,
lvector is a vector expression designating the set of objects.
Example. If the objects x, y and pa are described as
int x, y, *pa[10];
then the vector expressions {y, x, *pa[3]} and *pa[] are
lvectors but the vector expression {x+y, y, x} is not.
Blocked lvalues of type array and segments of array can
be considered as special cases of lvectors which designate the
sets of objects allocated in regular way.
Lvector is modifiable if every member of the set of
objects designated by it is modifiable.
10. Vector conversions.
A conversion of the vector of one type to the vector of
another type is composition of two conversions, the first of
which converts the vector length the types of members
remaining unchanged and the second one converts the type of
vector members the vector length being unchanged.
We define a vector length as the number of its members.
For example, the vectors {1,2} and {{3,4,5},{6,7,8}} both have
the length 2.
There is not defined conversion of vector to non-vector
(in particular, to scalar).
A conversion of the non-vector (in particular, of the
scalar) to the vector of specified length and type of the
members is composition of two conversions, the first of which
converts the non-vector to the vector of specified length all
members of which have the same type and value as the initial
non-vector has and the second one converts the type of vector
members to specified type.
Shortening of a vector is performed by dropping its
ending members. Conversion of a vector to the longer one is
performed by adding of members having indefinite values to its
tail.
11. Integer vectors packing.
The C[] language provides bit-fields packing facilities
for integer vectors. Namely, an array declarator is allowed in
position of optional <declarator> in the bit-field declarator
of <declarator>(opt):<constant_expression> kind. The bit-field
declarator A[N:L]:M (A is an identifier, N,L,M are constant
expressions) specifies N sequentially (with the step L)
allocated bit-fields of the same width M. The width of
bit-fields and the step size are measured in bits. The
assignment of an integer vector of length N to the member A of
structure causes the packing of this vector members in
corresponding bit-fields. Unpacking of integer vector packed
in bit-fields is performed during the access to corresponding
member of the structure by means of the . operator.
12. Scalar operators.
Three groups of scalar operators are added to the C[]
language.
The first group consists of two binary ?> and ?<
operators of maximum and minimum calculation together with
corresponding compound assignments.
The second group contains three unary bitwise operators,
namely: ? (counting unit bits), % (counting leading zeroes), a
(word reversing).
The third group contains floating operators alowing to
specify the way of rounding in runtime, namely, the modifiers
of floating cast operators are introduced: > (rounding up), <
(rounding down),! (rounding to the nearest) and = (chopping).
Syntactically, a modifier is added as suffix to a name of
floating type in corresponding cast operator.
Example. In the fragment
float x;
double y,z;
...........
x=(float!)(y+z);
the value of the sum (y+z) having type double is rounded to
nearest value of type float, and resulting value is assigned
to varyable x.
13. Vector operators.
The operand of unary *, +, -, ~, ?, %, a, ! operators and
scalar cast operators may have vector type. In this case the
result of such operator is vector members of which are the
results of application of corresponding operator to members of
operand.
If a vector type name is used in a cast operator the
vector length specified by this cast operator may be specified
by arbitrary (not only constant) integral expression.
One or both operands of binary *, /, %, ?<, ?>, +, -, <<,
>>, <, >, <=, >=, ==, !=, &, ^, |, &&, || operators may have
vector type.
If both operands are vectors of the same length then the
result is vector members of which are the results of
application of corresponding operator to members of operands.
If one of the operands is scalar it is converted to the
type of vector operand.
If vector operands of binary operator have different
length then behavior is undefined.
The first operand of conditional operator may have vector
type. In that case the second or the third operand but not
both of them may be omitted. If none of the operands is
omitted then unlike the C language all three operands are
evaluated.
If all three operands are vectors of the same length then
the result is acheived by memberwise application of the
operator. If vector operands of conditional operator have
different length then behavior is undefined.
If the second or the third operand is non-vector they are
converted to the type of the first (vector) operand.
If the first and the second (or the third) operands are
vectors, the third (the second) operand is omitted and the
members of the first operand have scalar type then the result
will be the vector of the same type as the second (the third)
operand; the i-th member of the result is equal to the k(i)-th
member of the second (the third) operand where k(i) is index
of the i-th non-zero (zero) member of the first operand. The
other members have indefinite values.
If the first and the second (or the third) operands are
vectors, the third (the second) operand is omitted and the
members of the first operand have vector type then the result
is acheived by memberwise application of the operator.
An assignment operator may have as its left operand a
modifiable blocked lvalue, a segment of modifiable array or
any other modifiable lvector . In that case its right operand
may have vector type. In any case the type of its right
operand converts to the type of the left operand's value.
Example. The execution of following fragment
int a[]={0,1,2,3,4};
int *pa[]={a+1,a+2,a+3,a+4,a};
*(pa[])=a[];
results the vector {1,2,3,4,0} as value of array a.
The unary linear [*], [/], [%], [?<], [?>], [+], [-],
[<<], [>>], [&], [^], [|] operators correspond to binary *, /,
%, ?<, ?>, +, -, <<, >>, &, ^, | operators. These operators
are applicable only to vector operands. Let v[0],
v[1],...,v[N] denote the members of vector operand v; then the
expression [op] v[] has the same semantics as the expression
of (...((v[0] op v[1]) op v[2]) op ... op v[N]) kind.
Example. The execution of following fragment
int a[]={0,1,2,3,4}, sum;
sum=[+]a[];
results the value of sum equal to value of expression
a[0]+a[1]+a[2]+a[3]+a[4] that is 10.
In the C[] language unlike the C language a formal
parameter of a function may have an array type. The
declaration of such formal parameter must specify all array
attributes. Corresponding argument is the expression of the
same vector type as defined by formal parameter. The function
value also may have vector type.
14. Arrays and structures of register storage-class.
An array whose members are of scalar type and the step is
equal to 1 may be declared with storage-class specifier
register. It causes corresponding array head allocation on
vector register if there is free one. If there is no free
vector register or if array of declared type should not be
allocated on register the storage-class specifier is ignored.
There are some constraints in usage of register variables
of array type. In particular, the address & operator and the
[:] operator should not be applied to it. The array
subscripting operator is not expressed by * and + address
operators for register array.
A structure with members of scalar types may be declared
with storage-class specifier register. It causes corresponding
structure allocation on available scalar registers which allow
collective exchange with main storage.
15. Cache storage-class.
In C[]-machine we define a cache storage as additional
address space which is independent from main storage. A
definition of an object with storage-class specifier cache
causes an appropriate amount of cache storage to be reserved.
Aggregate object shall be allocated in main or cache
storage in indivisible way, that is all its members belong to
one address space. Binary address operators are invalid if one
of their operands points to cache storage and the another
points to main storage.
Storage-class specifier cache may be used in pointer
declarators. The declarator
* <type_specifier_list> <opt> <identifier>,
where the type_specifier_list contains specifier cache defines
the identifier as a pointer allocated in cache-storage. The
following pair of declarations demonstrates the difference
between a pointer to cache-storage allocated in main-storage
and a pointer to main-storage allocated in cache-storage:
cache char *ptr_to_cache;
char *cache cache_ptr;
NULL pointer has no attrbute specifying address space.
16. Concurrent calculation facilities.
The C[]- machine includes several processors each
connected with its own cache-storage and shared main-storage.
Access to the region of shared storage by one processor does
not block the access to the same region of storage by other
processors (except the semaphore operator lock).
Initially program is executed by one of the processors
but during its execution other processors may be activated.
Therefore in common case program statements are concurrently
executed by several processors.
In any time each processor may be in one of following
three states: active, passive or waiting. Initially only one
processor is active.
In C[] language is introduced statement label +:. This
label relates to one statement.
When a program-fragment is being executed by one of the
processors of the C[]-machine (we denote it as A-processor)
and control achieves +:-label, one of passive processors
(B-processor) is activated by A-processor and B-processor
begins to execute the statement labelled by +:-label, while
the following statement is concurrently executed by A-
processor. When thread of control leaves +:-labelled
statement, B-processor is returned to passive state. Note that
cancelling of automatic objects of compound statement, created
by B-processor while entering this statement is done while
leaving it.
Synchronization of the access to shared data is achieved
by semaphores.
New integral type sem is introduced in the C[] language .
The set of values of this type is subset of the set of values
of int. The set of values of sem is divided on two
unintersecting subsets one of which corresponds to the state
semaphore is open, and another to state semaphore is locked. A
semaphore variable is implicitly initialized by value from the
set semaphore is open.
The C[] library contains three functions to deal with
semaphore objects:
- the function
void lock(sem* s);
checks to which set the value of object *s belongs. If it
belongs to set semaphore open, then the value of *s is turned
to a value from set semaphore locked, after that function
normally terminates. If the initial value of variable *s
belongs to set semaphore locked, then function turns the
processor in waiting state until the value of *s be changed in
such way that it belongs to set semaphore open, after that the
processor is activated and continues its work from the
execution of the function lock. This function is not shared
that is no processor can access to object *s while processor
which executes it is active;
- the function
int unlock(sem* s);
assignes to variable *s value from set semaphore open and
returns 0 if initial value of *s belonged to set semaphore
open, and 1 otherwise; is function unlock shared or not,
depends on realization;
- the function
int test(sem* s);
returns 0, if *s belongs to set semaphore open, and 1, if *s
belongs to set semaphore locked; is function test shared or
not, depends on realization.
References
1. First working draft on programming language C (draft
proposed by ANSI). Ottawa, ISO/TC97/SC22/WG14, 1986.
Robert L. Knighten | knightenassd.intel.com
Intel Supercomputer Systems Division |
15201 N.W. Greenbrier Pkwy. | (503) 629-4315
Beaverton, Oregon 97006 | (503) 629-9147 [FAX]
More information about the Numeric-interest
mailing list