cc423 specifications

The source language for your compiler project will be a variant of C, but incorporating some aspects of Java; in particular, data structures will be more like those of Java than like those of standard C.

The language is defined below, by giving the differences between our language and standard C. Each section below refers to a section of Appendix A (pages 191--233) of Kernighan and Ritchie, The C Programming Language, second edition. If a section is not mentioned below, what that section says is the same in both languages.

Updated November 3: function declarations may only appear as external declarations.

Updated November 6: wording clarified in several places, regarding expressions which "must be boolean". Now the wording is of the form "must be boolean (or convertible to a boolean)".

 


A1 (introduction): not relevant.

A2 (lexical conventions): tokens are as defined in your scanner assignment.

Types

A4.1 (storage class): there is no explicit declaration of storage class; objects are static or automatic as they would be by default in standard C. There are no register objects and, since there is only one translation unit, there are no external objects.

A4.2 (basic types): there is no <limits.h>. The basic types are boolean, character, and (signed) integer. The boolean values are false and true. All of these are "arithmetic" and "integral" types as these terms are used in various other places in the Appendix. There are no floating points or enumerations, and no explicit void type.

A4.3 (derived types): the derived types are arrays and structures. I will also call these reference types, since objects of those types are dynamically allocated and accessed through implicit pointers. There are no explicit pointers, or such operations as pointer arithmetic. Strings are character arrays; the biggest thing special about strings is that there are string literals, whereas there are no constants of other array types.

A4.4 (type qualifiers): not relevant. There are no type qualifiers.

A5 (objects and lvalues): there are no expressions of the form *E.

A6.1 (integral promotion): booleans and characters may be promoted to integers. The promotion of false is 0 and the promotion of true is 1. The promotion of a character is its code in the machine's character set.

A6.2 (integral conversions): the only relevant conversions are the inverse of integral promotions. That is, a non-negative integer which is a character code can be converted to a character, and a value of 0 or 1 can be converted to a boolean. Note that other non-zero values do not convert to true.

A6.3 -- A6.8: not relevant. The types involved are not in our language.

Insert the following subsection:

 

A6.9 String conversions

Booleans, characters and integers may be converted to strings. The conversion of a boolean is "true" or "false". The conversion of a character is the one-character string containing that character. The conversion of an integer is a sequence of digits with no leading zeros, possibly preceded by a negative sign. In each case, the result is a reference to newly-allocated space containing the characters of the string.

From here on in the Appendix, ignore all mention of types that are not in our language.

 

Expressions

A7.1 (pointer generation): not relevant. Pointers cannot be used in the ways that this section implies.

A7.2 (primary expressions): similarly, most of the paragraph on string literals is not relevant. But there is one thing special about a primary expression which is a string literal: each evaulation of the expression produces a reference to a newly-allocated copy of the string.

A7.3 (postfix expressions): there is no -> operator.

A7.3.1 (array references): the first expression must be the name of an array, and the second must have an integral type. There is no equivalent expression involving pointer arithmetic.

The value of the second expression must be in the range from 0 to the length of the array minus one, inclusive. The result of subscripting past either end of the array is undefined. In particular, even if the end of a string happens to be marked by a null character in the implementation, that character cannot be accessed by subscripting.

A7.3.2 (function calls): the function designator can only be the name of a function declared in the current scope. There are no function pointers.

Passing a reference object is like passing a pointer to the object in standard C; the function may modify the contents of the object.

Ignore all mention of "old-style" functions: all functions are "new-style". However, there are no variable-length parameter lists (the ellipsis notation).

A7.3.3 (structure references): the dot, as a structure reference operator in our language, is like the -> operator in standard C. The left operand must be of a structure type, and the second operand names a field or "member" of that structure.

A7.4 (unary operators): there is no sizeof operator, and the only unary-operators are - and !. Subsections 2, 3, 4, 6 and 8 are not relevant.

A7.4.7 (logical negation operator): the operand of the ! operator must be a boolean (or convertible to a boolean), and the result is a boolean, not an integer 0 or 1.

A7.5 (casts): not relevant. There are no casts.

A7.7 (additive operators): there is no pointer arithmetic. But the + operator also denotes array concatenation (including string concatenation). The operands must be of the same array type; or in the case that one operand is a string, the other operand must be convertible to a string (see section A6.9). The result is a reference to newly-allocated space containing the elements of the first array followed by the elements of the second array.

A7.8 (shift operators): not relevant.

A7.9 (relational operators): the result of a relational comparison is a boolean, not 0 or 1. These comparisons are defined on the arithmetic types; there is no pointer comparison using these operators.

A7.10 (equality operators): the result of an equality comparison is a boolean, not 0 or 1. These comparisons are defined on the arithmetic types. Also, an object of a reference type can be compared to another object of the same type, or to null; this denotes comparison of the underlying pointers for equality.

A7.11 -- A7.13 (bitwise operators): not relevant.

A7.14 -- A7.15 (logical operators): the operands of && and || must be booleans (or convertible to booleans), and the result is a boolean.

A7.16 (conditional operator): There is no conditional operator. Replace conditional-expression with allocation-expression, defined by the following.

 

A7.16 Allocation operators

     allocation-expression:
         logical-OR-expression
         new identifier
         new type-specifier
[ expression ]
         copy logical-OR-expression

The second and third forms denote allocation of a structure or an array respectively; the identifier must be the name of a structure or array type respectively. The value of the expression is a reference (i.e., pointer) to the newly allocated object. In the second form, any initializers that fields of the structure may have are evaluated, and the initialization is done, at the time that the allocation is done. In the third form, the value of the expression between the brackets must be a nonnegative integer; that value is the number of elements in the allocated array.

The fourth form denotes a "deep copy" operation. The operand must be of a reference type. Space is allocated for a copy of the object pointed to by the reference, and the members of that object are copied into the newly allocated object. Any member which is itself of a reference type is deep-copied in this same way, recursively. (The effect of doing all this on an object containing cycles, which are possible to create with reference types, is undefined.) The value of the expression is a reference (i.e., pointer) to the newly allocated object.

Note: there is no explicit deallocation in this language; it was designed with the intent that storage would be garbage-collected. However, you are not required to implement garbage collection for the compiler assignment.

 

A7.17 (assignment expressions): the only assignment operator is =. One of the following must be true:

 

 

 

Assignment of an object of a reference type means copying the underlying pointer.

A7.18 (comma operator): not relevant.

A7.19 (constant expressions): not relevant. No expression is required to be constant.

 

Declarations

The declaration syntax is significantly rearranged, and much simpler than in standard C.

A8 (declarations): replace the syntax with:

     declaration:
         type-specifier init-declarator-list ;
         struct-declaration

     init-declarator-list:
         init-declarator
         init-declarator , init-declarator-list

     init-declarator:
         identifier
         identifier = initializer

Function declarations are removed from here; they may only appear as external declarations (see sction A10, below). This is a change from the language definition initially posted.

The last sentence of the section (empty declarations) is not relevant.

A8.1 (storage class specifiers): not relevant.

A8.2 (type specifiers): replace the syntax with:

     type-specifier:
         boolean
         char
         int
         struct-name
         type-specifier
[ ]

The rest of the section is not relevant.

A struct-name is an identifier which has been declared as a structure type in a struct-declaration: see section A8.3.

The syntax "type-specifier [ ]" denotes an array type. The size of the array is determined when space for the array elements is allocated, and is not considered part of the type.

A8.3 (structure and union declarations): There are no union declarations. Replace the syntax with:

     struct-declaration:
         struct identifier
{ field-declaration-list } ;
         struct identifier ;

     field-declaration-list:
         field-declaration
         field-declaration-list field-declaration

     field-declaration:
         type-specifier init-declarator-list ;

Most of the discussion that follows is not relevant.

A struct-declaration declares the identifier to be a structure type, which is a reference type, roughly corresponding to a pointer-to-struct type in standard C. A field of a structure type may be an instance of that same type. The syntax "struct identifier ;" is a "forward" declaration of a structure type to be declared in full later in the program; this permits mutually recursive structure types.

Notice that, in our language, a field of a structure may have an initializer. The initialization is done when the structure is allocated. In particular, the evaluation of the initializer (an expression) is done at that time.

An approximate equivalent, in our language, of the example at the top of page 214 is:

 

    struct tnode {
       char[] tword = new char[20];
       int count;
       tnode left;
       tnode right;
    };

Then, after this declaration, the following are similar to the remaining lines in the example:

 

    tnode sp;

 

    sp.count

 

    sp.left

 

    sp.right.tword[0]

A8.4 (enumerations): not relevant.

A8.5 (declarators): Most of this syntax has been removed, and most of the rest has been moved to other sections. Replace this section with the following:

 

A8.5 Function Declarations

A function definition is a "forward" declaration of a function to be defined later in the program.

     function declaration:
         type-specifieropt identifier
( parameter-type-listopt ) ;

If the type-specifier is missing, the function returns no value, like a void function in standard C.

 

A8.6, A8.6.1: not relevant.

A8.6.2 (array declarators): first paragraph: the constant-expression is never present, but such a type is considered complete anyway. Second paragraph: An array may be constructed from an arithmetic type, from a structure, or from another array (to generate a multidimensional array). The rest of the subsection is not relevant.

A8.6.3 (function declarators): replace the syntax with:

     parameter-type-list:
         parameter-declaration
         parameter-type-list , parameter-declaration

     parameter-declaration:
         type-specifier identifier

All function declarations are "new-style", in the terminology of the Appendix. Notice that a function may not be passed to or returned by a function. The rest of this subsection is not relevant.

A8.7 (initialization): the syntax is simply

     initializer:
         assignment-expression

There is no initialization of aggregates using the brace syntax. But recall that a structure is initialized using whatever initializers its fields may have. Also, a character array can be initialized with a string, but this works differently than in standard C: recall that the evaluation of a string literal, in itself, produces a reference to a newly-allocated copy of the string.

There is no default initialization for either static or automatic variables.

A8.8, A8.9 (type names, typedef): not relevant.

A8.10 (type equivalence): in our language, the question of type equivalence arises only in the case of structure types. Equivalence is by name: two structure types are always different, even if they contain fields of the same types in the same order.

 

Statements

From the syntax, omit labeled-statement.

A9.1 (labeled statements): not relevant. Ignore all mention of labels after this.

A9.4 (selection statements): switch statements are omitted. In an if statement, the expression must be boolean (or convertible to a boolean).

A9.5 (iteration statements): the do ... while form is omitted. The expression of a while statement must be boolean (or convertible to a boolean). The second expression of a for statement must be boolean (or convertible to a boolean), and is not optional.

A9.6 (jump statements): the only kind of jump statement is the return statement.

 

External Declarations, Scope, Linkage, Preprocessing, Predefined Functions

A10 (external declarations): function declarations are also included here.

     external-declaration:
        function-declaration
        function-definition
        declaration

A10.1 (function definitions): the syntax is simply

     function-definition:
         type-specifieropt identifier
( parameter-type-listopt ) compound-statement

A function may return any type, including a reference type.

Most of the rest of this section is not relevant.

A10.2 (external declarations): not relevant. In particular, each object may be declared only once, with the exceptions previously given for "forward" declarations of structure types and functions.

A11 (scope and linkage): there is no separate compilation.

A11.1 (lexical scope): the only name spaces in our language are (1) the one containing all variable, structure type and function names; and (2) for each structure type, one for the names of all of its fields ("members").

A11.2 (linkage): not relevant.

A12 (preprocessing): there is no preprocessing. Replace with:

 

A12 Predefined functions

The following functions are predefined, as if they were defined in a scope that surrounds the scope of the program.

A12.1 Length of an array

 

int length(anyType[] a)

The argument a is of any array type. The value of the function is the number of elements allocated for a; it is undefined if a is null.

A12.2 Input

 

char[] get()

Reads a line from standard input and returns it as a string. A line is either a sequence of characters, possibly empty, terminated by a newline character (in this case the newline is consumed but not appended to the string returned), or a nonempty sequence of characters terminated by the end of the input stream. On end of file (i.e., on every call after the last line, as defined above, has been read), the value null is returned.

A12.3 Output

 

put(char[] s)

Writes string s to standard output. Does not append a newline character.