THE PLUS SYSTEMS PROGRAMMING LANGUAGE

Alan Ballard and Paul Whaley
Computing Centre
University of British Columbia
2075 Wesbrook Mall
Vancouver, Canada
V6T 1W5

CIPS 1984

Abstract

PLUS is a systems programming language designed originally for use with the IBM System 370 and compatible machines. It has been under development at UBC during the last three years. The implementation is now stable and is being used by an increasing group of systems programmers.

The language is based on Pascal. Its design and implementation were intended to provide the facilities needed for program development in a production environment.

This paper describes the goals of the language design and implementation and provides an overview of the language and compiler which resulted from them. It concludes with a description of the current status of the project.

Introduction

The University of British Columbia Computing Centre provides academic and some administrative computing services on an Amdahl 470 V/6 operating under the Michigan Terminal System. MTS is a general-purpose terminal-oriented operating system for IBM 370 compatible machines (2).

MTS was originally developed by the University of Michigan; it is now in use at several university computing centres which share the development and maintenance of the operating system and a very large collection of associated system software.

In 1975, essentially all of this software was written in Assembler. Distressingly, new projects being started at that time were still using Assembler also. Two or three higher-level systems programming languages were available under MTS. Occasional attempts had been made to use one or another for various projects. None, however, had gained any appreciable acceptance from the programming staff at the MTS installations. In each case, either the language did not satisfy the requirements of the programmers involved, or its implementation was considered inadequate (and usually both).

PLUS began as an unofficial "spare time" project in an attempt to develop a systems programming language and implementation that would satisfy the requirements as we perceived them. We believed that to be acceptable, a language would have to be designed and implemented with the facilities and quality expected in a production environment as an explicit goal. As the project progressed, other systems programmers at UBC became interested in the language, and the implementation eventually became an official project.

We initially considered implementing a new compiler for an existing language, rather than designing yet another language. However, there seemed to be little reason to do so. All of the candidates we were aware of seemed to have some inadequacies. None had any significant advantages in terms of availability of trained programmers, support on other machines, or extensive usage elsewhere. The language was not, however, intended to be particularly innovative, but rather to contain only features that we were confident we knew how to implement with a minimum of difficulty.

The existing languages which seemed to best meet the requirements were Pascal (4) and Sue (3). Sue was developed at the University of Toronto around 1971, and was based largely on Pascal. There was no usable Pascal compiler available in 1975. We did have a Sue compiler, however we did not consider the implementation to be adequate.

The language that we designed was based to a large extent on Sue and Pascal. PLUS is superficially quite different from either; however the underlying language semantics are really very similar. Undoubtedly, some parts of our language design reflect personal biases, and will not be unanimously viewed as improvements over Sue or Pascal. Some features of PLUS may be quite at variance with current trends in programming language research.

Language and Implementation Goals

The overriding considerations of the language and its compiler have been that they should provide facilities which contribute to increased programmer productivity in the development of correct, easily maintained, efficient programs. At the same time, the language must be reasonably easy to use by systems programmers in a primarily Assembler environment.

1. Control and Data Structures. The language provides only control structures which encourage reasonable program structure. The compiler provides an efficient procedure calling/entry sequence, in order to encourage modularity.

Data structures are a significant part of most systems programs. Pascal provides excellent facilities for describing data structures; the facilities provided by PLUS are very similar.

2. Readability and Understandabi1ity. It is particularly important for systems programs that another programmer should be able to pick up a listing and quickly acquire a general idea of how the program works. This has been an important goal of the language and has influenced its design in many ways. For example, the language is relatively English-like, using keywords in the syntax in preference to strange punctuation marks.

3. Compile-Time and Run-Time Checking. The language design is intended to allow many common errors to be detectable at compile time. The compiler performs a great deal of compile-time checking; it will also (optionally) generate extra code to check at run time for certain errors that cannot be detected at compile-time.

4. Efficient Code. A systems programming language must necessarily produce efficient code. One consequence of this is that we do not allow the language to include any constructs that are inherently inefficient.

5. Systems Programming Facilities. PLUS provides special facilities that enable the programmer to exercise precise control over the instructions generated and the allocation of storage and registers, when necessary for reasons of efficiency or to interface with hardware or external software.

6. Compiler Efficiency. We have not been attempting to produce an especially fast compiler; however it must be reasonable for use on very large programs. We consider costs per line of source of the same order as using Assembler to be quite reasonable. The PLUS compiler, is in fact cheaper per line than the available Assemblers.

Language Description

This section presents a brief overview of the more significant constructs of PLUS. For further details of the language, see (1).

Types

PLUS is a strongly-typed language. Each constant has an implicit type, each variable has an associated type specified in the variable's declaration, and each expression has a type derived from the operation and the types of its operands.

The compiler checks at compile-time that the operands of all operations, including assignment, are of compatible types. The types available in PLUS and the compatibility rules are very similar to those of Pascal.

Type specifications appear in variable and type declarations, and in the specification of more complex types.

A small set of predefined scalar types (integers, fixed and varying-length character strings, fixed-length bit strings) are provided, together with constructs that allow the programmer to define his own types. Programmer-defined types can include scalar types (defined as as arbitrary set of names), and subranges of other types. Thus, for example,

   (Reader, Printer, Disk, Tape)

defines a scalar type. The constants of this type are represented by the names Reader, Printer, etc. The compiler is free to choose the actual encoding to be used.

The type

   (1 to 100)

is a subrange of the integers. Variables of this type may be assigned integer values, provided they are in the specified range.

This extra range information is useful for run-time checking of program logic.

Array types are defined in terms of the range and type of subscript and the type of element. For example,

   array (10 to 20) of character(1)

specifies an array with subscripts 10,...,20. Each element of this type of array is a single character. Any type may be specified for the array elements; hence multiple-dimension arrays can be constructed as arrays of arrays.

Record types allow collections of logically related data items to be collected together in a structure, which can be processed as a unit. For example,

   record
      Info is character(1 to 20),
      Left_Link, Right_Link are pointer to Tree_Node
   end ;

defines a type of record containing three fields: a variable-length character string, and two pointers.

Pointer types may be defined in terms of the type of object that they point to. Thus, items of type

   pointer to character(1)

are the addresses of character(1) values. Pointers are only compatible with other pointers with the same object type.

Procedure types are defined by specifying the names and types of the parameters and result of the procedure. (The names are of significance only to the procedure definition. The types of the parameters are significant both to the definition of the procedure, and to the caller of a procedure.)

   procedure
   parameter Tree is pointer to Tree_Node, Depth is Integer
   end;

describes the type of procedures which take a pointer and an integer as parameters. In this example, no result is returned.

Note procedure types are no different from other types in the language. Thus it is possible to have procedure variables, arrays of procedures, etc.

Declarations

PLUS contains four principle declarative statements: the constant declaration, the type declaration, the variable declaration, and the procedure declaration.

Constant Declaration. This is used to give a symbolic name to a constant. Its use allows "magic numbers" to be defined centrally in one place, then referenced by name elsewhere.

The compiler evaluates expressions, all of whose operands are constant, at compile-time. Hence such expressions are equivalent to constants and may be used in any context requiring a constant. For example:

   constant Table_Size is 100;
   constant Max_Index is Table Size - 1;

Type Declaration. The type declaration associates a name with a type description.

This allows types appearing in more than one place to be defined once only, then referred to by name in other type descriptions, variable declarations, etc.

   type Tree_Node is 
      record
         Info is character(1 to 20),
         Left_Link, Right_Link are pointer to Tree_Node
      end;

This declaration associates the name Tree_Node with the given record definition. Note that the name of the type may be used within the type definition itself.

Variable Declaration. The variable declaration is used to define a variable of a given type. Variable declarations may be either local or global. Local variables are defined by variable declarations appearing within a procedure definition. They are allocated on a stack at procedure entry and disappear at procedure return. The names may be referenced only from within that procedure. Global variables are defined by declarations appearing within a global block. Global variables are allocated when program execution begins, and remain allocated throughout execution. They may be referenced from any procedure.

   variable Root is pointer to Tree_Node;

This example defines a variable that may contain pointers to items of type Tree_Node.

Procedure Declaration. The procedure declaration is used to declare a name to represent a procedure of a given type. The type specified must be a procedure type.

Procedure declarations must be present in the input to the compiler for every procedure that is is defined by the program or called from any other procedure in the program.

   procedure Preorder is 
      procedure
      parameter Tree is pointer to Tree_Node,
         Depth is Number
      end;

This example declares Preorder to be a procedure. It may be followed by the definition of the procedure, or by other procedures which include calls of Preorder.

Expressions

Expressions in PLUS are built up in the usual way, by combining various operands with appropriate operators and parentheses. The primitive operands out of which an expression is composed include constants, symbolic constants, variable names and procedure names. The repertoire of operations includes all the usual arithmetic and logical operators, subscripting array names, selecting fields of records, following pointers, and calling procedures with an appropriate list of parameters.

Assignment Statements

Simple assignment is denoted by ":=". The language allows a value to be assigned to a series of variables in one statement. It also allows specification of an operator in conjunction with assignment. For example,

   Depth +:= 1

is a shorthand for

   Depth := Depth + 1

Control Structures

IF statements. The IF-statement in PLUS is a bracketted construct, terminated by "end" (or "end if"). The "then part" and optional "else part" can each consist of an arbitrary sequence of statements.

   if Tree = Null
   then
      ...
      Preorder(Left_Link,Depth+1);
      Preorder(Right_Link, Depth+1);
   end if;

Select statements. The select statement is similar to what is called a case statement in Pascal. It allows a multiple-way branch according to the value of an expression given in the heading of the select statement.

In effect, it is a generalization of the IF-statement to types other than Boolean.

   select Kind_of_Device from
   case Reader:
      ...
   case Printer:
      ...
   else
      ...
   end select;

Depending on the value of the variable Kind_of_Device, one of the cases is executed. Following the statements given for that case, execution continues with the statement following the "end select".

Loops. PLUS contains a general looping construct called a cycle statement. The body of the cycle is executed repeatedly until terminated by execution of either a return statement (returning from the procedure containing the cycle) or an exit statement. After an exit statement, execution continues with the statement following the end of the cycle. An example is:

   cycle
      exit when Tree = Null;
      ...
      Tree : = Tree@.Left_Link
   end cycle;

Since looping with an increasing or decreasing index is a very common situation, PLUS contains a limited form of DO statement to provide for this. It is restricted to an increment or decrement of one only. An increasing loop is indicated by the keyword "to", while a decreasing loop is indicated by "downto". For example:

   do I := Low to High;
      ...
   end do;

Compilation Units

Input to the compiler consists of a sequence of statements, each of which may be a declaration, a global block or a procedure definition.

Declarations which are not contained in a procedure definition define global identifiers which may be referenced by all subsequent procedure definitions or statements.

Procedures. A procedure in PLUS consists of two parts, a procedure declaration which specifies the type of the procedure, and a procedure definition which contains the series of statements to be executed when the procedure is called. For example, following the declaration of Preorder given above, the definition might be given as:

   definition Preorder
      return when Tree = null;
      ...
      Preorder(Left_Link,Depth+1);
      Preorder(Right_Link,Depth+1)
   end Preorder;

Global Blocks. The global block provides a way of grouping a number of related declarations. It may contain any declaration statements. Variables declared within a global block are global variables. Storage for global variables is allocated when program execution begins. An example of a global block is:

   global Tree_Definitions
      type Tree_Node is ...;
         ...
      variable Tree_Root is pointer to Tree_Node;
   end Tree_Definitions;

Compiler Features

The implementation of PLUS includes many "extras" that we believe are important if the language is to be used extensively.

The compiler always produces a "paragraphed" source listing, indented to show control flow. it can optionally produce a paragraphed copy of the source, suitable for subsequent use as input to the compiler. We find paragraphed listings and source files much easier to read and edit. Various additional annotations which assist the programmer in understanding the program also appear in the listing.

The compiler performs extensive error checking, and attempts to produce explicit, informative error messages for all errors detected. It can optionally generate code to check at run-time for common errors (e.g., array subscripts out of bounds) that cannot be caught at compile time.

The implementation provides a powerful source library facility. This allows declarations for common procedures, types, etc., to be defined centrally in a library. Each program can then include from the libraries those definitions which it requires.

Various compiler options and compile-time facilities are available. They are integrated in with the source language by the use of compiler variables and compiler procedures. These are predefined identifiers which may be set or referenced within a program with immediate effect. For example, the title to appear in the source listing is set by assigning a value to the compiler variable %Title, as in:

   %Title := "This is a title";

Source is included from a library by invoking the compile-time procedure ^Include:

   %Include(Read, Write);

The code produced by the compiler is fully reentrant. Local variables are implemented using a stack. Global variables are implemented using a pseudo-register vector. The space for the stack and the pseudo-register vector are allocated when execution of the program begins.

Compiler Organization

The implementation of PLUS began as a one-pass compiler, following the organization of the Sue compiler. However, difficulties in implementing some constructs efficiently in one pass led to a split into several passes, applied on a procedure-at-a-time basis.

The first pass performs lexical, syntactic and semantic analysis of the procedure. It is driven by parser tables generated by an LALR(1) parser generator. The parser generator was originally written in Sue, but has now been rewritten in PLUS. This pass performs all declaration processing, storage allocation and type checking. It produces a table containing the attributes of all variables used by the procedure, and a tree representation of the code-generation semantic actions required by the procedure.

The next pass tours the tree and produces a stream of pseudo-code. This is very close to the actual machine code that will be produced, but has not yet bound any register usage or determined actual branch addresses. It assumes a slightly idealized machine instruction set.

A number of basic optimizations are performed by examining and modifying the stream pseudo-code.

Following this, register allocation is performed, and the pseudo-code is translated to 470 object code. A standard MTS/OS object module is then written.

MTS provides system services such as space allocation and I/O by means of system subroutines. The operating system dependencies of the PLUS compiler are thus isolated in calls to these subroutines. Only a small number of basic services are required. However, some aspects of the compiler structure reflect the fact that it was written for use under MTS. In particular it is assumed that a large (more than one megabyte) virtual memory is available. The compiler makes extensive use of virtual memory rather than using auxiliary files for maintaining the various program representations.

Work was recently begun on producing a cross-compiler generating PDP-11 object code. Essentially all of the first pass is the same in this version as in the 470 version. Since the stream pseudo-code is intended to be close to the actual machine code, this differs for the PDP-11 version; however, many of the mechanisms used to generate and process the code are largely unchanged.

Project Status

The implementation of PLUS was begun in November 1975, using the existing Sue compiler. The first version began generating code for simple programs about one year later. Around mid 1977, the implementation was sufficiently complete that we could begin production of a version written in PLUS. PLUS is sufficiently similar to Sue that most of the work of converting the Sue code to PLUS code was performed fairly easily with the help of a simple Snobol program.

Many parts of this preliminary version were written in a "quick and dirty" style, in order to allow us to bootstrap as early as possible. After converting to PLUS, we reviewed all the code, completely rewriting some parts, extending others, and adding new facilities.

We have now been using PLUS for more than a year in writing its own compiler. This experience has enabled us to refine some areas of the language design and to stabilize the implementation before many other users had access to it.

PLUS was released for use by other programming staff at UBC in late 1978 (although a number of systems programmers had been using a pre-release version). It was released to other MTS installations early in 1979. The PDP-11 version (which was started in November 1978) has begun testing, but has not yet been released.

Currently, PLUS is undergoing an evaluation period. Reaction has been quite favorable so far, and use is increasing, at least at UBC.

Most of the usage of PLUS so far has been for various utility programs (for example, a tape-drive exerciser, a source-update program, a unit-check reporting program). We anticipate that PLUS will eventually be used for development of new components of the operating system itself.

References

(1) Alan Ballard and Paul Whaley, "The PLUS Programming Language", Computing Centre, University of British Columbia, October 1976 (revised December 1978).

(2) D. W. Boettner and M. T. Alexander, "The Michigan Terminal System", Proceedings of the IEEE, vol. 63, no. 6, June 1975.

(3) B. L. Clark and J. J. Horning, "The System Language for Project Sue", SIGPLAN Notices, October 1971.

(4) K. Jensen and N. Wirth, Pascal User Manual and Report, Springer-Verlag, 1975.