A Survey Of CUG C Compilers

Victor Volkman


Victor R. Volkman received a BS in computer science from Michigan Technological University in 1986. Mr. Volkman is a frequent contributor to The C Users Journal and the C Gazette. He is currently employed as Software Engineer at Cimage Corporation of Ann Arbor, MI. He can be reached at the HAL 9000 BBS, (313) 663-4173, 1200/2400/9600 baud.

Compiler construction is alternately the most rewarding and most frustrating area of software development. The C Users' Group offers public domain C compilers with source code for both those who study and those who use compilers. These packages have been independently developed by programmers who were often the first to implement the C language on their target machines. Some of these compilers share the ability to compile their own source to build new versions of themselves. All of them share their authors' vision of taking the C language to new frontiers.

A Small History Of The Small C Compiler

Since Ron Cain's introduction of the Small C compiler into the public domain nearly a decade ago, its implementations have spread like wildfire to nearly every popular microprocessor. The C Users' Group is fortunate to be able to offer public domain compilers which have been ported to the Z-80, 8080, 6800, 6809, 8086, and 68000 (see Figure 1).

Ron Cain's Small C Compiler v1.0, which debuted in the May 1980 issue of Dr. Dobb's Journal, was originally a very small subset of the C language. Small C has been a self-compiler since its first implementation. This means that performance improvements in code generation and parsing can be immediately incorporated back into the compiler itself. Small C is a one-pass compiler which generates assembly language from a C input file. The subset of data types which the original Small C recognized consisted only of characters, integers, and one-dimensional arrays of either type. Additionally, the only control statements were while and if. Small C was also restricted to bitwise logical (&, |) operators since boolean (&&, ||) operators were not supported.
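To give a feel for that subset, here is a minimal sketch of a routine written with only char, int, a one-dimensional array, while, and if. The function name count_digits is hypothetical, the relational operators are assumed to be available, and the modern function header is used so the fragment compiles with a current compiler; the original Small C would have expected the older declaration style.

```c
/* Count the decimal digits in a NUL-terminated string using only
   char, int, a one-dimensional array, while, and if.
   count_digits is a hypothetical name used for illustration. */
int count_digits(char s[])
{
    int i;
    int n;

    i = 0;
    n = 0;
    while (s[i] != 0) {
        /* No && operator in the subset: nest the two tests instead. */
        if (s[i] >= '0') {
            if (s[i] <= '9') {
                n = n + 1;
            }
        }
        i = i + 1;
    }
    return n;
}
```

The nested if statements stand in for the missing && operator, and the while loop with an explicit counter stands in for the missing for statement.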

In 1982, James E. Hendrix assumed trusteeship of Small C. Hendrix published numerous upgrades through Dr. Dobb's Journal culminating in the release of Small C v2.1 for CP/M in 1984. New features added along the way include code optimization, data initializing, conditional compiling, extern storage, for, while, switch/case, and goto statements, and a plethora of operators. To complete the system, James E. Hendrix and Ernest Payne developed a CP/M compatible version of the UNIX C standard I/O library. The internal design of Small C v2.1 was the subject of Hendrix's The Small C Handbook.

The first published 8086 PC-DOS implementation of Small C v2.1 appeared in 1985. Along the way, code optimization techniques were refined even more. The present incarnation from Hendrix, Small C v2.2, is available for 8086 PC-DOS only. Small C v2.2 was released simultaneously with Hendrix's definitive reference work A Small C Compiler: Language, Usage, Theory, and Design in 1988.

CUG C Compilers Based On Small C

Many of the C compilers available from CUG are based on some derivative of the Cain or Hendrix implementation of Small C. The exceptions to this rule are the 68000 C Compiler (disk #204), which has no lineage with Small C, and the DECUS C Preprocessor (disk #243), which is not a full compiler. Some of the CUG C compilers based on Cain's Small C v1.1 include many of the enhancements published in Dr. Dobb's Journal over the years. This puts them approximately at the level of Hendrix Small C v2.0 discussed earlier. These enhanced Small C compilers are available as disk CUG104 Z-80/8080 (CP/M 80), CUG163 8086 (PC-DOS), and CUG221 6809 (FLEX OS).

An attribute which most of the CUG C compilers share is a noticeable lack of external documentation. All disks have less than a dozen pages of documentation with the exception of Small C w/Floats (CUG156), which includes 30 pages. Fortunately, their common heritage means their implementations remain similar to the well-documented Cain and Hendrix designs. Specifically, the Dr. Dobb's Journal issues from 1980 to 1982 (see bibliography) are the best source for Small C versions before 2.0. Alternatively, Hendrix's The Small C Handbook (now out of print) details these early versions. You might need to check your local university library for these publications. Unfortunately, Hendrix's latest book, A Small C Compiler, will be less relevant to older versions due to recent internal code redesigns.

The CUG C compilers based on Small C, regardless of version, also share certain limitations of language features. In particular, struct, union, long, float, and double data types are not supported. The exception to this rule is of course Small C w/Floats (CUG156) which includes a 48-bit non-standard float. Additionally, arrays are limited to one-dimension and pointer arrays are specifically prohibited. These compilers also assume that ints and pointers are equivalent. This means the size of code and data pointers must also be the same. Small C-based compilers do not allow nested include files nor parameterized macro substitutions (as used in stdio.h). Also, the full set of C operators is often not present.
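The one-dimension limit is usually worked around by doing the index arithmetic by hand. The following is a minimal sketch of that idea; the names grid, cell_index, set_cell, and get_cell are hypothetical, and the function headers use modern syntax rather than what any particular Small C port accepts.

```c
#define ROWS 3
#define COLS 4

/* Stands in for "int grid[ROWS][COLS]", which these compilers reject. */
int grid[ROWS * COLS];

/* Map a (row, col) pair onto the flat array in row-major order. */
int cell_index(int row, int col)
{
    return row * COLS + col;
}

void set_cell(int row, int col, int value)
{
    grid[cell_index(row, col)] = value;
}

int get_cell(int row, int col)
{
    return grid[cell_index(row, col)];
}
```

No bounds checking is done here, so a bad row or column silently lands in a neighboring cell or outside the array.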

In general, the run-time libraries contain a good assortment of standard I/O, string, and keyboard-polling functions. Higher-level functions such as sprintf() are not always present. The libraries have very primitive linear memory allocation with alloc() and free(). Blocks of allocated memory must be freed in reverse order of allocation.
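The reverse-order rule follows naturally if the allocator is nothing more than a movable "top of pool" mark. The sketch below shows one way such an allocator might work; it is not taken from any of the CUG libraries, and the names lifo_alloc, lifo_free, and POOL_SIZE are hypothetical.

```c
#include <stddef.h>

#define POOL_SIZE 1024          /* hypothetical pool size */

static char   pool[POOL_SIZE];
static size_t pool_top = 0;     /* offset of the first unused byte */

/* Hand out the next nbytes of the pool, or NULL if they will not fit. */
char *lifo_alloc(size_t nbytes)
{
    char *block;

    if (nbytes > POOL_SIZE - pool_top)
        return NULL;
    block = &pool[pool_top];
    pool_top += nbytes;
    return block;
}

/* Release a block by moving the top mark back to its start.
   Anything allocated after the block is released along with it,
   which is why blocks must be freed in reverse order of allocation. */
void lifo_free(char *block)
{
    pool_top = (size_t)(block - pool);
}
```

Freeing an older block before a newer one would quietly hand the newer block's memory out again on the next allocation.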

The overall ratings in Figure 1 are based on my perception of the documentation, completeness, and usability of each implementation.

CUG104: Small C For Z-80/8080 (CP/M 80)

This implementation of Small C for the Z-80/8080 was done by Mike Bernson of Ann Arbor, MI. This Small C is not self-compiling and requires a special assembler and linker which are included only in CP/M 80 executable form. The compiler was developed with BDS C v1.41.

Mike Bernson has made several improvements to RC Small C v1.1 including most of the features of JH Small C v2.1 except goto/label and the ternary operator. The Standard C I/O library is included in both assembly language and object code format. Only three pages of documentation are provided, consisting of two pages of grammar and a one page listing of file contents.

CUG132: Small C For 6809 (Radio Shack Color Computer w/OS9)

Small C for the 6809 (Color Computer) was implemented by A.J. Griggs. This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements among other things. This Small C is not self-compiling and requires BDS C v1.41 or later to compile. This package requires a 6809 assembler and linker which are not included. Small C for 6809 is designed as a cross-compiler which produces 6809 code while running under an 8080/Z-80 environment. After compilation, you would use the supplied serial-port driver to download the object code in Motorola S HEX format to the target 6809 machine.

This C compiler cannot be self-compiled because it has hardware dependencies on the byte order of 16-bit words. Specifically, the 6809 has the low and high bytes stored in the reverse order of 8080 machines. The compiler assumes a certain order in some cases and thus cannot compile itself.
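A byte-order dependency of this kind is avoided when code builds 16-bit words with shifts and masks instead of relying on how the host lays them out in memory. The following is a minimal sketch with hypothetical function names; it is not drawn from the compiler's source.

```c
/* Store a 16-bit value low byte first (8080/Z-80 order). */
void put_word_low_first(unsigned char *dst, unsigned int value)
{
    dst[0] = (unsigned char)(value & 0xFF);
    dst[1] = (unsigned char)((value >> 8) & 0xFF);
}

/* Store a 16-bit value high byte first (6809 order). */
void put_word_high_first(unsigned char *dst, unsigned int value)
{
    dst[0] = (unsigned char)((value >> 8) & 0xFF);
    dst[1] = (unsigned char)(value & 0xFF);
}
```

Because each routine names the order explicitly, the same source produces the same bytes whether it runs on the 8080 host or the 6809 target.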

This disk includes a serial driver, graphics library, and sample graphics game. The graphics library supports real-time animation in the player-missile arcade style. Graphics objects are managed in a list which stores their screen position and x/y velocity. During animation, the routines automatically flag collision of objects on the screen. The management of graphic objects is similar to the use of sprites on Commodore C64 and C128 machines.

Also on this diskette are a total of eight pages of documentation, six on the 6809 port and two on use of the graphics library.

CUG146: Small C For 6800 (FLEX OS)

This implementation of Small C for 6800 (FLEX OS) was completed by Serge Stepanoff of Livermore, CA. This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements among other things. An additional restriction is that identifiers are limited to six significant characters. This Small C is not self-compiling and requires BDS C v1.41 or later to compile.

This package does not include a complete Standard C I/O library. A nonstandard printf() is used which requires that the number of arguments be passed as the last parameter.

Small C for 6800 (FLEX OS) does not compile to assembly or machine language, but rather to a pseudo-code. A small pseudo-code interpreter, less than 2K, actually executes the user's pseudocode. To run this pseudo-code in a different environment requires only the rewrite of the interpreter and the runtime library for the target machine. However, the source code for the interpreter is not included on the distribution diskette.
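Since the interpreter source is not on the disk, the sketch below only illustrates the general idea of a stack-based pseudo-code interpreter. The four opcodes and the function name run_pcode are invented for this example and have no connection to the actual pseudo-code used by this compiler.

```c
/* Hypothetical opcodes, invented for illustration only. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run a pseudo-code program and return the value left on top of
   the stack. No overflow or underflow checking is attempted. */
int run_pcode(const int *code)
{
    int stack[32];
    int sp = 0;     /* number of values currently on the stack */
    int pc = 0;     /* index of the next instruction word */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:               /* operand follows the opcode */
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] = stack[sp - 1] + stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] = stack[sp - 1] * stack[sp];
            break;
        case OP_HALT:
        default:                    /* unknown opcodes also stop */
            return sp > 0 ? stack[sp - 1] : 0;
        }
    }
}
```

A program such as PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT evaluates (2 + 3) * 4. Porting to a new machine means rewriting only this loop and the run-time library, which is the portability argument made above.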

The diskette contains 11 pages of documentation; the first five pages are devoted to how to use the compiler and the remainder to the run-time library.

CUG156: Small C w/Floats (CP/M)

Small C w/Floats (CP/M) was implemented by James R. Van Zandt of Nashua, NH. This package was originally available as disk #224 from the Sig/M-Amateur Computer Group of Iselin, New Jersey. This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements. Additionally, the following operators are not supported: logical or (||), logical and (&&), logical not (!), bitwise not (~), and the assignment operators (+=, -=, et al.).

This disk includes the executable compiler and is self-compiling. The compiler reads C source and produces Z-80 assembly language. The two major speed enhancements relative to Ron Cain's original compiler are a hash coded symbol table and 1K disk buffers. Additionally, the compiler will resolve symbols uniquely up to the first 16 characters. This disk also includes the ZMAC macro assembler and ZLINK linker in executable form only.
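The benefit of a hash coded symbol table is that a lookup inspects only a few slots instead of scanning every symbol. The sketch below is written in standard C (it uses struct, which Small C itself does not support) and is not Van Zandt's code; the table size, hash function, and all names are assumptions. Only the 16 significant characters come from the text.

```c
#include <string.h>

#define NAME_SIG   16   /* significant characters, per the text */
#define TABLE_SIZE 64   /* hypothetical table size */

struct symbol {
    char name[NAME_SIG + 1];
    int  in_use;
    int  value;
};

static struct symbol table[TABLE_SIZE];

/* Hash only the significant characters of the name. */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    int i;

    for (i = 0; i < NAME_SIG && name[i] != '\0'; i++)
        h = h * 31u + (unsigned char)name[i];
    return h % TABLE_SIZE;
}

/* Find name, adding it if absent. Collisions are resolved by
   linear probing. Returns NULL when the table is full. */
struct symbol *sym_lookup(const char *name)
{
    unsigned start = hash_name(name);
    unsigned i;

    for (i = 0; i < TABLE_SIZE; i++) {
        struct symbol *s = &table[(start + i) % TABLE_SIZE];

        if (!s->in_use) {
            strncpy(s->name, name, NAME_SIG);
            s->name[NAME_SIG] = '\0';
            s->in_use = 1;
            return s;
        }
        if (strncmp(s->name, name, NAME_SIG) == 0)
            return s;
    }
    return NULL;
}
```

Two identifiers that differ only after the sixteenth character land in the same entry, which mirrors the stated resolution limit.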

Small C w/Floats supports the following usage of floating point:

double d;      48-bit floating point
double *d;     pointer to double
double d();    function returning double
double d[5];   array of doubles

Storage classes, structures, multidimensional arrays, unions, and more complex types like double **d are not included.

The layout of doubles does not conform to the IEEE standard. These routines will execute only on a Z-80. They use the alternate registers and some of the undocumented instructions of that processor.

Small C w/Floats includes a full complement of transcendental functions for type double (Listing 1) .

If the "profile and trace" (-P) option of the compiler is used, each call to err() results in a walkback trace of function calls. In addition, an execution profile is displayed on the console at program termination (call to exit()). The profile consists of a list of the functions and the number of times (up to 999999) each was called. This is sometimes useful for debugging (to spot functions that are never called), but is most valuable for program execution time optimization.
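The execution profile amounts to one counter per function that is bumped on every entry and printed at exit. The sketch below shows the idea in standard C. How the -P option actually instruments code is not documented here, so the registration scheme and every name in the sketch are assumptions; only the 999999 ceiling comes from the text.

```c
#include <stdio.h>

#define MAX_FUNCS 32        /* hypothetical table size */
#define MAX_COUNT 999999L   /* counter ceiling mentioned in the text */

static const char *func_name[MAX_FUNCS];
static long        func_calls[MAX_FUNCS];
static int         func_total = 0;

/* Enter a function in the table with a count of zero.
   Returns its slot number, or -1 if the table is full. */
int profile_register(const char *name)
{
    if (func_total >= MAX_FUNCS)
        return -1;
    func_name[func_total] = name;
    func_calls[func_total] = 0;
    return func_total++;
}

/* Record one call; the counter stops at MAX_COUNT. */
void profile_hit(int slot)
{
    if (slot >= 0 && slot < func_total && func_calls[slot] < MAX_COUNT)
        func_calls[slot]++;
}

/* Number of calls recorded for a slot, or -1 for a bad slot. */
long profile_calls(int slot)
{
    if (slot < 0 || slot >= func_total)
        return -1;
    return func_calls[slot];
}

/* Print the whole table, as might be done at program termination. */
void profile_report(void)
{
    int i;

    for (i = 0; i < func_total; i++)
        printf("%-16s %6ld\n", func_name[i], func_calls[i]);
}
```

Because every function is registered up front, a function that is never called still appears in the report with a count of zero, which is what makes the listing useful for spotting dead code.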

With 30 pages of documentation, Small C w/Floats is the best documented of any compiler available from CUG. The documentation covers compiler usage and internals, floating point routines, the Standard C I/O library, the ZMAC macro assembler, and the ZLINK linker.

CUG163: Small C For 8086 (PC-DOS)

This implementation of Small C for 8086 (PC-DOS) was completed by Daniel R. Hicks of Rochester, MN. Small C for 8086 (PC-DOS) is distributed on two diskettes: the first contains the run-time library source and the second contains the compiler source and executable. This package was originally available as disk #152 from the Personal Computer Club of Toronto, Canada.

This is a self-compiler, but it does require your own assembler and linker. This port of Small C is based on JH Small C v2.0, so it does support switch/case, for, and goto/label statements. Hicks's standard C I/O library provides very good compatibility with its UNIX counterpart.

Hicks's implementation imposes the following additional restrictions: lower-case and upper-case symbols are synonymous, local declarations within a block and goto statements may not be used together, and the sizeof() operator is not supported.

Parameters are pushed in order of occurrence: The first parameter in a list is the first one pushed and therefore the deepest one in the stack. This is opposite the order of many C compilers, and it prevents some C library functions (such as printf) from being able to determine the parameter count by examining just the first or second parameter. For this reason, the compiler, prior to a CALL, loads register DL with the parameter count, thus allowing functions such as printf to be implemented.
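The effect of this convention can be modeled in ordinary C with an array standing in for the machine stack and a variable standing in for register DL. This is a simplified sketch and not the code the compiler generates: the model stack grows upward, return addresses are ignored, and all names are hypothetical.

```c
#define STACK_WORDS 64

static int stack[STACK_WORDS];
static int sp = 0;          /* next free slot in the model stack */
static int arg_count = 0;   /* stands in for register DL */

void push(int value)
{
    stack[sp++] = value;
}

/* Callee side: fetch argument n (0 = first). Because the first
   argument is the deepest, its position depends on the count. */
int get_arg(int n)
{
    return stack[sp - arg_count + n];
}

/* A printf-like callee that accepts any number of arguments. */
int sum_args(void)
{
    int i;
    int total = 0;

    for (i = 0; i < arg_count; i++)
        total += get_arg(i);
    return total;
}

/* Caller side: push in order of occurrence, load the count,
   call, then discard the arguments. */
int call_sum3(int a, int b, int c)
{
    int result;

    push(a);
    push(b);
    push(c);
    arg_count = 3;
    result = sum_args();
    sp -= 3;
    return result;
}
```

Without arg_count the callee could not find its first argument at all, which is why the count has to be supplied on every call.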

Included on the diskette are nine pages of detailed documentation on the capabilities and limitations of the compiler.

CUG170: Miscellany V (Caprock C, version N for IBM-PC)

Caprock Small C for 8086 (PC-DOS) was implemented by Caprock Systems, Inc. of Arlington, TX. This disk was originally available as disk #315 from PC Software Interest Group (PC-SIG) of Sunnyvale, CA. This compiler is supplied in source form only; an executable version is not included. Additionally, the standard C I/O library is missing from this distribution. This version is close to RC Small C v1.0 since it lacks switch/case, for, and goto/label statements.

When compiled under Microsoft C 5.1, the compiler source produced four errors and 53 warnings. All of these problems were the result of the assumption that integers are interchangeable with pointers.

No documentation is included with this compiler.

True to its name, the Miscellany V disk offers over 20 files of C functions. Some of the other offerings on this disk include Life and Towers of Hanoi games, a binary to Intel HEX format converter, and several keyboard utilities.

CUG204: 68000 C Compiler (UNIX System V)

The 68000 C Compiler (UNIX System V) was completed by Matthew Brandt of Norcross, GA. This compiler is intended as an instructive tool for personal use. Any use for profit without the written consent of the author is prohibited. As stated earlier, this is the only C compiler offered by CUG which is not derived from RC or JH Small C. This is an optimizing C compiler which generates assembly language for the Motorola 68000 processor. This system also requires a 68000 assembler and linker which the user must supply. It has successfully compiled itself on UNIX System V running on a Motorola VME-10. Since this code was written for a machine with long integers, it may exhibit some irregularity when dealing with long integers on the IBM-PC.

This compiler vies with Small C w/Floats (CUG #156) for the best implementation of C. Although the 68000 C Compiler does not support floats, it does have features not found in any other CUG C compiler: longs, structures, unions, complex types (e.g. char **argv), enumerated types, and functions which return pointers to structures.

The disk includes one page of documentation outlining the limitations of the compiler. Brandt offers the following warning: "The author makes no guarantees. This is not meant as a serious development tool although it could, with little work, be made into one." The preprocessor does not support parameterized macro substitutions; only #include and unparameterized #define macros are supported. Brandt advises that function arguments declared as char may not work properly and should be changed to int. When the compiler encounters a syntax error, an error number is printed but no descriptive text is provided. Lastly, the size of functions is slightly limited because the entire function is parsed before any code is generated.

The compiler can be compiled by Microsoft C v3.0 or higher. MSC will issue many warnings but they can be ignored. The file MAKE.BAT may be used to rebuild the compiler.

CUG221: 6809 C Compiler (FLEX OS)

This implementation of Small C for 6809 (FLEX OS) was completed by Dieter H. Flunkert. The author has made several improvements to RC Small C v1.1 plus most of the features of JH Small C v2.1 except goto/label. Small C for 6809 (FLEX OS) has all other C control statements including switch/case, do/while, and for. Additionally, all C operators are supported including the elusive comma (,), ternary (?:), and assignment operators (+=, -=, et al.). However, like most other Small C implementations, the data types for float, double, long, structures, and unions are not present.

An executable version of the compiler is not provided on the diskette. This system requires the TSC relocatable assembler, library generator and linking loader which the user must supply. The standard C I/O library is included in both C source and assembly language formats. The compiler has seven pages of documentation detailing the grammar and preprocessor commands.

When compiled under Microsoft C v5.1, it was revealed that many of the #include directives did not have quoted filenames (e.g. #include stdio.h). Once again, many warnings appeared from the use of integers as pointers. Proper compilation required adding #define VMS to every module.

CUG243: DECUS C Preprocessor (PC-DOS)

The DECUS C Preprocessor (CPP) was originally implemented by Martin Minnow. CPP was subsequently ported to PC-DOS by Ted Lemon and Jym Dryer. CPP reads a C source file, expands macros and include files, and writes an input file for the C compiler. If no file arguments are given, it reads from stdin and writes to stdout. If one filename is given, it will be the input file. If a second filename is given, it will be the output file. The full command line format is:

cpp [-options] [infile [outfile]]

The DECUS C Preprocessor has been updated to meet the specifications of the Draft ANSI C Standard. However, this C preprocessor is not designed to handle floating point expressions. An experimental floating point source file is provided for those who wish to experiment with it.

The following options are supported. Options may be given in either case.

-I directory
Add this directory to the list of directories searched for #include "..." and #include <...> commands. Note that there is no space between the -I and the directory string. More than one -I command is permitted. On non-UNIX systems, directory is forced to upper case.

-D name=value
Define the name as if the programmer wrote #define name value at the start of the first file. If =value is not given, a value of 1 will be used. On non-UNIX systems, all alphabetic text will be forced to upper case.

-U name
Undefine the name as if #undef name were given. On non-UNIX systems, name will be forced to upper case.

-X number
Enable debugging code. If no value is given, a value of 1 will be used. (For maintenance of CPP only.)
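The default value rule for -D is easy to get wrong when writing a similar tool, so here is a minimal sketch of splitting the text that follows -D into a name and a value. It is not taken from the CPP source; parse_define and its parameters are hypothetical, and the forcing to upper case on non-UNIX systems is not reproduced.

```c
#include <string.h>

/* Split "NAME=VALUE" (the text after -D) into two buffers.
   If no '=' is present, the value defaults to "1".
   Both buffer sizes must be at least 1; long text is truncated. */
void parse_define(const char *arg, char *name, size_t name_size,
                  char *value, size_t value_size)
{
    const char *eq = strchr(arg, '=');
    size_t name_len = eq ? (size_t)(eq - arg) : strlen(arg);
    const char *val = eq ? eq + 1 : "1";

    if (name_len >= name_size)
        name_len = name_size - 1;
    memcpy(name, arg, name_len);
    name[name_len] = '\0';

    strncpy(value, val, value_size - 1);
    value[value_size - 1] = '\0';
}
```

With this split, -DDEBUG behaves like #define DEBUG 1 and -DLEVEL=3 behaves like #define LEVEL 3.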

The preprocessor will look for an environment variable INCLUDE if include files cannot be found in the -I directories. Unfortunately, only a single search directory can be specified in the INCLUDE path (e.g. SET INCLUDE=\MSC\INCLUDE;\MY\SRC will fail).

CPP has been successfully built with Lattice C v2.00 and Microsoft C v3.00. The distribution disk contains four pages of documentation detailing how to prepare CPP under several different memory models.

Bibliography

Cain, Ron. "A Small C Compiler for the 8080s." Dr. Dobb's Journal, April-May 1980, pp. 5-19.

Cain, Ron. "A Runtime Library for the Small C Compiler." Dr. Dobb's Journal, September 1980, pp. 4-15.

Hendrix, J. E. "Small-C Expression Analyzer." Dr. Dobb's Journal, December 1981, pp. 40-43.

Hendrix, J. E. "Small-C Compiler, v.2." Dr. Dobb's Journal, December 1982, pp. 15-63, and January 1983, pp. 48-64.

Hendrix, J. E. and Payne, L. E. "A New Library for Small-C." Dr. Dobb's Journal, May 1984, pp. 50-81, and June 1984, pp. 56-69.

Hendrix, J. E. "Small-C Update." Dr. Dobb's Journal, August 1985, pp. 84-91.

Hendrix, J. E. The Small C Handbook. Redwood City, CA: M&T Publishing Inc., 1984.

Hendrix, J. E. A Small-C Compiler: Language, Usage, Theory, and Design. Redwood City, CA: M&T Publishing Inc., 1988.

Volkman, Victor R. "Revised Handbook Details Small C Innards," The C Users Journal, February 1989, pp. 9-10.

Ward, Robert and Donna, Ed., The C Users' Group Library, McPherson, KS: R&D Publications, Inc., 1986.

Figure 1 Summary of CUG C Compilers

                       Target     Implementation
 CUG     Target      Operating    Based on Port   Date of Last  Overall
Disk #    CPU         System          From          Revision    Rating
------------------------------------------------------------------------

 104    Z-80/8080  CP/M 80 v2.2  RC Small C v1.1  06/28/1981    ***
 132    6809       OS-9          RC Small C v1.1  10/18/1983    **
 146    6800       FLEX v2.1     RC Small C v1.1  09/09/1982    **
 156    Z-80       CP/M          RC Small C v1.2  08/02/1984    ****
 163    8086       PC-DOS 1.1    JH Small C v2.0  01/14/1984    ***
 170    8086       PC-DOS 1.0    RC Small C v1.0  06/01/1982    *
 204    68000      Unix V        N/A              01/01/1986    ****
 221    6809       FLEX          RC Small C v1.0  11/15/1986    ***
 243    8086       PC-DOS 2.0    DECUS            12/01/1985    N/A

The overall ratings were based on my perception of the documentation,
completeness, and usability of the implementation.

Listing 1

atan(),     /* arc tangent */
atan2(),    /* atan2(a,b) = arctan of a/b */
cos(),      /* cosine */
cosh(),     /* hyperbolic cosine */
exp(),      /* exponential */
log(),      /* natural logarithm */
log10(),    /* log base 10 */
pow(),      /* pow(x,y) = x**y */
sin(),      /* sine */
sinh(),     /* hyperbolic sine */
sqrt(),     /* square root */
tan(),      /* tangent */
tanh();     /* hyperbolic tangent */

float(x); double x;    /* integer to floating point
                          conversion */
fmod(x,y); double x,y; /* mod(x,y):
                          if 0 < y
                          then 0 <= mod(x,y) < y and
                          x = n*y + mod(x,y)
                          for some integer n */
fabs(x); double x;     /* absolute value */
floor(x); double x;    /* largest integer not greater
                          than x */
ceil(x); double x;     /* smallest integer not less than x */
rand();                /* random number in range 0...1 */