C language compiler

Phases of translation

The compiler translates a source file into an executable in eight conceptual steps, called phases of translation. Although some of these phases may actually be folded together, the compiler behaves as if they occur separately, in sequence.

1. Trigraph sequences are replaced by their single-character equivalents. Trigraph sequences are explained in ``Preprocessing''.
2. Each source line that ends with a backslash immediately followed by a new-line is spliced to the next line by deleting the backslash and new-line, forming logical lines.
3. The source file is partitioned into preprocessing tokens and sequences of white-space characters. Each comment is, in effect, replaced by one space character. Preprocessing tokens are explained in ``Preprocessing''.
4. Preprocessing directives are executed, and macros are expanded. Any files named in #include directives are processed from phase 1 through phase 4, recursively.
5. Escape sequences in character constants and string literals are converted to their character equivalents.
6. Adjacent character string literals are concatenated, as are adjacent wide character string literals.
7. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated.
8. All external object and function references are resolved. Libraries are linked to satisfy external references not defined in the current translation unit. All translator output is collected into a program image which contains information needed for execution.
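
The effect of several of the early phases can be seen in ordinary source code. The following fragment is a minimal sketch, not taken from the compiler documentation itself; the names GREETING, answer, joined, and greeting are hypothetical and chosen only for illustration. It relies on line splicing (phase 2), comment replacement (phase 3), escape sequence conversion (phase 5), and string literal concatenation (phase 6).

```c
/* Phase 2: the backslash and new-line at the end of the first line are
   deleted, so the macro definition is read as one logical line. */
#define GREETING "Hello, " \
                 "world"

/* Phase 3: the comment between "int" and "answer" is replaced by one
   space, so the two words remain separate preprocessing tokens. */
int/* comment */answer(void)
{
    return 42;
}

/* Phase 5 converts the \t escape sequence into a single tab character.
   Phase 6 then concatenates the two adjacent string literals, giving a
   three-character string: 'a', tab, 'b'. */
const char *joined(void)
{
    return "a\t" "b";
}

/* Phase 4 expands GREETING into two adjacent string literals, which
   phase 6 concatenates into the single string "Hello, world". */
const char *greeting(void)
{
    return GREETING;
}
```

If the comment in the declaration of answer were simply deleted rather than replaced by a space, the text would read "intanswer" and the declaration would no longer compile, which is why phase 3 is specified the way it is.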

Output from certain phases may be saved and examined by specifying option flags on the cc(CP) command line.

The preprocessing token sequence resulting from Phase 4 can be saved by using the following options:

-P: leaves preprocessed output in a file with a .i extension.
-E: sends preprocessed output to the standard output.
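
The options above show the result of macro expansion from outside the program. As a rough complement, the sketch below exposes the expanded spelling of a macro from inside a program by using the preprocessor's # (stringizing) operator. The names BUFSIZE, STR, XSTR, expanded_bufsize, and raw_bufsize are hypothetical, introduced only for this illustration; they are not part of the compilation system.

```c
#define BUFSIZE 512        /* hypothetical macro, for illustration only */

#define STR(x)  #x         /* converts its argument into a string literal */
#define XSTR(x) STR(x)     /* expands the argument first, then stringizes */

/* Returns the spelling of BUFSIZE after macro expansion: "512". */
const char *expanded_bufsize(void)
{
    return XSTR(BUFSIZE);
}

/* Returns the unexpanded spelling for comparison: "BUFSIZE".
   An operand of # is not macro-expanded before it is stringized. */
const char *raw_bufsize(void)
{
    return STR(BUFSIZE);
}
```

Saving the preprocessed output of a file containing these definitions should show the same replacement, with the string literals already substituted for the XSTR and STR invocations.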

Output from Phase 7 can be saved in a file with a .o extension by using the -c option to cc. The output of Phase 8 is the compilation system's final output, the executable program image (a.out by default).