This section describes the implementation-defined characteristics of characters. It corresponds to section ``F.3.4 Characters'' in the ANSI document.
The source character set is the set of legal characters that can appear in source files. The execution character set is the set of legal characters interpreted in the execution environment. For the compiler, the source and execution character sets are the same: the standard 8-bit character set.
The number of bits in a character in the execution character set is given by the manifest constant CHAR_BIT, which is defined in limits.h. Its value is 8.
The source character set maps directly to the execution character set. The source character set, a proper subset of the ASCII character set, commences with ASCII 32 and ends with ASCII 126. This includes all of the printable graphic characters of the ASCII character set.
Certain non-graphic characters may be represented in the source character set by escape sequences which begin with a backslash followed by a lowercase letter. These escape sequences map onto specific characters in the ASCII set as shown in the following table:
Table A-1. Nongraphic character escape sequences
|Escape sequence|ASCII value|Character name|
For escape sequences other than those listed in Table A-1, the backslash is stripped and the characters in the sequence are treated like ordinary characters. All other characters map directly from the source character set to the execution character set.
When a character constant contains more than one character, the individual values of the characters are concatenated to form an int. Since an int is 4 bytes and each byte can hold one character, a character constant may be up to four characters long. Excess characters are dropped from the right side. For example, if

    int i = 'abcd';

then the value of i in hex is 61626364, the individual values for ``a'', ``b'', ``c'', and ``d'' concatenated together. Similarly, if

    int ii = 'abcdef';

then the value of ii in hex is also 61626364, the individual values for ``a'', ``b'', ``c'', and ``d'' concatenated together. Note that the excess characters on the right, ``e'' and ``f'', are dropped.
The compiler uses the ``C'' locale to convert multibyte characters into corresponding wide characters.
A ``plain'' char has the same range of values as a signed char. The -J command line option changes the range of values to be the same as an unsigned char.