ANSI implementation-defined behavior


This section describes the implementation-defined characteristics of characters. It corresponds to section ``F.3.4 Characters'' in the ANSI document.

Source and execution character sets

The source character set is the set of legal characters that can appear in source files. The execution character set is the set of legal characters interpreted in the execution environment. For this compiler, the source and execution character sets are the same: the standard 8-bit character set.

Bits per character

The number of bits in a character in the execution character set is given by the manifest constant CHAR_BIT, which is defined in limits.h. Its value is 8.

Mapping character sets

The source character set maps directly to the execution character set. The source character set, a proper subset of the ASCII character set, runs from ASCII 32 (space) through ASCII 126 (tilde). This includes all of the printable graphic characters of the ASCII character set.

Constants with unrepresented characters and escape sequences

Certain nongraphic characters, and characters that have a special meaning inside a constant, may be represented in the source character set by escape sequences, which begin with a backslash followed by a lowercase letter or a punctuation character. These escape sequences map onto specific characters in the ASCII set as shown in the following table:

Table A-1: Nongraphic character escape sequences

Escape sequence   ASCII value   Character name
\'                39            single quote
\"                34            double quote
\?                63            question mark
\\                92            backslash
\a                7             bell
\b                8             backspace
\f                12            formfeed
\n                10            new-line
\r                13            carriage return
\t                9             horizontal tab
\v                11            vertical tab

For escape sequences other than those listed in Table A-1, the backslash is stripped and the characters in the sequence are treated like ordinary characters. All other characters map directly from the source character set to the execution character set.

Constants with multiple or wide characters

When a character constant contains more than one character, the individual values of the characters are concatenated to form an int. Because an int occupies 4 bytes and each byte can hold one character, a character constant may be up to four characters long. Excess characters are dropped from the right side. For example, if

   int i = 'abcd';

then the value of i in hexadecimal is 61626364, the individual values for ``a'', ``b'', ``c'', and ``d'' concatenated together. Similarly, if

   int ii = 'abcdef';

then the value of ii in hexadecimal is also 61626364. The excess characters on the right, ``e'' and ``f'', are dropped.

Locale used for multi-byte conversion

The compiler uses the ``C'' locale to convert multibyte characters into corresponding wide characters.

Range of char values

A ``plain'' char has the same range of values as a signed char. The -J command line option changes the range of values to be the same as an unsigned char.


© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003