Encoding variations
The encoding schemes come in two variations:
-
Where each multibyte character is self-identifying,
so that any multibyte character can simply be inserted between any pair of
multibyte characters.
(The encoding used by the ANSI C compiler is of this type:
each byte of a non-single-byte character has the high-order bit set.)
-
Where special shift bytes
change the interpretation of the bytes that follow them.
An example is the method used by most
fancy character terminals to get in and out of line-drawing mode.
For programs written in multibyte characters
with a shift-state-dependent encoding,
ANSI C additionally requires that each
comment, string literal, character constant, and header name
both begin and end in the unshifted state.
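The two variations can be illustrated with a minimal C sketch. It is not taken from any compiler or library. The names is_multibyte_byte and ends_unshifted are hypothetical, and the choice of the ASCII SO (0x0E) and SI (0x0F) control bytes as shift bytes is an assumption made only for illustration; a real shift-state-dependent encoding defines its own shift sequences.

```c
#include <stddef.h>

/* Assumed shift bytes, for illustration only: ASCII SO enters the
   shifted state and ASCII SI returns to the unshifted state. */
enum { SHIFT_OUT = 0x0E, SHIFT_IN = 0x0F };

/* Self-identifying scheme: a byte belongs to a multibyte character
   exactly when its high-order bit is set, so it can be classified
   without looking at any other byte. */
int is_multibyte_byte(unsigned char b)
{
    return (b & 0x80) != 0;
}

/* Shift-state-dependent scheme: the meaning of a byte depends on the
   shift bytes seen before it. Scanning starts in the unshifted state;
   the function returns 1 if the sequence also ends unshifted, which is
   the condition ANSI C places on comments, string literals, character
   constants, and header names. */
int ends_unshifted(const unsigned char *s, size_t n)
{
    int shifted = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (s[i] == SHIFT_OUT)
            shifted = 1;
        else if (s[i] == SHIFT_IN)
            shifted = 0;
    }
    return !shifted;
}
```

In the first function each byte is judged in isolation, which is why such characters can be inserted anywhere. In the second, the whole sequence has to be scanned from a known state before any byte can be interpreted.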
© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003