The basic difficulty in an Asian environment is the huge number of ideograms needed for I/O. To work within the constraints of usual computer architectures, these ideograms are encoded as sequences of bytes. The associated operating systems, application programs, and terminals understand these byte sequences as individual ideograms. Moreover, all of these encodings allow intermixing of regular single-byte characters with the ideogram byte sequences. Just how difficult it is to recognize distinct ideograms depends on the encoding scheme used.
The term ``multibyte character'' is defined by ANSI C to denote a byte sequence that encodes an ideogram, no matter what encoding scheme is employed. All multibyte characters are members of the so-called ``extended character set.'' (A regular single-byte character is just a special case of a multibyte character.) Essentially the only requirement placed on the encoding is that no multibyte character can use a null character as part of its encoding.
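As a minimal sketch of what this definition means in practice, the function below counts the multibyte characters in a string using the standard library function mblen(). The function name count_multibyte_chars is hypothetical and chosen only for illustration. The sketch relies on the rule that no multibyte character contains a null byte, so strlen() still reports the byte length of the string correctly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the multibyte characters in s, as
 * interpreted by the current locale. Returns -1 if s contains a
 * byte sequence that is not a valid multibyte character.
 */
long count_multibyte_chars(const char *s)
{
    long count = 0;
    size_t remaining;

    assert(s != NULL);

    /* No multibyte character contains a null byte, so strlen()
       gives the number of bytes before the terminator. */
    remaining = strlen(s);

    /* Reset mblen()'s internal shift state before scanning. */
    (void)mblen(NULL, 0);

    while (remaining > 0) {
        /* Number of bytes in the next multibyte character. */
        int len = mblen(s, remaining);

        if (len < 0)
            return -1;      /* invalid or incomplete sequence */
        if (len == 0)
            break;          /* reached a null character */

        s += len;
        remaining -= (size_t)len;
        count++;
    }
    return count;
}
```

In the default "C" locale every character occupies one byte, so the result equals the byte length. After a call such as setlocale(LC_CTYPE, "") in an Asian locale, the same function would count each ideogram once, however many bytes its encoding uses.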
ANSI C specifies that program comments, string literals, character constants, and header names are all sequences of multibyte characters.