Introduction

← Previous revision Revision as of 01:05, 25 November 2021
Line 46: Line 46:
A second requirement of ISO-2022 was that it should be compatible with 7-bit communication channels. So even though ISO-2022 is an 8-bit character set, any 8-bit sequence can be re-encoded to use only 7 bits without loss, and normally with only a small increase in size.
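The 7-bit re-encoding described above can be illustrated with a minimal sketch. It assumes a simplified setup that the text does not spell out: a 94-character set is already designated as G1, high bytes 0xA1 to 0xFE stand for G1 characters, and the input contains no C1 controls and no literal shift characters. The function names `to_7bit` and `from_7bit` are hypothetical. The sketch uses the Shift Out and Shift In control characters to switch between the two sets inside a 7-bit stream.

```python
SO = 0x0E  # Shift Out: following bytes 0x21-0x7E are read from G1
SI = 0x0F  # Shift In: following bytes are read from G0 again


def to_7bit(data: bytes) -> bytes:
    """Re-encode 8-bit data so every output byte is below 0x80 (simplified)."""
    out = bytearray()
    shifted = False
    for b in data:
        if 0xA1 <= b <= 0xFE:
            # High (G1) byte: shift out once per run, then clear the top bit.
            if not shifted:
                out.append(SO)
                shifted = True
            out.append(b & 0x7F)
        elif b < 0x80:
            # Low byte: make sure we are back in the G0 state first.
            if shifted:
                out.append(SI)
                shifted = False
            out.append(b)
        else:
            # C1 controls, 0xA0 and 0xFF are outside this sketch's assumptions.
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    if shifted:
        out.append(SI)
    return bytes(out)


def from_7bit(data: bytes) -> bytes:
    """Inverse of to_7bit under the same assumptions."""
    out = bytearray()
    shifted = False
    for b in data:
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        else:
            out.append(b | 0x80 if shifted else b)
    return bytes(out)


sample = b"abc\xc6\xfc\xcb\xdcdef"
seven_bit = to_7bit(sample)
```

Each run of high bytes costs two extra bytes (one shift out, one shift in), which is why the size increase is usually small when such runs are long.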
To represent multiple character sets, the ISO/IEC 2022 character encodings include [[escape sequence]]s which indicate the character set for characters which follow. The escape sequences are registered with ISO and follow the patterns defined within the standard. These character encodings require data to be processed sequentially in a forward direction since the correct interpretation of the data depends on previously encountered escape sequences. Note, however, that other standards such as ISO-2022-JP may impose extra conditions such as the current character set is reset to US-ASCII before the end of a line.
To represent multiple character sets, the ISO/IEC 2022 character encodings include [[escape sequence]]s which indicate the character set for characters which follow. The escape sequences are registered with ISO and follow the patterns defined within the standard. These character encodings require data to be processed sequentially in a forward direction since the correct interpretation of the data depends on previously encountered escape sequences. Note, however, that other standards such as ISO-2022-JP may impose extra conditions, such as that the current character set is reset to US-ASCII before the end of a line.
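As a small illustration of why such data must be read sequentially, the following sketch uses Python's built-in `iso2022_jp` codec. The sample string and variable names are chosen only for this example. The same payload bytes decode as Japanese text or as plain ASCII depending on which escape sequence preceded them.

```python
text = "ABC 日本語 xyz"
encoded = text.encode("iso2022_jp")

ESC = b"\x1b"
TO_JIS_X_0208 = ESC + b"$B"  # designates the two-byte JIS X 0208 set
TO_ASCII = ESC + b"(B"       # designates US-ASCII again

# The encoded form is purely 7-bit.
assert all(b < 0x80 for b in encoded)

# Cut out the bytes that sit between the two escape sequences.
start = encoded.index(TO_JIS_X_0208) + len(TO_JIS_X_0208)
end = encoded.index(TO_ASCII, start)
payload = encoded[start:end]

# With the escape sequences, the decoder recovers the original text.
with_escapes = encoded.decode("iso2022_jp")

# Without them, the decoder starts in the ASCII state and reads the
# very same bytes as ordinary ASCII characters.
without_escapes = payload.decode("iso2022_jp")

print(with_escapes)
print(without_escapes)
```

A decoder that jumps into the middle of the stream has no way to tell which of the two readings is intended, which is the reason for the forward-only processing noted above.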
To represent large character sets, ISO/IEC 2022 builds on [[ISO/IEC 646]]’s property that one seven-bit character will normally define 94 graphic (printable) characters (in addition to space and 33 control characters). Using two bytes, it is thus possible to represent up to 8,836 (94×94) characters; and, using three bytes, up to 830,584 (94×94×94) characters. Though the standard defines it, no registered character set uses three bytes (although [[Extended Unix Code#EUC-TW|EUC-TW]]’s unregistered G2 does). For the two-byte character sets, the code point of each character is normally specified in so-called ''[[kuten]]'' (Japanese: {{lang|ja|区点}}) form (sometimes called ''qūwèi'' (Chinese: {{lang|zh-cn|区位}}), especially when dealing with [[GB2312]] and related standards), which specifies a zone ({{lang|ja|区}}, Japanese: ''ku'', Chinese: ''qū''), and the point (Japanese: {{lang|ja|点}} ''ten'') or position (Chinese: {{lang|zh-cn|位}} ''wèi'') of that character within the zone.
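The arithmetic above, and the relation between a kuten pair and its byte values, can be sketched as follows. The helper names `kuten_to_7bit` and `kuten_to_8bit` are hypothetical; the sketch assumes the usual mapping in which zone and point each run from 1 to 94 and are offset by 0x20 for the 7-bit form or 0xA0 for the 8-bit form. The character 日 at kuten 38-92 in JIS X 0208 serves as the example, checked against Python's built-in codecs.

```python
CELLS = 94  # graphic characters available per byte

two_byte_capacity = CELLS * CELLS      # 94 x 94
three_byte_capacity = CELLS ** 3       # 94 x 94 x 94


def kuten_to_7bit(ku: int, ten: int) -> bytes:
    """Map a zone/point pair (1..94 each) to two bytes in 0x21..0x7E."""
    if not (1 <= ku <= CELLS and 1 <= ten <= CELLS):
        raise ValueError("ku and ten must each be in the range 1..94")
    return bytes([0x20 + ku, 0x20 + ten])


def kuten_to_8bit(ku: int, ten: int) -> bytes:
    """Same pair with the high bit set (bytes in 0xA1..0xFE), as in EUC."""
    return bytes(b | 0x80 for b in kuten_to_7bit(ku, ten))


# 日 is at zone 38, point 92 in JIS X 0208.
seven = kuten_to_7bit(38, 92)
eight = kuten_to_8bit(38, 92)

# Wrap the 7-bit bytes in the escape sequences ISO-2022-JP expects.
via_iso2022_jp = (b"\x1b$B" + seven + b"\x1b(B").decode("iso2022_jp")
via_euc_jp = eight.decode("euc_jp")

print(two_byte_capacity, three_byte_capacity)
print(seven.hex(), eight.hex(), via_iso2022_jp, via_euc_jp)
```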