Big5
Big5 or Big-5 is a character encoding method, mainly used in Taiwan and Hong Kong, for representing traditional Chinese characters. Its counterpart for simplified Chinese characters is the GB encoding, used in the People's Republic of China.
Name
The Chinese name of Big5, 五大碼 (pinyin: wǔdàmǎ), means "code of the Big Five". The name refers either to the original goal of supporting the five main software packages used in Taiwan at the time, or to the five main Taiwanese computer companies that collaborated on developing the code: 宏碁 (Acer), 神通 (MiTAC), 佳佳, 零壹 (Zero One) and 大眾 (FIC).
History
The Big5 encoding method was defined by the Institute for Information Industry of Taiwan (財團法人資訊工業策進會) in 1984. According to some sources, Big5 was popularized by its adoption in various commercial software packages, and in particular by the ETen Chinese System, which ran under MS-DOS.
The government of the Republic of China declared it its standard in the mid-1980s, by which time Big5 was already a de facto standard.
Hong Kong also adopted the Big5 character encoding. However, Cantonese, the official language of the region, uses many archaic Chinese characters that are not available in this character set. To solve this problem, the Hong Kong government created the "Government Chinese Character Set" extension in 1995, followed by the "Hong Kong Supplementary Character Set" (HKSCS) in 1999. The Hong Kong extensions are usually distributed as patches.
Structure
Principles
The original Big5 character table is sorted first by frequency of use, then by stroke count, and finally by Kangxi radical.
This first character set nevertheless lacked some frequently used Chinese characters, which is why each vendor developed its own extension. The ETen extension became an integral part of the current Big5 standard thanks to its popularity.
The structure of the Big5 encoding does not conform to the ISO 2022 standard, but it has certain similarities with the Shift_JIS encoding. It is a double-byte encoding with the following structure:
- the first byte lies in the range 0xa1-0xfe
- the second byte lies in the ranges 0x40-0x7e and 0xa1-0xfe
Certain variants of Big5, such as HKSCS, use a wider range for the first byte that also covers the values between 0x80 and 0xA0 (as in Shift_JIS).
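The byte ranges listed above can be checked with a few comparisons. The following Python sketch is only an illustration: the function name is_big5_pair is hypothetical, and it covers the original ranges only, not the wider first-byte range used by variants such as HKSCS.

```python
def is_big5_pair(lead: int, trail: int) -> bool:
    """Return True if (lead, trail) fits the original Big5 byte ranges.

    Hypothetical helper: it checks the ranges only, not whether a
    character is actually assigned to the resulting code.
    """
    lead_ok = 0xA1 <= lead <= 0xFE
    trail_ok = 0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE
    return lead_ok and trail_ok
```

Note that a pair accepted by this check is merely well-formed; it may still fall in a reserved or unassigned part of the table.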
The value of each Big5 code is generally written as a four-digit hexadecimal number, which gives the two bytes of the code in big-endian order. For example, the Big5 code of the character "五", whose bytes are 0xa4 0xad, is written A4AD.
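As an illustration of this notation, the short Python sketch below derives the four-digit form from the two bytes of a character. It assumes Python's built-in "big5" codec is an acceptable stand-in for the table described here; the function name big5_code is hypothetical.

```python
def big5_code(ch: str) -> str:
    """Return the four-digit hexadecimal Big5 code of a single character."""
    raw = ch.encode("big5")  # raises UnicodeEncodeError if not in the codec's table
    if len(raw) != 2:
        raise ValueError("not a double-byte Big5 character")
    # Read the two bytes as one big-endian number: first byte is the high byte.
    value = int.from_bytes(raw, "big")
    return f"{value:04X}"


print(big5_code("五"))  # A4AD
```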
In theory, Big5 is exclusively a double-byte encoding. In practice, Big5 codes are always used together with ASCII (or another 8-bit character set), so a text encoded in Big5 may contain a mixture of Big5 and ASCII codes. Bytes in the range 0x00-0x7f that are not part of a double-byte code are treated as ASCII.
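The rule for mixed text can be sketched as a small scanner that walks the byte stream and decides, at each position, whether it is looking at an ASCII byte or at the first byte of a double-byte code. This is a minimal Python sketch under the original byte ranges; the function name split_big5 is hypothetical, and real decoders handle errors and variants differently.

```python
from typing import List


def split_big5(data: bytes) -> List[bytes]:
    """Split a mixed ASCII/Big5 byte string into one- and two-byte units."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # Not inside a double-byte code, so this byte is ASCII.
            units.append(data[i:i + 1])
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data):
            trail = data[i + 1]
            if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
                raise ValueError(f"invalid second byte at offset {i + 1}")
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid or truncated code at offset {i}")
    return units
```

For instance, the bytes 0xa4 0x40 form a single double-byte code here: the 0x40 is consumed as a second byte and is not read as the ASCII character "@".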
Some details
In the original Big5, the code table was divided into several parts, some of which were reserved.
In most extensions, the added characters were placed in the reserved ranges of the code table: additional punctuation went into the reserved range A3C0-A3FE, and additional characters into the range C6A1-C8FE or F9D6-FEFE. Sometimes, because of the large number of added characters, certain groups could not follow this rule, such as the Cyrillic letters and the kana, which were placed in the range C6A1-C8FE.
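To make the ranges concrete, the following Python sketch tests whether a four-digit code falls in one of the three reserved ranges named above. The names RESERVED_RANGES and in_reserved_range are hypothetical, and the check compares the code as a plain 16-bit number, so it does not verify that the second byte is itself valid.

```python
# Reserved ranges of the original Big5 table, as listed in the text.
RESERVED_RANGES = [
    (0xA3C0, 0xA3FE),  # used by extensions for additional punctuation
    (0xC6A1, 0xC8FE),  # used by extensions for additional characters
    (0xF9D6, 0xFEFE),  # used by extensions for additional characters
]


def in_reserved_range(code: int) -> bool:
    """Return True if the 16-bit Big5 code lies in a reserved range."""
    return any(low <= code <= high for low, high in RESERVED_RANGES)
```

Under this sketch, A4AD (the character "五") is reported as outside the reserved ranges, while a code such as C7F3 is reported as inside them.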