ISO 8859-1

The standard ISO 8859-1, whose full name is ISO/IEC 8859-1 and which is often called Latin-1, forms the first part of the international standard ISO/IEC 8859, a standard of the International Organization for Standardization for character encoding in computing. It defines what it calls Latin alphabet No. 1, which consists of 191 characters of the Latin alphabet, each encoded in one octet (that is, 8 bits). ISO 8859-1 reuses the encoding of the printable characters of US-ASCII.

In Western countries, this standard is used by many operating systems, including Unix, Windows and AmigaOS. It gave rise to several extensions and adaptations, including Windows-1252 and ISO 8859-15. The distinction between ASCII, ISO 8859-1, ISO 8859-15, Windows-1252 and MacRoman is a source of confusion among software developers.

Principle

As a first approximation, ISO 8859-1 extends US-ASCII by adding accented characters.

ISO 8859-1 covers the characters used by the following European languages: Albanian, German, English, Basque, Catalan, Danish, Scottish Gaelic, Spanish, Faroese, Finnish, French (except the characters œ, Œ and Ÿ, which were not included because the employer of one of the authors of the standard, a large printer manufacturer, had not included these characters in its printers), Icelandic (except the characters „ and “), Irish Gaelic, Italian, Dutch, Norwegian, Portuguese, Romansh and Swedish. Afrikaans and Swahili are also covered. The standard is thus used in Western Europe, the Americas, Australia and most of Africa.

The 191 characters of ISO 8859-1 are graphic characters and are therefore compatible with most web browsers. They are shown as glyphs in the following table. The row and column headings give the hexadecimal code of each character; for example, the hexadecimal code of “L” is 4C, that is, 01001100 in binary or 76 in decimal.
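The correspondence between a character and its hexadecimal, binary and decimal codes can be checked with a few lines of Python. This is only a minimal sketch; it assumes Python's built-in "latin-1" codec as a stand-in for the standard's table, and the variable names are illustrative.

```python
# Encode one character with Python's built-in "latin-1" codec and
# show its code in the three notations used in the text.
ch = "L"
code = ch.encode("latin-1")[0]   # a single octet, as an integer

hex_code = f"{code:02X}"         # two hexadecimal digits
bin_code = f"{code:08b}"         # eight binary digits

print(ch, hex_code, bin_code, code)
```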

In this table, 20 in hexadecimal (32 in decimal) is the code of the space character (often written SP, for space), and A0 (160) is the code of the no-break space (often written NBSP). AD (173) is the soft hyphen (often written SHY), which some browsers do not display at all. Codes 00 (0) to 1F (31) and 7F (127) to 9F (159) are not assigned in ISO 8859-1.
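As a rough illustration of these special codes and of the unassigned ranges, the following Python sketch looks up the Unicode names of the three characters and tests whether a code is assigned. The helper name is_assigned is hypothetical, and the ranges are taken directly from the paragraph above.

```python
import unicodedata

def is_assigned(code: int) -> bool:
    """True if the code has a graphic character in ISO/IEC 8859-1
    (ranges 20-7E and A0-FF, as described in the text)."""
    return 0x20 <= code <= 0x7E or 0xA0 <= code <= 0xFF

# Abbreviations used in the text for the three special characters.
special = {0x20: "SP", 0xA0: "NBSP", 0xAD: "SHY"}

names = {}
for code, abbrev in special.items():
    char = bytes([code]).decode("latin-1")
    names[abbrev] = unicodedata.name(char)
    print(f"{code:02X} ({code}) {abbrev}: {names[abbrev]}")

# Boundary codes of the unassigned ranges.
for code in (0x00, 0x1F, 0x7F, 0x9F):
    print(f"{code:02X} assigned: {is_assigned(code)}")
```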

ISO 8859-1 compared to ISO-8859-1

IANA (Internet Assigned Numbers Authority) registered, for use on the Internet, ISO-8859-1 (note the additional hyphen), a superset of ISO/IEC 8859-1.

This character map, also called a character set or code page, completes ISO/IEC 8859-1 by assigning control characters to the values 00 to 1F, 7F, and 80 to 9F. The result is 256 characters covering all possible 8-bit values.

IANA authorizes the following aliases for ISO-8859-1 (to be matched without regard to case):

  • ISO_8859-1:1987
  • ISO_8859-1
  • ISO-8859-1
  • iso-ir-100
  • csISOLatin1
  • latin1
  • l1
  • IBM819
  • CP819
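Many programming environments keep their own alias tables for encoding names. As a hedged illustration, the sketch below asks Python's codecs module to resolve each alias listed above; Python's table is maintained independently of the IANA registry, so the fact that it accepts these spellings is a property of Python, not of the standard. The helper name resolve is hypothetical.

```python
import codecs

# Aliases copied from the list above.
IANA_ALIASES = [
    "ISO_8859-1:1987", "ISO_8859-1", "ISO-8859-1", "iso-ir-100",
    "csISOLatin1", "latin1", "l1", "IBM819", "CP819",
]

def resolve(alias: str) -> str:
    """Return the canonical codec name Python associates with an alias."""
    return codecs.lookup(alias).name

reference = resolve("latin_1")   # Python's own name for this codec
resolved = {alias: resolve(alias) for alias in IANA_ALIASES}

for alias, name in resolved.items():
    print(f"{alias:16} -> {name}")
```

The lookup is case-insensitive, so "LATIN1" and "latin1" resolve to the same codec.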

The name Latin-1 is an informal name that is not recognized by ISO or IANA but is used by some software.

The following table shows ISO-8859-1, with the abbreviations that represent the control characters and spaces shown as underlined text.

Other parts of ISO/IEC 8859 also have a corresponding character set registered with IANA; for example, ISO/IEC 8859-10 (Latin alphabet No. 6) closely resembles the character set ISO-8859-10.

Each ISO/IEC 8859-X standard encodes characters in the same way: it covers the ASCII range (20-7E) plus 96 additional characters in the range A0-FF, for a total of 191 characters. The ISO-8859-X character sets add all the ISO 646 C0 control characters in the range 00 to 1F, and the additional control characters in the range 7F-9F, giving a total of 256 characters. ISO-8859-1 is the only one of these character sets whose encoding is identical to the first 256 code points of Unicode.
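The arithmetic of these ranges, and the equivalence with the first 256 Unicode code points, can be checked with a short Python sketch. It relies on Python's "latin-1" codec, which implements the 256-character ISO-8859-1 set; the variable names are illustrative.

```python
# Sizes of the ranges described above.
ascii_graphic = len(range(0x20, 0x7F))    # 20-7E
upper_graphic = len(range(0xA0, 0x100))   # A0-FF
controls = len(range(0x00, 0x20)) + len(range(0x7F, 0xA0))  # 00-1F, 7F-9F

graphic_total = ascii_graphic + upper_graphic
full_total = graphic_total + controls

# Decode every possible byte value and compare with Unicode code points.
decoded = bytes(range(256)).decode("latin-1")
matches_unicode = [ord(c) for c in decoded] == list(range(256))

print(ascii_graphic, upper_graphic, graphic_total, full_total, matches_unicode)
```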

ISO-8859-1 is the default encoding used by the X Window System on most Unix machines.

Related encodings

MacRoman

The oldest Apple Macintosh computers use an encoding named MacRoman, which differs from ISO 8859-1 in the first 32 and the last 127 characters but nevertheless includes most of the characters present in ISO 8859-1 (the soft hyphen is among those that are missing). On the other hand, MacRoman includes many characters that are not in ISO 8859-1. In MacRoman, the euro glyph replaced the earlier generic currency sign.

Windows-1252

On Microsoft Windows, some glyphs were added between 0x80 and 0x9F. This extension (which replaces all the C1 control characters that are assigned to these positions in the ISO-8859-1 encoding and are reserved in the ISO/IEC 8859-1 standard) is known on the Internet as Windows-1252.
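To make the difference concrete, the following Python sketch compares how each byte value is decoded by the "latin-1" and "cp1252" codecs. It is an illustration based on Python's codec tables, in which a few bytes of the 0x80-0x9F range have no character assigned in cp1252; the function name differing_bytes is hypothetical.

```python
def differing_bytes() -> list[int]:
    """Return the byte values that ISO-8859-1 and Windows-1252
    decode differently, according to Python's codec tables."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None   # no character assigned in Python's cp1252 table
        if latin1 != cp1252:
            diffs.append(b)
    return diffs

diffs = differing_bytes()
print(f"{len(diffs)} differing bytes, from {diffs[0]:02X} to {diffs[-1]:02X}")

# Byte 0x80: a control character in ISO-8859-1, the euro sign in Windows-1252.
euro = b"\x80".decode("cp1252")
```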

ISO 8859-15

ISO 8859-15, introduced in particular to support the euro sign €, also handles French better, because the characters ¤, ¦, ¨, ´, ¸, ¼, ½ and ¾ were replaced by €, Š, š, Ž, ž, Œ, œ and Ÿ. In French, however, ISO 8859-1 remains far more widely used, with OE and oe written in place of Œ and œ, while Ÿ appears only in a few proper names. Unicode is generally used when the limits of ISO 8859-1 must be exceeded, in particular for mathematical symbols, phonetic symbols and non-Latin characters.
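The eight replacements can be recovered by decoding every byte value with both encodings and keeping the positions where the results differ. This is a minimal sketch that assumes Python's "latin-1" and "iso8859-15" codecs; the names replacements, old and new are illustrative.

```python
# Map each differing byte to its (ISO 8859-1, ISO 8859-15) character pair.
replacements = {}
for b in range(256):
    raw = bytes([b])
    old = raw.decode("latin-1")
    new = raw.decode("iso8859-15")
    if old != new:
        replacements[b] = (old, new)

for b, (old, new) in replacements.items():
    print(f"{b:02X}: {old} -> {new}")
```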

Limitations

The main limitation of this single-byte character encoding is the need to use several tables to cover several alphabets. Some documents are labeled, for example by means of metadata, in which case one and only one character table is associated with each document. For example, in an HTML document, a tag in the header can declare that the document uses the ISO-8859-1 character table.

When that is not the case, the character table of the operating system is used. If it does not match that of the document, the display is garbled. A further problem is that alphabets that are not defined in the same table cannot be mixed in a single document, for example French and Hebrew.
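The garbled display can be reproduced by decoding bytes with a table other than the one used to write them. The sketch below is only an illustration; it uses Python's "latin-1" and "mac_roman" codecs, and the sample word is arbitrary.

```python
original = "\u00e9t\u00e9"              # the French word "été"
data = original.encode("latin-1")       # bytes as stored in the document

correct = data.decode("latin-1")        # reader uses the document's table
garbled = data.decode("mac_roman")      # reader uses a different table

print(data, correct, garbled)
```

The bytes are unchanged in both cases; only the table used to interpret them differs.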

The solution is to switch to Unicode.
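As a hedged illustration of the French and Hebrew case, the following Python sketch tries to encode a mixed string with a Latin table, a Hebrew table, and a Unicode encoding. UTF-8 is chosen here as one possible Unicode encoding; the helper name can_encode and the sample text are hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """True if every character of the text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# "français" followed by a Hebrew word, written with escapes for portability.
text = "fran\u00e7ais \u05e2\u05d1\u05e8\u05d9\u05ea"

results = {enc: can_encode(text, enc) for enc in ("latin-1", "iso8859-8", "utf-8")}
for enc, ok in results.items():
    print(f"{enc:10} {'ok' if ok else 'cannot encode'}")
```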

See also

Internal links

  • ASCII
  • MacRoman
  • Windows-1252
  • ISO 8859-15
  • Unicode character blocks containing characters encoded in at least one of the 16 parts of ISO 8859:
    • for ISO 8859-1:
      • Basic Latin
      • Latin-1 Supplement
    • for the other parts:
      • Latin Extended-A
      • Latin Extended-B
      • Combining Diacritical Marks
      • Greek and Coptic
      • Cyrillic
      • Thai
      • Currency Symbols

External links

  • ISO/IEC 8859-1:1998, final draft of the standard (PDF)
  • Differences between ANSI, ISO-8859-1 and MacRoman Character Sets
  • The Letter Database
  • ASCII - ISO 8859-1 Table with HTML Entity Names
