Unicode and HTML
The relationship between Unicode and HTML tends to be a difficult subject for many computing professionals, document authors, and web users. Representing text accurately in web pages, across different natural languages and writing systems, is complicated by the details of character encoding, markup language syntax, and fonts, and by the varying levels of support in web browsers.
Characters in HTML documents
Web pages are typically HTML or XHTML documents. Both types of document consist, at a fundamental level, of characters, which are graphemes and grapheme-like units, independent of how they are represented in computer storage systems and networks.
An HTML document is a sequence of Unicode characters.
Whether the document is HTML or XHTML, when it is stored on a file system or transmitted over a network, its characters are encoded as a sequence of octets (bytes) according to a particular character encoding. This encoding may be a Unicode Transformation Format, such as UTF-8, which can directly encode any Unicode character, or a legacy encoding, such as Windows-1252, which cannot.
Character references make it possible to represent characters independently of the document's encoding.
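The difference between the two kinds of encoding can be seen with Python's standard codecs. In this minimal sketch (the helper name encode_or_none and the sample string are illustrative choices), UTF-8 produces bytes for every character, while Windows-1252 (Python codec cp1252) fails as soon as it meets a character outside its repertoire.

```python
from typing import Optional


def encode_or_none(text: str, encoding: str) -> Optional[bytes]:
    """Return text encoded as bytes, or None if the encoding cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


sample = "Café €5, Ελληνικά"  # Latin letters, the euro sign, and Greek letters

# UTF-8 is a Unicode Transformation Format: every character has a byte sequence.
utf8_bytes = encode_or_none(sample, "utf-8")

# Windows-1252 covers "Café €5, " but has no Greek letters, so encoding fails.
cp1252_bytes = encode_or_none(sample, "cp1252")

print(utf8_bytes)
print(cp1252_bytes)  # None
```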
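As an illustration, Python's standard library can both produce and resolve numeric character references. The sketch assumes an ASCII-only output target, chosen only for the example: the xmlcharrefreplace error handler writes each character ASCII cannot hold as a decimal reference, and html.unescape maps decimal, hexadecimal, and named references back to the same Unicode characters, whatever byte encoding carried the markup.

```python
import html

text = "naïve €"

# Characters that ASCII cannot represent become decimal references (&#NNNN;).
ascii_markup = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_markup)  # na&#239;ve &#8364;

# Decimal, hexadecimal, and named references resolve to the same code point.
print(html.unescape("&#8364; &#x20AC; &euro;"))  # € € €
```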
Determining the character encoding
In order to process HTML correctly, a web browser must determine which Unicode characters are represented by the encoded form of an HTML document. To do this, it must know which encoding was used. When a document is transmitted in a MIME message or over a transport that uses MIME content types, such as an HTTP response, the message may declare the encoding in a Content-Type header, such as Content-Type: text/html; charset=ISO-8859-1. Other external means of declaring the encoding are permitted but rarely used. The encoding may also be declared within the document itself, in a META element such as <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">. When there is no encoding declaration at all, the default depends on the browser's locale settings. On a system configured mainly for Western European languages, it will generally be ISO-8859-1 or one of its extensions, such as Windows-1252 or ISO-8859-15. A browser in a locale where multibyte encodings are the norm is likely to apply some form of autodetection. Assuming the wrong encoding can cause characters to be displayed incorrectly, notably the euro sign.
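The lookup order described above (transport header, then META element, then a locale default) can be sketched in Python. This is a simplified model, not the full sniffing algorithm a real browser runs: the function names, the 1024-byte scan window, the regular expression, and the windows-1252 default (standing in for a Western European locale) are all assumptions for the example. The header is parsed with email.message.Message, whose get_content_charset method returns the charset parameter in lower case. The last lines show why the euro sign suffers: byte 0x80 is € in Windows-1252 but an invisible control character in ISO-8859-1.

```python
import re
from email.message import Message
from typing import Optional

# Simplified pattern for <meta charset=...> and <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from an HTTP Content-Type header value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def charset_from_meta(body: bytes) -> Optional[str]:
    """Look for a META encoding declaration near the start of the document."""
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def determine_encoding(content_type: Optional[str], body: bytes,
                       locale_default: str = "windows-1252") -> str:
    """Header first, then META element, then an (assumed) locale-dependent default."""
    return (charset_from_header(content_type)
            or charset_from_meta(body)
            or locale_default)


page = b'<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><p>\x80</p>'
print(determine_encoding("text/html; charset=UTF-8", page))  # utf-8
print(determine_encoding(None, page))                        # iso-8859-1
print(determine_encoding(None, b"<p>\x80</p>"))              # windows-1252

# The same byte read under two candidate encodings:
print(b"\x80".decode("windows-1252"))       # €
print(repr(b"\x80".decode("iso-8859-1")))   # '\x80' (a C1 control character)
```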
Local 8-bit encodings predate Unicode and, as a result, are still widely used in certain regions. Because of these established practices, particularly in programming languages and operating systems, and a desire not to burden users with the nuances of encoding, many text editors used by HTML authors are unable or unwilling to offer a choice of encoding when saving a file to disk, and often do not allow input of characters beyond a very limited range. Consequently, many HTML authors are completely unaware of encoding issues and may have no idea which encoding their documents actually use. A common misconception is that the encoding declaration changes the document's actual encoding; in fact it is only a label, which may be inaccurate.
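A short Python sketch makes the point about labels concrete. In this hypothetical scenario, an editor saves the file as UTF-8 while the markup declares ISO-8859-1; editing the declaration does not re-encode any bytes, so a browser that trusts the label misreads every non-ASCII character.

```python
# The author's editor silently saved the file as UTF-8...
body = '<meta charset="ISO-8859-1"><p>café</p>'.encode("utf-8")

# ...but the declaration claims ISO-8859-1. The label does not change the
# bytes; a browser that trusts it decodes "é" (0xC3 0xA9) as two characters.
as_declared = body.decode("iso-8859-1")
as_saved = body.decode("utf-8")

print(as_declared)  # <meta charset="ISO-8859-1"><p>cafÃ©</p>
print(as_saved)     # <meta charset="ISO-8859-1"><p>café</p>
```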
Many HTML documents are served with inaccurate encoding declarations, or with none at all. To determine the encoding in such cases, many browsers let the user select an encoding manually from a list. They may also use an encoding autodetection algorithm that works together with the manual override. The manual override may apply to all documents, or only to those whose encoding cannot be determined from declarations and/or byte patterns. The existence of the manual override, and its widespread use, masks the inaccuracy of encoding declarations on the Web; consequently, the problem is likely to persist. XHTML has partly addressed this: being XML, it requires that encoding declarations be accurate and that no workarounds be employed when they are found to be inaccurate.
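The interplay between declarations, byte patterns, and a manual override might look roughly like the following Python sketch. It is a deliberately small heuristic, not any browser's actual detector: the function names, the parameters user_choice and override_all, and the windows-1252 fallback are hypothetical. The byte patterns used are a byte order mark and the observation that non-ASCII text which decodes cleanly as UTF-8 is unlikely to be in another encoding.

```python
import codecs
from typing import Optional

# Byte order marks are unambiguous byte patterns at the start of a file.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def looks_like_utf8(body: bytes) -> bool:
    """Non-ASCII bytes that form valid UTF-8 are strong evidence for UTF-8."""
    if body.isascii():
        return False  # pure ASCII gives no evidence either way
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def choose_encoding(body: bytes, declared: Optional[str] = None,
                    user_choice: Optional[str] = None, override_all: bool = False,
                    fallback: str = "windows-1252") -> str:
    # An override applied to all documents beats every other signal.
    if user_choice and override_all:
        return user_choice
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    if declared:
        return declared
    if looks_like_utf8(body):
        return "utf-8"
    # Otherwise the override applies only where nothing else decided.
    return user_choice or fallback


print(choose_encoding("café".encode("utf-8")))                 # utf-8
print(choose_encoding("café".encode("cp1252")))                # windows-1252
print(choose_encoding(b"caf\xe9", user_choice="iso-8859-15"))  # iso-8859-15
```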
Web browser support
Certain web browsers, such as Mozilla Firefox, Opera, and Safari, can display multilingual web pages by intelligently choosing a font for each individual character on the page. They will correctly display any mix of Unicode blocks, as long as suitable fonts are installed on the operating system. Internet Explorer for Windows can display the full range of Unicode characters, but characters that are not present in the first available font are displayed only if they are present in the designated fallback font for the current script (for example, only Arial will be used for Latin text, or Arial Unicode MS if it is installed; other specified fonts are ignored). If a character is not recognized, Internet Explorer displays a rectangle in its place. For this reason, web page authors should specify several fonts that are likely to be present on the user's computer, listed in order of preference. Microsoft recommends using CSS (Cascading Style Sheets) to declare the desired fonts. The characters in the table above have not been assigned a specific font, yet most should render correctly if appropriate fonts have been installed.
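The per-character strategy attributed above to Firefox, Opera, and Safari can be modelled roughly as follows. This is a simplified sketch, not how any particular browser is implemented: the font names and their coverage ranges are hypothetical, and the ordered preference list plays the role of a CSS font-family list. A result of None corresponds to a character that no listed font covers, which a browser would draw as a placeholder box.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical coverage data: font name -> inclusive code point ranges it contains.
FONT_COVERAGE: Dict[str, List[Tuple[int, int]]] = {
    "ExampleLatinGreek": [(0x0020, 0x024F), (0x0370, 0x03FF)],
    "ExampleCJK": [(0x4E00, 0x9FFF)],  # CJK Unified Ideographs
}


def covers(font: str, char: str) -> bool:
    """True if the (hypothetical) font has a glyph for this character."""
    cp = ord(char)
    return any(lo <= cp <= hi for lo, hi in FONT_COVERAGE.get(font, []))


def pick_fonts(text: str, preference: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Per-character fallback: use the first font in the list that has a glyph."""
    result = []
    for char in text:
        font = next((f for f in preference if covers(f, char)), None)
        result.append((char, font))
    return result


for char, font in pick_fonts("Aλ中\u0E01", ["ExampleLatinGreek", "ExampleCJK"]):
    print(f"U+{ord(char):04X} -> {font}")
# U+0041 -> ExampleLatinGreek
# U+03BB -> ExampleLatinGreek
# U+4E2D -> ExampleCJK
# U+0E01 -> None  (Thai: no listed font covers it)
```

By contrast, the Internet Explorer behaviour described above would consult only the first font and a single fallback per script, rather than walking the whole list for every character.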
Older browsers, such as Netscape Navigator 4.77, can only display text supported by the current font associated with the page's character encoding, and may misinterpret numeric character references as references to code values within the current character encoding, rather than as references to Unicode code points. If you are using such a browser, it is unlikely that your computer has all of those fonts, or that the browser can use all available fonts on the same page. As a result, the browser will not display the text in the examples above correctly, though it may display a subset of them. Because they are encoded according to the standard, however, they will display correctly on any system that is compliant and has the characters available. Further, characters that have been given names for use in named entity references are likely to be more widely available than others.
To display characters outside the Basic Multilingual Plane, such as the Gothic letter faihu in the table above, some systems (such as Windows 2000) require manual configuration changes. Fonts with broad Unicode block coverage and large character sets are preferable to ordinary fonts.
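Characters outside the Basic Multilingual Plane need more than 16 bits. In UTF-16, the string encoding used by the Windows API, they are stored as a surrogate pair, and software written for 16-bit characters may not handle them. The Python sketch below shows faihu (U+10346) as HTML references, as UTF-8, and as a UTF-16 surrogate pair, computing the pair by hand with the standard formula and checking it against the codec.

```python
import unicodedata

faihu = "\U00010346"  # outside the Basic Multilingual Plane (above U+FFFF)
cp = ord(faihu)

print(unicodedata.name(faihu))            # GOTHIC LETTER FAIHU
print(f"&#{cp}; or &#x{cp:X};")           # &#66374; or &#x10346;
print(faihu.encode("utf-8").hex(" "))     # f0 90 8d 86

# UTF-16 surrogate pair: split the 20 bits above 0x10000 into two 10-bit halves.
offset = cp - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)
print(f"{high:04X} {low:04X}")            # D800 DF46
print(faihu.encode("utf-16-be").hex(" ")) # d8 00 df 46
```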