World Wide Web
The World Wide Web, literally the "world-wide (spider's) web", is commonly called the Web, and sometimes the WWW (in French, also the Toile). It is a public hypertext system that runs on the Internet and allows pages published on sites to be consulted with a browser. The image of the web comes from the hyperlinks that connect Web pages to one another.
The Web is only one of the applications of the Internet. Others include email, instant messaging, and Usenet. The Web was invented several years after the Internet, but it was the Web that brought the Internet to the attention of the mass media. Since then, the Web has frequently been confused with the Internet; in particular, the French word Toile is often used in everyday language without it being clear whether it refers to the Web or the Internet.
Terminology
Terms designating the World Wide Web
The World Wide Web is and has been referred to by many synonymous names and abbreviations: WorldWideWeb, World Wide Web, World-Wide Web, Web, WWW, W3, and, in French, Toile d'araignée mondiale ("world spider web"), Toile mondiale ("world web"), and Toile ("web").
The name of the original project was WorldWideWeb. The words were quickly separated into World Wide Web to improve legibility. The name World-Wide Web was also used by the inventors of the Web, but the name now recommended by the World Wide Web Consortium separates the three words without a hyphen. Although the adjective is normally written world-wide or worldwide in English, the spelling World Wide Web and the abbreviation Web are now well established.
When inventing the Web, Tim Berners-Lee also considered other names, such as Information Mesh, Mine of Information, or The Information Mine (whose initials would have been TIM).
The initials WWW were widely used to abbreviate World Wide Web before the abbreviation Web took over. The awkward pronunciation of WWW, in French as in English, probably hastened its decline. The letters www nevertheless remain very common in Web addresses and in some other formal or technical uses, although no technical constraint requires them. In the second half of the 1990s, when networks were congested by the growing popularity of the Web, a widespread joke claimed that WWW stood for World Wide Wait. WWW is sometimes shortened to W3, an abbreviation found in W3C, the initials of the World Wide Web Consortium.
In writing, the lowercase form ("the web") is increasingly common. The Office québécois de la langue française recommends the capital letter; the French Journal officiel recommends "toile d'araignée mondiale". This article distinguishes between "the Web" and "a web", so the capital letter is always used to refer to the Web.
Terms related to the Web
The terminology specific to the Web comprises several dozen terms. This section presents those used in this article.
The expression online means "connected to a network", in this case the Internet computer network. The expression is not specific to the Web; it is also used in connection with the telephone.
A host is a computer that is online. Each Internet host is identified by an IP address, to which zero, one, or more host names correspond. This terminology is not specific to the Web but belongs to the Internet.
A Web resource is a computing entity (text, image, Usenet newsgroup, electronic mailbox, etc.) that can be accessed independently of other resources. A public-access resource is freely accessible from the Internet. A local resource is present on the computer in use, as opposed to a remote (or online) resource, which is accessed through a network.
A remote resource can be reached only by following a communication protocol. The capabilities of each protocol vary: receiving, sending, or even continuous exchange of information.
HTTP (for Hypertext Transfer Protocol) is the communication protocol commonly used to transfer Web resources. HTTPS is the secure variant of this protocol.
A URL (for Uniform Resource Locator) points to a resource. It is a character string that specifies a communication protocol and a location for any Web resource.
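As an illustration, a URL can be split into the protocol and location it specifies. The following is a minimal sketch using Python's standard urllib.parse module; the function name describe_url and the address www.example.org are assumptions chosen for the example.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into its protocol, host name, and path on that host."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http" or "https"
        "host": parts.hostname,     # host name, lowercased by urlsplit
        "path": parts.path or "/",  # an empty path is treated as the root
    }


if __name__ == "__main__":
    # Hypothetical address used only for illustration.
    print(describe_url("http://www.example.org/docs/index.html"))
```

The same function applied to a simplified address such as https://example.org reports the root path "/", which is the page a server would typically be asked for in that case.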
A hyperlink (or link) is an element in a resource that is associated with a URL. Hyperlinks on the Web are directed: they lead from a source to a destination. Only the source resource contains the data defining the hyperlink; the destination resource carries no trace of it. There are two types of hyperlink: those of the first type must be activated to reach the destination; those of the second type cause the destination to be accessed automatically.
HTML (for Hypertext Markup Language) is a computer language for describing the content of a document (headings, paragraphs, placement of images, etc.) and for including hyperlinks in it. An HTML document is a document described in HTML. HTML documents are the most frequently consulted Web resources.
In the client-server mode of communication, a server is a host running server software, to which client software running on client hosts can connect.
A Web server is a host on which an HTTP server runs (the software itself is also called a Web server). A Web server hosts the resources that it serves.
A Web browser is HTTP client software designed to access Web resources. Its basic function is to allow HTML documents available on HTTP servers to be consulted. Support for other types of resource and other communication protocols depends on the browser.
A Web page (or page) is a document intended to be consulted with a Web browser. A Web page always consists of a central resource (generally an HTML document) and possibly linked resources that are retrieved automatically (typically images).
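The two kinds of hyperlink, and the way a page is assembled from a central HTML document plus automatically retrieved resources, can be illustrated with a short sketch. It uses Python's standard html.parser module; the class name LinkCollector, the helper collect_links, and the sample page are assumptions made for the example, and only a few common HTML elements are considered.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hyperlinks found in an HTML document, sorted by kind."""

    def __init__(self):
        super().__init__()
        self.activated = []  # links the visitor must activate (<a href>)
        self.automatic = []  # resources fetched automatically with the page

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.activated.append(attributes["href"])
        elif tag in ("img", "script") and attributes.get("src"):
            self.automatic.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.automatic.append(attributes["href"])


def collect_links(html_text):
    """Return (activated links, automatically retrieved resources)."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.activated, collector.automatic


# Hypothetical page used only for illustration.
SAMPLE_PAGE = (
    '<html><head><link rel="stylesheet" href="style.css"></head>'
    '<body><p>See the <a href="http://www.example.org/next.html">next page</a>.</p>'
    '<img src="logo.png" alt="Logo"></body></html>'
)

if __name__ == "__main__":
    print(collect_links(SAMPLE_PAGE))
```

Note that the destination resources (next.html, style.css, logo.png) play no part in this: all the link data lives in the source document.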
An HTML editor (or Web editor) is software designed to make it easier to write HTML documents and Web pages in general.
A Web site (or site) is a set of Web pages and possibly other resources, linked in a coherent structure, published by an owner (a company, a government body, an association, a private individual, etc.) and hosted on one or more Web servers.
To visit a Web site means "to consult its pages". The term visit arises because several pages of a site are generally consulted, just as one visits the rooms of a building. The visit is made by a user (also called a visitor or Internet user). Audience measurement is obtained by copying into the site's pages a piece of JavaScript code linking to the site of a specialized service provider, using the remote tagging technique.
A Web address is the URL of a Web page, generally written in a simplified form limited to a host name. The address of a Web site is in fact the address of a page of the site designed to welcome visitors.
A Web host is an IT services company that hosts (puts online) on its Web servers the resources making up its customers' Web sites.
A Web agency is an IT services company that builds Web sites for its customers.
The expression surfing the Web means "consulting the Web". It was coined to emphasize that consulting the Web consists of following many hyperlinks from page to page. It is used mainly by the media and does not belong to the technical vocabulary.
A Web directory is a Web site that lists Web sites.
A Web portal is a Web site that tries to gather the widest possible range of information and services in a single site. Some portals are thematic.
A Web service is a client-server technology based on Web protocols.
Structure
Mathematical model
The World Wide Web, as a set of hypertext resources, can be modeled as a directed graph with resources as vertices and hyperlinks as arcs. Because the graph is directed, some resources can be sinks (or dead ends, less formally): no path leads from them to the rest of the Web. Conversely, some resources can be sources: no path leads to them from the rest of the Web.
Technically, nothing distinguishes the World Wide Web from any other web using the same technologies; indeed, countless private webs exist. In practice, a page of a popular Web site, such as a Web directory, is taken to belong to the Web. The Web can then be defined as the set of resources and hyperlinks that can be discovered recursively starting from that page, which excludes sources and private webs.
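The graph model can be made concrete with a small sketch. The tiny graph below, its resource names, and the function names are all assumptions made for illustration; sinks and sources are detected only in their simplest form (a single resource with no outgoing or no incoming hyperlink).

```python
from collections import deque

# Hypothetical miniature web: each resource maps to the resources it links to.
WEB = {
    "directory": ["home", "news"],
    "home": ["news", "paper.pdf", "directory"],
    "news": ["home"],
    "intranet": ["home"],  # links into the Web, but nothing links to it
    "paper.pdf": [],       # no outgoing hyperlinks
}


def reachable_from(graph, start):
    """Resources that can be discovered by following hyperlinks from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def all_resources(graph):
    """Every resource that appears as a link source or a link destination."""
    return set(graph) | {t for targets in graph.values() for t in targets}


def sinks(graph):
    """Resources with no outgoing hyperlink (dead ends)."""
    return {node for node in all_resources(graph) if not graph.get(node)}


def sources(graph):
    """Resources that no hyperlink points to."""
    linked_to = {t for targets in graph.values() for t in targets}
    return all_resources(graph) - linked_to


if __name__ == "__main__":
    print(sorted(reachable_from(WEB, "directory")))
    print(sorted(sinks(WEB)), sorted(sources(WEB)))
```

Starting the exploration from "directory" never reaches "intranet", which matches the definition above: a source is excluded from the Web as discovered from a well-known page.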
Exploration of the Web
Recursive exploration of the Web starting from well-chosen resources is the basic method of search engines' indexing robots. In practice, several categories of discovered resources are often ignored:
- resources without public access, in particular personal, administrative, or paid pages protected by a password;
- resources belonging to distinct systems that are often older than the Web (email, Usenet, FTP sites), or simply resources not served by an HTTP or HTTPS server;
- resources in an unsupported data format;
- resources listed in a robots exclusion file;
- resources whose hyperlinks are created dynamically in response to visitors' queries.
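Two of the filters listed above can be sketched in a few lines: skipping resources that are not served over HTTP or HTTPS, and honoring a robots exclusion file. The sketch uses Python's standard urllib.robotparser module; the exclusion rules, the robot name ExampleBot, and the addresses are assumptions made for illustration, and a real indexing robot would fetch the exclusion file from each site rather than embed it.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file, as a site might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse the text of a robots exclusion file into a rule set."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def should_explore(url, rules, agent="ExampleBot"):
    """Decide whether an indexing robot should fetch the given URL."""
    # Resources outside HTTP/HTTPS (email, FTP, ...) are ignored.
    if urlsplit(url).scheme not in ("http", "https"):
        return False
    # Resources listed in the exclusion file are ignored.
    return rules.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_rules(ROBOTS_TXT)
    for candidate in (
        "http://www.example.org/index.html",
        "http://www.example.org/private/report.html",
        "ftp://ftp.example.org/archive.tar",
    ):
        print(candidate, should_explore(candidate, rules))
```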
In 2004, search engines indexed approximately 4 billion resources.
Deep Web
See also: Deep Web
The "deep Web" or "invisible Web" is the part of the Web that is not explored by indexing robots and therefore cannot be found with general-purpose search engines. Studies indicate that the invisible part of the Web represents more than 99% of the Web. The deep Web consists in particular of resources in a data format incompatible with search engines, resources contained in Web sites so large that indexing robots give up indexing them completely, and resources that have no known address. These last resources generally come from databases and are served in response to queries entered by visitors.
Public servers
Recursive exploration is not the only means of indexing the Web and measuring its size. The other approach is to measure the computing infrastructure connected to the Internet for hosting Web sites. Instead of following hyperlinks, this method uses the domain names registered in the Domain Name System and tries to connect to every potential Web server. This is notably the method used by the company Netcraft, which regularly publishes the results of its surveys, including measurements of the popularity of HTTP servers. This measurement concerns the use of Web technologies more than the Web itself. In particular, it makes it possible to find public sites that are not linked to the World Wide Web.
Intranets and private Webs
A web available on an intranet is private. It is either completely separate from the Web or a source of the Web. It is a source when the intranet is connected to the Internet and one of the private web's hyperlinks points to a resource of the Web. Links from the Web, on the other hand, are impossible, because by definition an intranet does not offer public access.
A source can also be on the Internet. In that case it constitutes a virtually private web, because the public cannot discover it by following hyperlinks.
Archiving
See also: Web archiving
The Web changes constantly: resources are continually created, modified, and removed. A few Web archiving initiatives aim to make it possible to find what a site contained on a given date. The Internet Archive project is one of them.
Types of resource
The various types of Web resource have quite distinct uses:
- resources making up Web pages: HTML documents, JPEG, PNG, or GIF images, JavaScript scripts, CSS style sheets, sounds, animations;
- resources accessible from a Web page but consulted through a specific interface: audio streams, video streams;
- resources designed to be consulted separately: documents (PDF, PostScript, Word, etc.), text files, images of any type, pieces of music, videos, files to be saved;
- resources belonging to systems quite distinct from the Web: Usenet newsgroups, electronic mailboxes, local files.
HTML documents
The HTML document is the main resource of a Web page: it contains the hyperlinks, contains and structures the text, and links and lays out the multimedia resources. An HTML document contains only text: the text to be read, the HTML markup, and possibly other scripting or style languages.
Presenting HTML documents is the main function of a Web browser. HTML leaves it to the browser to make the best use of the computer's capabilities when presenting resources. Typically, the typeface, the length of lines of text, the colors, etc., must be adapted to the output device (screen, printer, etc.).
Multimedia
Multimedia elements always come from resources independent of the HTML document. HTML documents contain hyperlinks pointing to the multimedia resources, which can therefore be scattered across the Internet. Linked multimedia elements are transferred automatically in order to present a Web page.
Only the use of images and small animations is standardized. Support for sound, video, three-dimensional spaces, and other multimedia elements still relies on non-standardized technologies. Many Web browsers can be extended with add-on software (plugins), in particular to support non-standard media types.
Streams (audio, video) require a communication protocol that works differently from HTTP. This is one of the reasons why this type of resource often requires a plugin and is poorly integrated into Web pages.
Images
This section concerns images embedded in Web pages.
The JPEG data format is recommended for natural images, mainly photographs.
The PNG data format is recommended for synthetic images (logos, graphic elements). It is also suitable for natural images, but only when quality takes complete precedence over transfer time.
The GIF data format is recommended for small animations. For synthetic images, GIF's long-standing popularity often leads it to be preferred to PNG. However, GIF has some drawbacks, in particular a limited number of colors and a generally lower degree of compression. In addition, controversy surrounded the use of GIF from 1994 to 2004 because Unisys asserted a patent covering its compression method.
The use of images in the XBM data format is obsolete.
Scripts
A scripting language makes it possible to write the text of a program that is executed directly by a piece of software. In the context of the Web, a script is executed by a Web browser and programs actions in response to the visitor's use of the page being viewed. A script can be embedded in the HTML document or come from a linked resource. The first scripting language of the Web was JavaScript, developed by Netscape. Microsoft then developed a competing variant under the name JScript. Finally, the ECMAScript standard was proposed for the syntax of the language, and the DOM standards for the interface with documents.
Styles
The CSS language was developed to manage the presentation of HTML documents in detail. CSS text can be embedded in the HTML document or come from linked resources, the style sheets. This separation allows information (contained in HTML documents) and its presentation (contained in style sheets) to be managed separately. This is also referred to as the "separation of content and form".
Others
The handling of other types of resource depends on the software installed on the client host and on its settings.
When the corresponding software is available, documents and images of any type are generally presented automatically, in ways (windowing, dialogs) that depend on the Web browser and on the software handling the type. When the type of the resource is not handled, it is generally possible to save it to a local file.
To handle resources from systems other than the Web, such as email, browsers usually call on separate software. If no software handles a given type of resource, a simple error message says so.
Design
Universality
The Web was designed to be accessible from the widest variety of computing equipment: workstations, text-mode computer terminals, personal computers, PDAs, etc. This universality of access depends first on the universality of the Internet protocols. Second, it depends on the flexibility of presentation of Web pages offered by HTML. In addition, HTTP allows browsers to negotiate the type of each resource. Finally, CSS makes it possible to offer different presentations, selected according to their suitability for the equipment in use.
The accessibility of the Web for people with disabilities also receives special attention, notably through the Web Accessibility Initiative.
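The negotiation of resource types over HTTP can be sketched as follows. A browser sends an Accept header listing the media types it can handle, each with an optional quality value q between 0 and 1, and the server picks the best match among the variants it has. This is a simplified illustration only: the function names are assumptions, and partial wildcards such as image/* and other details of the HTTP specification are deliberately ignored.

```python
def parse_accept(header):
    """Turn an Accept header into a list of (media type, quality) pairs."""
    preferences = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0  # quality defaults to 1 when no q parameter is given
        for parameter in parts[1:]:
            name, _, value = parameter.partition("=")
            if name.strip() == "q":
                quality = float(value)
        preferences.append((media_type, quality))
    return preferences


def choose_type(header, available):
    """Pick the available media type the browser prefers, or None."""
    preferences = dict(parse_accept(header))
    fallback = preferences.get("*/*", 0.0)  # only the full wildcard is handled
    best, best_quality = None, 0.0
    for media_type in available:
        quality = preferences.get(media_type, fallback)
        if quality > best_quality:
            best, best_quality = media_type, quality
    return best


if __name__ == "__main__":
    # A browser that prefers PNG but also accepts GIF.
    print(choose_type("image/png, image/gif;q=0.5", ["image/gif", "image/png"]))
```

When nothing matches, choose_type returns None; a server in that situation would have to report that no acceptable variant exists or fall back to a default.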
Decentralization
Web technologies impose no organization among Web pages, nor a fortiori among Web sites. Any page on the Web can contain a hyperlink to any other resource accessible from the Internet. Creating a hyperlink requires absolutely no action on the part of the resource pointed to. There is no centralized register of hyperlinks, pages, or sites. The only register used is the DNS, a distributed database that indexes hosts and serves all Internet-based systems.
This decentralized design was meant to foster, and did foster, rapid growth in the size of the Web. It also fostered the rise of sites specializing in information about other sites: directories and search engines. Without these sites, searching for information on the Web would be extremely laborious. The opposite approach, the Web portal, tries to concentrate as much information and as many services as possible in a single site.
A weakness of decentralization is the lack of follow-up when a resource is moved or removed: the hyperlinks that pointed to it become broken. This becomes visible only when the hyperlink is activated, the most common result being the 404 error message.
Technologies
Pre-existing
The Web relies on Internet technologies, in particular TCP/IP to transfer data, DNS to convert host names into IP addresses, and MIME to indicate the type of data. The GIF and JPEG digital image formats were developed independently.
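To show what indicating the type of data with MIME looks like in practice, here is a minimal sketch based on Python's standard mimetypes module, which guesses a MIME type from a file name. The function name content_type_for, the file names, and the choice of application/octet-stream as the fallback are assumptions made for the example; a real server may determine types differently.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Guess the MIME type a server might announce for a file name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions fall back to a generic binary type.
    return guessed or default


if __name__ == "__main__":
    for name in ("index.html", "logo.png", "notes.zzzunknown"):
        print(name, content_type_for(name))
```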
Specific
Three technologies had to be developed for the World Wide Web:
- URLs, to identify any resource in a hyperlink;
- the HTML language, to write Web pages containing hyperlinks;
- the HTTP communication protocol, used between browsers and Web servers, which makes it possible to indicate the MIME type of transferred resources.
These first technologies were standardized like other Internet technologies, through the Request for Comments process. This produced RFC 1738 for URLs, RFC 1866 for HTML 2.0, and RFC 1945 for HTTP/1.0.
The World Wide Web Consortium (W3C) was founded in 1994 to develop and promote new Web standards. Its role is in particular to ensure the universality of new technologies. Technologies have also been developed by private companies.
Current
The main current standards are:
- XML 1.0, developed to give markup languages, including HTML, a simpler syntax than SGML;
- HTML 4.01, based on SGML, and XHTML 1.0, based on XML;
- RFC 2396 (Uniform Resource Identifiers), which covers URLs;
- RFC 2616 (HTTP/1.1);
- Cascading Style Sheets (CSS) level 1 and level 2;
- the Document Object Model (DOM) level 1 and level 2;
- the JavaScript scripting language, for manipulating documents;
- the PNG, JPEG, and GIF digital image formats.
History
Tim Berners-Lee was working as a computer scientist at the European Organization for Nuclear Research (CERN) when, in 1989, he proposed creating a hypertext system distributed over the computer network so that collaborators could share information within CERN. That same year, those in charge of CERN's network decided to use the TCP/IP communication protocol, and CERN opened its first external connection to the Internet.
The following year, systems engineer Robert Cailliau joined the hypertext project at CERN, immediately convinced of its value, and devoted himself energetically to promoting it. Tim Berners-Lee and Robert Cailliau are recognized as the two people at the origin of the World Wide Web.
Until 1993, the Web was developed mainly under the impetus of Tim Berners-Lee and Robert Cailliau. Things changed with the appearance of NCSA Mosaic, a Web browser developed by Eric Bina and Marc Andreessen at the National Center for Supercomputing Applications (NCSA), in Illinois. NCSA Mosaic laid the foundations of the graphical interface of modern browsers and caused an exponential increase in the popularity of the Web. The NCSA also produced NCSA HTTPd, an HTTP server that would evolve into Apache HTTP Server, the most widely used HTTP server since 1996.
In 1994, Netscape Communications Corporation was founded with a large part of the NCSA Mosaic development team. Released at the end of 1994, Netscape Navigator supplanted NCSA Mosaic within a few months.
In 1995, Microsoft tried to compete with the Internet with The Microsoft Network (MSN) and failed. At the end of 1995, after releasing Windows 95 without any preinstalled Web browser, Microsoft launched the browser war against Netscape Navigator with Internet Explorer.
Chronology
The first years of this history are largely based on A Little History of the World Wide Web.
- Tim Berners-Lee, hired by CERN in Geneva in 1984 to work on data acquisition and processing, proposed developing a hypertext system organized as a web in order to improve the internal circulation of information: Information Management: A Proposal.