Google (search engine)
Google, the search engine that gave its name to the company Google, is the most widely used search engine on the Internet.
Principles and characteristics
The PageRank ranking system
Google's operating principle, which made its success, rests on an invention of its creators, PageRank: when many hyperlinks point to a document (link popularity), its PageRank increases. The higher its PageRank, the more likely the document is to appear among the first results of a search. This system gives an indication of a document's popularity among the other documents on the web. The principle was an immediate success because it produced more relevant results than the other search engines, which were content to index the keywords inserted in the pages of sites. It also made possible what is known as Google bombing.
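The idea that a page gains rank from the pages linking to it can be illustrated with a minimal sketch. This is not Google's actual implementation: the damping factor of 0.85, the fixed number of iterations, the handling of pages without outgoing links and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph, "c" is pointed to by the most pages and ends up with the highest rank, while "a" benefits from being linked by the highly ranked "c".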
Sobriety and the valuation of words
This search engine is also appreciated for its search speed and its sobriety: no Flash, no flickering advertising banners, etc. Its interface inspired that of other engines, such as Yahoo!. This sobriety, far from being anecdotal, is at least partly responsible for the site's success. When it launched, the fashion was for search engines embedded in pages heavily loaded with content and advertising. These pages were often slow to load and difficult to read. Google nevertheless earns its revenue through a system called AdWords. This system assigns a value to each word according to demand: the more a word is searched for, the higher its price per click.
Infrastructure
Around 2002, Google stated that it distributed the load across more than 10,000 PCs running a modified Linux kernel. A figure of 1,000 simultaneous queries at peak was also often cited. The real figures appear to be ten times higher. They are kept secret, however, in particular so that the investment required to compete with Google cannot easily be calculated (see "Google and Akamai: Cult of Secrecy vs. Kingdom of Openness").
Google uses robots named Googlebot, which visit at regular intervals all the websites that have asked to be indexed, in order to keep up to date the database that supplies the answers to users' queries.
Logos
Besides the official logo, the site adopts special logos for certain holidays and events: the Google Doodles. Created by Dennis Hwang, a 23-year-old American designer of Korean origin, they appear whenever a local or international holiday (New Year's Day, national holidays, etc.) or a major event (Olympic Games, commemoration of a famous person, etc.) lends itself to it. All the holiday and event logos published on www.google.com since 1999 are archived online, as are those that appeared specifically in France.
Beta release
A "beta release" label usually means that a program is still being finalized. At Google it has become a trademark attached to most services and software, except for the search engine and the advertising services. The advantage of the "beta release" label is that, from the point of view of service quality, it carries no obligation of result, since the product is in a development phase. It can also mean that Google services are in a perpetual state of improvement.
This characteristic, specific to Google, has become a fashion, reflected among its competitors in a freer use of the label.
Tree structure
Services
The search engine is available in 35 languages and offers its interface in more than 100 languages. Originally a search engine for web pages, Google gradually extended to various document types (PDF, Microsoft Word, Flash, …), to images, and to Usenet newsgroups through Google Groups, following the acquisition of Deja News. Web2news gives access to the newsgroups devoted to Google. Google now has a directory section for finding sites by category (the DMOZ directory ranked by PageRank), and a news portal gathering the sites of high-circulation newspapers and the largest news agencies. Google's great popularity and its highly diversified development policy (advertising links, purchases of databases and newsgroup archives) have ended up raising a number of fears about the potential abuse of this power: it is sometimes enough to "google" a person's name to obtain detailed personal information about them. Google thus offers a growing number of additional functions, available either through the normal Google search field or as web applications.
Tips for using the Google search engine
Google offers a simple form and an advanced search form that makes it possible to exclude words or to search for complete phrases.
Search terms:
Google's documentation on how it interprets queries is rather sparse. The changes observed in its behavior suggest that this is probably intentional, to keep maximum freedom to change. What follows must be continually re-validated and amended to keep up with modifications.
- H2O is searched for as a single word, so Google does not find documents that write it with spaces, such as H2 O. Those are found by searching for "H2 O". H-2-O (see the role of the hyphen below) finds H2O as well as the fully separated form. Unfortunately, the hyphen operator searches only the two extreme combinations (all words joined or all words separate): it does not find H2 O.
- word : a word and its singular/plural, masculine/feminine and accented/unaccented variants. For example, pommel horses finds pommel horse. This algorithm works in French and English but not in Dutch (it does not recognize plurals in "en"). Note: the variant you specify is favored when the results are sorted.
- ~word : a word and its synonyms. This works with an English dictionary even for searches in French and Dutch! Try the query ~automobile -automobile to see the words found other than the strict term automobile. ~arabic returns Egypt, Lebanon, Arab and… Hindu! The source of the synonyms is not known.
- "word" : an exact word. Google ignores accents when searching but favors the specified form when sorting the results.
- "word… word" : a sequence of specific words, a phrase
- "word * word" : within a quoted sequence of words (and only there), an asterisk can stand in for one or more whole words that one does not wish to specify. For example: "ministry for * and the commerce"
- site:www… : a domain of origin. One can be more or less general and even specify top-level domains. For example: site:org OR site:com
- title:"word… word" : a sequence of words specifically in the title of the document (the <title>…</title> tags and/or the first <h1>…</h1> tag)
- +word : to search for this word even if it is a stop word in the user's language (French +de for example) and to search for it taking accents into account (+dés for example). A "+" is implied when only one word is searched: the word the on its own is searched as if +the had been typed. (This form thus has a meaning very different from that of AltaVista, where "+" indicated mandatory words.) When sorting the documents, Google gives preference to the form typed, so the "+" operator is no longer of much use.
- word-word : to search for a term made up of several words, whether it is written with hyphens, with spaces or with no space at all: gratte-ciel finds gratte ciel, gratte-ciel and gratteciel. gratte-ciel does not mean the same thing at all as gratte -ciel (see the "-" operator). Caution: va-nu-pied finds va nu pied and vanupied but not va nupied.
Logical operators (Boolean):
- space : the documents must contain what is on the right AND what is on the left. Google's ranking favors documents in which the specified words are close to one another (see below).
- OR or | : the documents can contain what is on the right OR what is on the left. Caution: OR must be written in capitals!
- space - (minus sign) : excludes the documents containing the word that follows (EXCEPT)
- (…) : a sub-expression to be evaluated before the neighboring operations
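The operators described above can be combined into a single query string. The following is a minimal sketch; `build_query` and its parameter names are hypothetical helpers invented for illustration, not part of any Google interface, and the operator behavior is taken from the descriptions in this section.

```python
def build_query(all_words=(), phrase=None, any_of=(), exclude=(), site=None):
    """Assemble a query string from the operators described in this section.

    Hypothetical helper: the function and parameter names are illustrative.
    """
    # Words separated by spaces are combined with an implicit AND.
    parts = list(all_words)
    if phrase:
        # Quotation marks request an exact sequence of words.
        parts.append('"' + phrase + '"')
    if any_of:
        # OR must be in capitals; parentheses group the sub-expression.
        parts.append("(" + " OR ".join(any_of) + ")")
    # A minus sign excludes documents containing the word that follows.
    parts.extend("-" + word for word in exclude)
    if site:
        # site: restricts the search to a domain of origin.
        parts.append("site:" + site)
    return " ".join(parts)


query = build_query(
    all_words=["recipe"],
    phrase="tomato soup",
    any_of=["basil", "pumpkin"],
    exclude=["cream"],
    site="org",
)
# query is: recipe "tomato soup" (basil OR pumpkin) -cream site:org
```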
Limits:
- Queries are limited to 32 words.
- Only the first 1000 relevant results for a query are accessible, even if there are more matches. There may even be fewer than 1000 results because pages coming from the same site are removed. According to Google, providing more than 1000 results would involve a heavy additional cost for what is ultimately a rather rare request.
Dates:
- In a search by date, the date is that of indexing in the database (i.e. the visit of the Google "spider") and not that of the actual publication of the page (as provided by the HTTP server).
- In the advanced search form, you can search the last 3, 6 and 12 months.
- The operator daterange:Julian date-Julian date (or the form on the HotBot site) makes it possible to specify another date range. A Julian date is the number of days elapsed since 1 January 4713 BC: the site http://www.numerical-recipes.com/julian.html can help you calculate it.
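The Julian dates expected by the daterange: operator can also be computed locally. The sketch below assumes the astronomical Julian Day Number convention (days counted from 1 January 4713 BC, so that 1 January 2000 is day 2451545) and relies on Python's proleptic Gregorian calendar; the function names are hypothetical.

```python
from datetime import date

# Difference between the Julian Day Number and Python's date ordinal
# (ordinal 1 is 1 January of year 1 in the proleptic Gregorian calendar).
JDN_OFFSET = 1721425


def julian_day(d):
    """Return the Julian Day Number of a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange(start, end):
    """Build a daterange: term covering the interval from start to end."""
    return "daterange:" + str(julian_day(start)) + "-" + str(julian_day(end))


term = daterange(date(2000, 1, 1), date(2000, 1, 31))
# term is: daterange:2451545-2451575
```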
Sorting of the results:
Google's quality comes from its ability, in general, to show first the pages considered most relevant to a particular search. Google sorts the documents found according to:
- measures of the quality of the site in general and of each page (for example, the consistency of the meta-information with the visible text of the page). These measures are little documented, if at all.
- a measure of the weight of each indexed page, given by the PageRank algorithm
- the presence in the page of the search words (possibly extended with their synonyms or their singular/plural variants)
- the location of these words in the page (title, metadata, text) or in the links pointing to this page: this last point sometimes raises ethical problems, because a page ends up indexed under the words that people other than its authors use to refer to it. (Test: "miserable failure"; the author of the page concerned certainly did not seek this label!)
- for each word, a formula that takes into account the number of occurrences of the word in the page, weighted by the inverse of the relative frequency of this word in the part of the Web indexed by Google, with the following terms:
- tfi = frequency of term i in the page
- dfi = number of pages on the Web containing term i
- D = number of documents on the Web
- This formula was developed by Gerard Salton (1927-1995) of Cornell University, on the basis of Claude Shannon's information theory.
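The formula itself does not appear in the text above. A common form of the weighting attributed to Salton, assumed here for illustration, is wi = tfi × log(D / dfi). The sketch below computes it; whether Google uses exactly this form is not stated in the source.

```python
import math


def tf_idf(tf_i, df_i, D):
    """Weight of term i in a page, using the assumed classic form.

    tf_i: frequency of term i in the page
    df_i: number of pages containing term i
    D:    total number of documents
    """
    return tf_i * math.log(D / df_i)


# A term appearing 3 times in the page and present in 10 of 1000 documents.
rare = tf_idf(3, 10, 1000)
# The same frequency for a term present in every document gives a weight of 0.
common = tf_idf(3, 1000, 1000)
```

Under this assumption, a word found in every document carries no weight, while a word that is frequent in the page but rare on the Web weighs heavily.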
- the distance in the page between the search words: the closer they are to one another, the more relevant the page is considered to be to the search
- the country indicated by the URL used to access Google: google.be gives a clear preference to Belgian sites, google.fr to French sites, google.com to American sites, google.co.uk to British sites, etc. It is really important to choose the "localization" of one's search.
- the language of the user, which is also taken to be that of the search words. Apart from the form provided for this purpose, the only other way of changing the user's language is to modify the Google URL by hand (http://www.google.be/search?hl=fr&q=.. ) by changing the parameter &hl=xx (xx being the two-letter code of the desired language).
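Changing the hl parameter by hand can be sketched as follows. The URL pattern is the one quoted above; the `search_url` helper and its default host are assumptions for illustration.

```python
from urllib.parse import urlencode


def search_url(query, language, host="www.google.be"):
    """Build a search URL with the interface language set through hl.

    Hypothetical helper following the URL pattern quoted in the text.
    """
    # urlencode escapes the query and joins the parameters with "&".
    return "http://" + host + "/search?" + urlencode({"hl": language, "q": query})


url = search_url("miserable failure", "fr")
# url is: http://www.google.be/search?hl=fr&q=miserable+failure
```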
Complementary functions
Google also offers complementary functions:
- News headlines: certain keywords related to current events return, at the top of the results, 3 article titles from Google Actualités. A button makes it possible to search within the news.
- Currency conversion: e.g. type 3 euros in dollars in the search field, and Google displays: 3 euros = X.xxxxx US dollars (rates provided by Citibank, not guaranteed).
- Google Calculator: type a mathematical formula in the search field
- Machine translation
- PDF files
- Cached pages: displays the page stored in Google's database, useful if the page no longer exists
- Similar pages
- Linking pages: type link:site.com in the search field to display the external pages that point to the specified URL
- Targeting operators: restrict the search to a single web address. Syntax: "site: your request".
- I'm Feeling Lucky
- Definitions: gives the definition of a word. This function is now available in English, French, Spanish, German, Chinese, Italian and Russian. Syntax: "define: word to define"
- Google Movies: type movies: title to display reviews of the film whose title was typed (in English). On http://www.google.fr/movies, there is a choice between web search and film search, as well as the showtimes of cinemas in certain cities of the United States.
Special characters
Google handles accents written as entities, but not Unicode characters. Consequently, searching for the accented and the unaccented spelling of the same word does not give the same result (because a single word is searched with a preference for the form in which it was typed), whereas searching for "encyclopedia" or "ENCYCLOPEDIA" changes nothing. If you type "soup recipe with * and tomato", Google will suggest basil or pumpkin in place of the asterisk. One can widen a search to the synonyms of a word by preceding it with the "~" symbol. "+" forces Google to interpret the word exactly as typed (this is particularly useful for French accents).
Unconventional uses of Google
Google's many features have given rise to various playful uses by Internet users.
Positioning contests
Many positioning contests were born on Google, then on other engines. The goal is to place a page in first position in the results of a search for a more or less fictitious keyword. The first major contest concerned the query SERPS. In 2004, a French-language competition based on the expression mangeur de cigogne ("stork eater") gathered 170 candidates and reached 420,000 queries on Google for this expression. There has been controversy over the motivations of these contests, which some see as experimental tools useful for search engine optimization, while others see in them only playful motivations that turn Google into a mere playground.
Google bombing
Google bombing consists in associating an expression with a given website on as many web pages as possible, so that a Google search for this expression brings the site in question up among the first results. Google bombing campaigns are run through forums or blogs, by encouraging Internet users to take part. A participant need only add, on a website or a blog, a link to the targeted site and associate it with the expression.
One of the first sites targeted by a bombing was the biography of United States President George Walker Bush on the White House site. A Google search for "failure" or "miserable failure" still returns this site as the first result.
In the autumn of 2005, following a massive e-mail campaign launched by Nicolas Sarkozy's political party, and in retaliation, webmasters were invited to Google-bomb the name of the Minister of the Interior. Thus, typing Nicolas Sarkozy into Google returned in second position a link pointing to Iznogoud, the comic-strip character who wants to be caliph in place of the caliph. Google bombing consists in putting on a web page a link (to Iznogoud or George Bush) and associating it with a text (Nicolas Sarkozy or miserable failure). If the operation is carried out by a sufficient number of webmasters, the result is fast: the misleading links rise to the first results of Google.
At the end of January 2007, Google announced that it had developed an algorithm that solves the problem of Google bombing in any language. Since then, "miserable failure" points to a page explaining Google bombing.
Google Fight
Google Fight consists in comparing the number of results returned by Google for several expressions: the expression that obtains the most results is declared the winner. Internet users thus amuse themselves by comparing names, political ideas, etc. A website was even created to offer an interface for this type of "combat".
Since January 2006, the Google team has intercepted Google Fight queries and returned whimsical results. You can check this by querying the site several times in a row with the same pair of names.
Google Whacks
Google Whacks is a game that consists in finding two words which, combined in a Google search, give a single result. The terms used must exist in the dictionary, and the site found must not be a mere list of words. Quotation marks and all other punctuation marks must not be used. The score is often calculated by multiplying the number of results for the first term by the number of results for the second.
Limitations and errors of Google
Google's main limitation is that the engine crawls only the visible Web, leaving aside all the professional databases, sometimes enormous and often relevant, whose access is restricted (though sometimes free). Example: Dialog (15,000 GB).
Several studies published on the Web point to serious internal limitations of Google, such as an article in base-publications (free access).
Size of the index
Several studies have suggested that the number of pages actually indexed is only half of the announced number; the other half would be pages visited by Google's robot, but of which only a part (the header, without the body of the page) is indexed. These would be primarily non-English-language pages, because of the AdWords technology, which is usable only for English and which is Google's independent source of funding.
The size of the index was and remains a major marketing element for search engines. At the end of 2005, following a critical analysis of the size of its index begun in January 2005 by Jean Véronis, Google decided to stop putting this argument forward. As an example of this marketing approach, Google had announced a doubling of the size of its index shortly after the launch of MSN Search.
Search effectiveness
For a search of average complexity (using a Boolean operator, i.e. a space for AND), the number of results varies by a factor of up to three within the same day and, in certain cases, by a factor of up to ten.
Sometimes, the search engine does not take the requested operators into account.
This variability in the number of answers is explained by Google's architecture. There are several servers scattered around the world, each hosting the index of the pages visited by Google. Depending on where an Internet user is located (or on the local Google site queried), the query is directed to one or another of these servers. Normally, each index is identical to the others; but since they are not synchronized in real time (the intervals can exceed a month), only the main index, located in California, is constantly up to date and gives the maximum number of correct answers. The main server can thus give ten times more answers than a secondary server.
Evaluation of the engine
According to Jean Véronis ("Comparative study of six search engines", February 2006), Yahoo! and Google are the two best engines (among six of the main French-language engines). For the author, since the two engines have equivalent performance, the reason for Internet users' massive preference for Google is not the relevance of its results.
However, according to the Trent, Google's performance could be lower than that of Windows Live Search.
Controversies
Debate over the influence of the content of displayed results
By becoming the leading search engine in terms of use, Google has become the leading vehicle of information on the Internet. This role of conveying information is inherent in the business of search engines, and the problems that result from it are therefore not all attributable to Google, which is not the author of the content of the pages.
Beyond the difficulties raised by the strategic importance of Google's ranking in the economic domain, the real problem lies in the strong ideological influence of the pages that appear in the first results and that take on the appearance of gospel truth. The popularity of a search engine such as Google can be used as a vector of misinformation, where the influence of a site is all the greater when the keyword is popular and the site is at the top of the list. Google's leaders acknowledge that they are powerless against the disinformation and slander that currently appear in Google's first results, since the technology cannot judge the sincerity of information.
The Tiananmen affair
The leaders of the People's Republic of China, displeased that a search for Tiananmen in Google Images returned photographs of tanks repressing the student revolt, obtained from Google that the query "Tiananmen" on Google's Chinese portal no longer return these images.
This can be seen by comparing the same query on google.com and google.cn:
- http://images.google.com/images?q=tiananmen&hl=fr&btnG=Recherche+d%27images
- http://images.google.cn/images?q=tiananmen&hl=zh-CN&btnG=%E6%90%9C%E7%B4%A2%E5%9B%BE%E7%89%87
However, if one writes the same word with an approximate spelling (Tienanmen for example), Google's engine does return photographs of tanks.
Note: with the preceding spelling (Tiananmen), Google China (the second link above) displays photographs of tanks on the 4th page.
The BMW Germany affair
Following attempts by BMW Germany and its search engine optimization provider to increase its PageRank (and thus the ranking of links to BMW for queries such as car in Google), the car maker was blacklisted by Google, which removed it from its index in January 2006. A search for "BMW" then returned only references to its worldwide site.
The keywords affair in France
In 2005, the UMP and more particularly Nicolas Sarkozy were criticized for having bought dozens of keywords such as "riot", "CPE", "Jack Lang"… pointing to the UMP site.
Debate over the number of displayed results
When the number of pages is too large, only the first 1,000 pages can be displayed, which is a reasonable limit adopted by most search engines. However, some Internet users suspect that the number of pages found "is artificially inflated" when this limit is exceeded. This suspicion rests on two facts:
- Google sometimes displays a number of pages larger than the number of web pages it indexes (for example with a query on a word used in all English-language pages, such as "the", the definite article);
- when one runs several successive searches on the same keyword, the result varies. This is explained by the number of servers used by Google, each server not having the same number of pages recorded for the same result.