See also: Search (disambiguation)

A search engine is software that makes it possible to find resources (Web pages, Usenet forums, images, videos, etc.) associated with any given words. Some Web sites offer a search engine as their main functionality, and the site itself is then called a search engine (Dailymotion, YouTube and Google Video, for example, are video search engines).

Web search tools consist of "robots", also called spiders, crawlers or agents, which traverse sites at regular intervals and automatically (without human intervention, which distinguishes them from directories) to discover new addresses (URLs). They follow the hypertext links (which connect pages to one another) found on each page they reach. Each page found is then indexed in a database, which Internet users can then query with keywords.

By extension, the term search engine is also applied to:

  • Web sites offering directories of Web sites: in this case, it is humans rather than indexing robots who index and classify the Web sites judged worthy of interest; examples include Voilà and Yahoo!;
  • software installed on a personal computer: so-called desktop engines combine searching the files stored on the PC with searching Web sites; examples include Google Desktop and Copernic Desktop Search.
There are also metasearch engines, i.e. Web sites where the same query is launched simultaneously on several search engines, the results then being merged before being presented to the user; examples include Mamma, Kartoo and Seek.fr.
More recently, there are also directories that exploit folksonomy systems made up of tags (labels) assigned by Web users.
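The merging step of a metasearch engine can be sketched as follows. This is a minimal illustration, assuming the engines' answers have already been collected as ranked lists of URLs; the lists are made-up placeholders, and reciprocal rank fusion is simply one common merging technique chosen here, not necessarily the one used by Mamma, Kartoo or Seek.fr.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of URLs into a single ranking.

    Each URL scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1); the constant k damps the advantage of top positions.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists returned by three engines for the same query.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "a.example", "d.example"]
engine_c = ["b.example", "c.example"]

print(reciprocal_rank_fusion([engine_a, engine_b, engine_c]))
# → ['b.example', 'a.example', 'c.example', 'd.example']
```

A URL returned by several engines accumulates score from each list, which is why "b.example" comes first even though only two engines rank it at the top.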

Operation

The operation of a search engine breaks down into three main stages.
  1. Exploration or crawl: the Web is systematically explored by an indexing robot that recursively follows all the hyperlinks it finds and retrieves the resources considered interesting. Exploration starts from a seed resource, such as a page of a Web directory.
  2. Indexing of the retrieved resources consists of extracting the words considered significant (in practice, almost all of them) from each resource. The extracted words are recorded in a database organized like a gigantic inverted dictionary or, more precisely, like the index at the back of a book, which makes it possible to find quickly in which chapter a given significant term appears. Non-significant terms are called stop words.
  3. Searching corresponds to the query part of the engine, which returns the results. An algorithm, generally kept secret, gives a variable weight to the matches so that the results can be presented in order of presumed relevance. The algorithm generally takes into account the context of the keyword (title, paragraph, hyperlink…) and of the resource (linked resources, popularity of the site…).
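The three stages can be sketched on a toy scale. This is a minimal, hypothetical illustration: the "Web" is an in-memory dictionary of made-up pages, the stop-word list is illustrative, and the field weights (a title match counting three times a body match) are assumptions standing in for the secret weighting that real engines apply.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (title, body text, outgoing links).
WEB = {
    "seed": ("Travel directory", "Links to travel sites", ["hotels", "beaches"]),
    "hotels": ("Hotels in Nice", "Book a hotel near the beach in Nice", ["beaches", "seed"]),
    "beaches": ("Beaches", "The best beach in Antibes and Cannes", ["hotels"]),
    "orphan": ("Unlinked page", "Never reached by the crawler", []),
}

STOP_WORDS = {"a", "in", "the", "to", "and", "near", "of"}  # illustrative list
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}  # assumed weights, not any real engine's


def crawl(seed):
    """Stage 1: breadth-first exploration following links from a seed page."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][2]:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def tokenize(text):
    """Lowercase the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def build_index(urls):
    """Stage 2: inverted index mapping term -> {url: weighted occurrence count}."""
    index = defaultdict(lambda: defaultdict(float))
    for url in urls:
        title, body, _ = WEB[url]
        for field, text in (("title", title), ("body", body)):
            for term in tokenize(text):
                index[term][url] += FIELD_WEIGHTS[field]
    return index


def search(index, query):
    """Stage 3: score pages by summing the weights of the query terms, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = crawl("seed")
index = build_index(pages)
print(pages)                        # ['seed', 'hotels', 'beaches']
print(search(index, "hotel Nice"))  # ['hotels']
print(search(index, "beach"))       # ['beaches', 'hotels']
```

The "orphan" page is never indexed because no link leads to it. Real engines also apply stemming (here "hotel" and "hotels" remain distinct terms), popularity signals such as incoming links, and distributed storage, none of which is modeled in this sketch.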

Search engine optimization

To optimize their pages for search engines, webmasters insert meta elements (meta tags) into the head section of their HTML pages. This information helps optimize searches for information on Web sites.
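To show where these tags live and how an indexing robot might read them, here is a minimal sketch using Python's standard `html.parser` module. The page content is hypothetical, and the parser only collects `name`/`content` pairs from `<meta>` elements inside `<head>`; how much weight any given engine actually gives to these tags varies and is not modeled here.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> elements inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("name")
            if name is not None and "content" in attributes:
                self.meta[name.lower()] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


# Hypothetical page written by a webmaster.
PAGE = """<html><head>
<title>Hotels in Nice</title>
<meta name="description" content="Hotel booking on the French Riviera">
<meta name="keywords" content="hotel, Nice, beach">
</head><body>...</body></html>"""

parser = MetaTagParser()
parser.feed(PAGE)
print(parser.meta)
# → {'description': 'Hotel booking on the French Riviera', 'keywords': 'hotel, Nice, beach'}
```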

See also: Search engine optimization

Financing

Sites whose main service is search can be financed from two sources: advertising and the sale of technology.

Advertising

Search engines are financed mainly by targeted advertising, which means displaying advertisements that match the words the visitor searched for. For the advertiser, this amounts to buying keywords: for example, a travel agency can buy keywords such as “holidays”, “hotel” and “beach”, or “Cannes”, “Antibes” and “Nice” if it specializes in that area.
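Keyword targeting can be sketched as a simple overlap test between the words of the query and the keywords each advertiser has bought. This is a hypothetical illustration: the campaigns reuse the keywords from the example above, and ordering ads by the number of matching keywords is an assumption; real ad systems also take bids and other factors into account, which are not shown.

```python
import re

# Hypothetical advertisers and the keywords each one has bought.
CAMPAIGNS = {
    "Generic travel agency": {"holidays", "hotel", "beach"},
    "Riviera specialist": {"cannes", "antibes", "nice"},
}


def matching_ads(query):
    """Return advertisers whose purchased keywords appear in the query,
    ordered by how many of their keywords match (ties alphabetical)."""
    words = set(re.findall(r"\w+", query.lower()))
    hits = {ad: len(keywords & words) for ad, keywords in CAMPAIGNS.items()}
    return sorted((ad for ad, n in hits.items() if n > 0),
                  key=lambda ad: (-hits[ad], ad))


print(matching_ads("weather in Cannes"))            # ['Riviera specialist']
print(matching_ads("beach holidays near Antibes"))  # ['Generic travel agency', 'Riviera specialist']
print(matching_ads("train timetable"))              # []
```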

A search engine can display advertising in two ways: in a separate insert, or integrated into the search results. For the visitor, the separate insert looks like traditional advertising. Integration into the results, on the other hand, comes at the expense of the relevance of the results and can harm the perceived quality of the engine. For this reason, not all engines sell placement in the results.

Sale of technology

Large organizations (companies, administrations) generally have a great many computing resources on vast intranets. Since these resources are not accessible from the Internet, they are not covered by Web search engines. Such organizations must therefore install their own engine if they want to search their resources, and so constitute a market for search engine developers.

Public Web sites sometimes also use the services of a search engine to round out their offering. Thus Yahoo!, a Web directory specialist, used Google's technology for search for several years, until it launched its own search engine, Yahoo Search Technology, in 2004. That engine is built on technology from AltaVista, Inktomi and Overture, pioneering search engine companies acquired by Yahoo!.

Evolution towards the Semantic Web

As content producers annotate their databases with metadata or taxonomies (ontologies), search engines will have to adapt to semantic analysis. Compared with full-text search, searching the Semantic Web is much more effective.
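One simple way a taxonomy can help, compared with plain full-text matching, is query expansion: a query concept also matches documents that mention narrower concepts. The sketch below is a hypothetical illustration with a made-up taxonomy and made-up documents; it shows only this one technique and is not how CORESE or the other engines listed below actually work.

```python
# Hypothetical taxonomy: broader concept -> narrower concepts.
TAXONOMY = {
    "riviera": {"nice", "cannes", "antibes"},
    "accommodation": {"hotel", "campsite"},
}

# Hypothetical documents, already split into lowercase words.
DOCS = {
    "doc1": {"cheap", "hotel", "in", "cannes"},
    "doc2": {"ski", "resort", "in", "the", "alps"},
}


def expand(concept):
    """Return the concept plus every concept below it in the taxonomy."""
    result, stack = {concept}, [concept]
    while stack:
        for narrower in TAXONOMY.get(stack.pop(), set()):
            if narrower not in result:
                result.add(narrower)
                stack.append(narrower)
    return result


def full_text_search(query_terms):
    """Plain keyword matching: every query term must appear literally."""
    return sorted(d for d, words in DOCS.items() if set(query_terms) <= words)


def semantic_search(query_terms):
    """A query concept matches if the document contains it or a narrower concept."""
    return sorted(d for d, words in DOCS.items()
                  if all(expand(term) & words for term in query_terms))


query = ["accommodation", "riviera"]
print(full_text_search(query))  # []
print(semantic_search(query))   # ['doc1']
```

The document about a hotel in Cannes never contains the words "accommodation" or "riviera", so full-text matching misses it, while the taxonomy links those concepts to "hotel" and "cannes".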

Some examples of semantic search engines:

  • CORESE, developed at INRIA
  • KartOO and Ujiko
  • Lingway KM, a multilingual linguistic and semantic platform for developing specialized search engines
  • Sinequa CS, by Sinequa
  • Zoom, by Acetic
  • Pertimm

Main search engines

According to a study by the firm comScore carried out in August 2007:

  • Google (approximately 60% of the 61 billion Internet searches)
  • Yahoo! (8.5 billion searches, i.e. 14% of the total)
  • Sharelook search engine, inter alia just in Lycos
  • Baidu, the "Chinese Google", which is growing in power (3.3 billion searches, i.e. 5.4% of the total)
  • Live Search, Microsoft's search engine (2.1 billion searches, i.e. 3.4%)
  • Naver, the Korean search engine of the NHN group (2 billion searches)
  • the e-commerce site eBay (1.3 billion searches)

See also

Related articles

External links

  • How to determine the reliability of information found on the Internet


© 2007-2008 speedlook.com; article text available under the terms of GFDL, from fr.wikipedia.org