Nutch
Nutch is an open-source project to build a search engine. It uses Lucene as its search and indexing library, whereas its crawler was written specifically for the project.
Nutch's architecture is highly modular and lets developers create plugins for the various phases of the process: data retrieval, document parsing, searching, and so on.
Doug Cutting is the initiator and coordinator of the project.
Nutch is written entirely in Java, but the data it handles is stored in a format independent of any programming language. In June 2003, a working demonstration version of Nutch was presented, running on an index of 100 million documents.
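The plugin-based design described above can be illustrated with a minimal sketch in Java. It is not Nutch's actual plugin API: the `DocumentPlugin` interface, the `TitleParser` and `LowercaseIndexer` classes, and the `PluginSketch.run` helper are hypothetical names introduced here only to show how independent plugins might each handle one phase of processing a document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical extension point (an assumption, not Nutch's real API).
interface DocumentPlugin {
    // Each plugin reads or enriches the shared document fields.
    void process(Map<String, String> document);
}

// Hypothetical "parsing" phase: takes the first line of the content as the title.
class TitleParser implements DocumentPlugin {
    @Override
    public void process(Map<String, String> document) {
        String content = document.getOrDefault("content", "");
        int end = content.indexOf('\n');
        document.put("title", end < 0 ? content : content.substring(0, end));
    }
}

// Hypothetical "indexing" phase: stores a lowercased copy of the content.
class LowercaseIndexer implements DocumentPlugin {
    @Override
    public void process(Map<String, String> document) {
        String content = document.getOrDefault("content", "");
        document.put("indexed", content.toLowerCase(Locale.ROOT));
    }
}

class PluginSketch {
    // Runs every registered plugin, in order, over a single document.
    static Map<String, String> run(List<DocumentPlugin> plugins, String url, String content) {
        Map<String, String> document = new LinkedHashMap<>();
        document.put("url", url);
        document.put("content", content);
        for (DocumentPlugin plugin : plugins) {
            plugin.process(document);
        }
        return document;
    }

    public static void main(String[] args) {
        List<DocumentPlugin> plugins = new ArrayList<>();
        plugins.add(new TitleParser());
        plugins.add(new LowercaseIndexer());
        Map<String, String> document = run(plugins, "http://example.org/", "Hello Nutch\nBody text");
        System.out.println(document.get("title"));
    }
}
```

In this sketch, adding a new phase only requires writing another class that implements the interface and registering it in the list. The existing plugins are left untouched, which is the general benefit of a modular design.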
History
The government of Quebec adopts Nutch
In December 2006, the government of Quebec chose Nutch as the search engine for all of its websites, following a preselection process. To date, more than 400 sites and 500,000 documents have been indexed. The migration was carried out by DocuLibre, a Quebec firm, in less than 30 days.
Oregon State University switches to Nutch
In September 2004, Oregon State University replaced its Google search system with Nutch. This allows the university to cut costs significantly and to promote the transparency of its search engine. The savings are estimated at $100,000 per year, according to the Open Source Lab.
CreativeCommons.org relies on Nutch
In 2004, Creative Commons launched a beta version of its search engine, which crawls the Web for text, audio, and video. At that date it had indexed a million pages, all freely reusable under the terms of the licenses made available on the Creative Commons website.
The search engine relies on the Resource Description Framework (RDF), which uses the XML metalanguage and is standardized by the World Wide Web Consortium (W3C).
This release coincided with that of version 1.0 of the Mozilla Firefox web browser, which made it possible to search for free content.
Nutch joins Apache
In January 2005, Nutch was a two-year-old project that had initially been hosted on SourceForge and backed by its own non-profit organization. That organization had been founded to hold the project's copyright and to retain the right to change the license. The team decided that the Apache license was the best fit for Nutch and that the project no longer needed the help of an external organization. Its leaders and developers are now supported by the Apache Software Foundation.
After five months of incubation, Nutch became a subproject of Lucene.
Evaluation of the engine
A study by Lyle Benedict, published on June 1, 2004, compares the results of the well-known Google with those of its free counterpart Nutch, restricted to the Oregon State University website and based on 100 queries. For example, with scores ranging from 0 to 10, where 10 is the best, it found 28 queries for which Nutch and Google both obtained the maximum score.
Contributions
Contributions are based on merit and karma. Contributors must subscribe to a mailing list to find out who is doing what, and send a short email telling the others what they intend to do. When the work is finished, the code is submitted to the mailing list (or attached to a bug report) so that every contributor can review its quality and relevance. The acceptance criteria are:
- high quality (of the code);
- readability;
- ease of integration;
- consistency with Nutch's objectives.
If everything is in order, the developers add the code to the source repository and it becomes an integral part of Nutch.
See also
Related articles