See also: OCR

Apart from humans, optical character recognition (OCR) can only be performed by computer processes. A human reader, in addition to recognizing the characters, simultaneously understands the message, memorizes it, and may even analyze it critically.
To perform this task, a computer needs optical character recognition (OCR) software. Such software extracts the text from the image of a printed page and saves it to a file that can be opened in a word processor for formatting, stored in a database, or at least kept on a reliable medium that a computer system can use.

Short history of OCR

In 1950, Frank Rowlett, who had broken the Japanese diplomatic code PURPLE, asked David Shepard, a cryptanalyst at the AFSA (predecessor of the American NSA), to work with Louis Tordella on proposals for automating the agency's data processing. The question included the problem of converting printed messages into machine language for computer processing. Shepard decided that it must be possible to build a machine to do this and, with the help of a friend, Harvey Cook, built “Gismo” in his attic during evenings and weekends. The event was reported in the Washington Daily News of April 27, 1951 and in the New York Times of December 26, 1953, after patent number 2,663,758 was filed. Shepard then founded Intelligent Machines Research Corporation (IMR), which delivered the world's first OCR systems operated by private companies. The first private system was installed at Reader's Digest in 1955 and, many years later, was donated by Reader's Digest to the Smithsonian, where it was put on display. A system was also supplied to Standard Oil of California for reading credit card imprints for billing, with many more systems sold to other oil companies. Further systems sold by IMR in the late 1950s included a billing-form reader for the Ohio Bell Telephone Company and a page scanner for the US Air Force for reading typed messages and transmitting them by telex. IBM and others later used Shepard's patents.

Since 1965, the United States Postal Service has sorted mail with OCR machines based on principles devised by Jacob Rabinow, a prolific inventor. Canada Post has used OCR systems since 1971. At the first automated sorting office, OCR systems read the recipient's name and address and print on the envelope a barcode derived from the postal code. In subsequent centers, the letters then only need to be sorted by less expensive machines that read just the barcode. To avoid interfering with the readable address, which can be anywhere on the letter, a special ink is used that is clearly visible under UV light. This ink appears orange under normal lighting.
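
The article does not name the barcode format. As one concrete illustration, the sketch below encodes a ZIP code in POSTNET, the symbology the US Postal Service used for many years before replacing it with the Intelligent Mail barcode: each digit becomes five bars (two tall, three short), a check digit brings the digit sum to a multiple of 10, and a tall frame bar marks each end. Tall bars are drawn as `|` and short ones as `.`; the function name `postnet` is hypothetical.

```python
# Each digit maps to 5 bars (1 = tall, 0 = short). Tall bars carry the
# weights 7, 4, 2, 1, 0; the digit 0 is the special case 7 + 4 = 11.
POSTNET_DIGITS = {
    "1": "00011", "2": "00101", "3": "00110", "4": "01001", "5": "01010",
    "6": "01100", "7": "10001", "8": "10010", "9": "10100", "0": "11000",
}


def postnet(zip_code: str) -> str:
    """Hypothetical helper: render a ZIP code as a POSTNET bar pattern."""
    # Keep ASCII digits only, so "12345-6789" is accepted.
    digits = [c for c in zip_code if c in POSTNET_DIGITS]
    if len(digits) not in (5, 9, 11):
        raise ValueError("expected a 5-, 9- or 11-digit ZIP code")
    # Check digit: makes the sum of all digits a multiple of 10.
    check = (10 - sum(int(d) for d in digits) % 10) % 10
    bits = "1" + "".join(POSTNET_DIGITS[d] for d in digits + [str(check)]) + "1"
    return bits.replace("1", "|").replace("0", ".")


print(postnet("12345"))  # |...||..|.|..||..|..|.|.|..|.|.|
```

A 5-digit ZIP code therefore yields 32 bars and a ZIP+4 code 52, which is why a downstream sorter can route the letter from the barcode alone.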

Training

The first systems needed “training” (a collection of known samples for each character) to read a given typeface. Today, however, it is common to find “intelligent” systems that can recognize most fonts with a high level of accuracy.
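
A minimal sketch of what such training can look like, assuming each glyph has already been isolated and scaled to a small binary bitmap: the labelled samples of each character are averaged into a template, and an unknown glyph is assigned to the character whose template is nearest. The 3x3 bitmaps and the names `train` and `classify` are hypothetical illustrations, not the method of any particular product.

```python
from statistics import mean

Bitmap = tuple[int, ...]  # flattened binary glyph, 1 = ink, 0 = background


def train(samples: dict[str, list[Bitmap]]) -> dict[str, list[float]]:
    """Average the known samples of each character, pixel by pixel, into one template."""
    return {
        char: [mean(pixels) for pixels in zip(*bitmaps)]
        for char, bitmaps in samples.items()
    }


def classify(glyph: Bitmap, templates: dict[str, list[float]]) -> str:
    """Return the character whose template is closest (sum of squared differences)."""
    def distance(template: list[float]) -> float:
        return sum((g - t) ** 2 for g, t in zip(glyph, template))

    return min(templates, key=lambda char: distance(templates[char]))


# Two hand-made 3x3 samples per character: a vertical bar "I" and a dash "-".
templates = train({
    "I": [(0, 1, 0, 0, 1, 0, 0, 1, 0), (0, 1, 0, 0, 1, 0, 0, 0, 0)],
    "-": [(0, 0, 0, 1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 0, 0, 0, 0)],
})
print(classify((0, 1, 0, 0, 1, 0, 1, 1, 0), templates))  # prints I
```

The noisy glyph in the last line has a stray pixel, yet it is still closer to the “I” template than to the dash. This also shows the limitation mentioned above: a template built from one typeface matches poorly when a different font is scanned.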

Operation

  • The starting point is a digital image of a page (printed document, typewritten sheet, etc.), produced by an optical scanner, from which one wants to extract the text.
  • The image must be sufficiently contrasted for the program to distinguish the characters easily. Some software also includes an interface for acquiring the image digitally.
  • The program reads the document and, using libraries of shapes, detects the characters in order to match each shape to the expected character.
  • Dictionaries make it possible to correct certain errors, since the software can then rely on existing words for its comparisons.
  • Some software even tries to preserve the text formatting (size, bold, and font) as well as the page layout, and even to rebuild tables.

Field of research

A particularly difficult problem for computers and humans alike is that of old parish registers of baptisms and marriages, which consist mainly of names: the pages may be damaged by time, water, or fire, and the names may be obsolete or written in old spellings. Image-processing techniques can help humans read extremely difficult texts, such as the Archimedes Palimpsest or the Qumran manuscripts (Dead Sea Scrolls). Cooperative approaches, in which computers assist humans and vice versa, are an interesting field of research.

Character recognition has been an active field of computer science research since the late 1950s. At first it was thought to be an easy problem, but it turned out to be a much more interesting subject. It will still take many decades for computers, if they ever reach that point, to read all documents with the same accuracy as human beings.

Some OCR software

Commercial software

  • Cogestar, publisher of production equipment based on OCR
  • FineReader by ABBYY, a Russian company, world leader in OCR software
  • Readiris by I.R.I.S., a Belgian company: accurate software, OCR for Arabic, Persian, Hebrew, and Asian languages, extensive integration
  • NEOPTEC, software publisher: automatic data capture by scanner
  • OmniPage by Nuance (formerly ScanSoft), an American company
  • CVISION PdfCompressor, an American company, very accurate
  • Intelliant OCR by Intelliant (France), based on Tiger OCR
  • Bit-Alpha by Office Engineer Tomasi (France)
See the detailed comparison by 01net.

Free software

  • Clara OCR
  • DocMgr (Unix)
  • FreePress (Windows)
  • GOCR (Unix, Windows)
  • Ocrad (Unix)
  • Ocre (Unix)
  • OCRopus (Unix)
  • Tesseract (Unix, Windows)
  • Gamera (Unix, Windows)
Freeware

  • SimpleOCR (Windows)

    © 2007-2008 speedlook.com; article text available under the terms of GFDL, from fr.wikipedia.org