As its name indicates, in computing a corrector is a software tool for correcting text. The spelling corrector (spell checker) should not be confused with the grammatical corrector (also called a grammar checker). The spell checker compares the words of the text with the words in its dictionaries. Words found in the dictionaries are accepted; for any other word, the spell checker proposes one or more similar words. The grammar checker verifies that the words of the text, even though they are in the dictionaries, conform to the rules of grammar (agreement, word order, etc.) and to the rules of semantics (the sentence has a meaning, homophones are not confused, etc.).
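The dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list `DICTIONARY` and the function `check_spelling` are hypothetical names, and the similarity measure is the one provided by the standard `difflib` module, not the method of any particular product.

```python
import difflib

# Hypothetical toy dictionary; a real checker would load a full word list.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def check_spelling(text, dictionary=DICTIONARY, max_suggestions=3):
    """Map each word missing from the dictionary to a list of close dictionary words."""
    report = {}
    for raw in text.split():
        # Strip surrounding punctuation and ignore case before the lookup.
        word = raw.strip(".,;:!?\"'()").lower()
        if word and word not in dictionary:
            # Unknown word: propose the most similar dictionary entries.
            report[word] = difflib.get_close_matches(
                word, sorted(dictionary), n=max_suggestions, cutoff=0.6
            )
    return report
```

For instance, `check_spelling("The quik brown fox")` reports `quik` as unknown and lists `quick` among its suggestions, while the accepted words do not appear in the report.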
A corrector can be standalone software that operates on a block of raw text, but the correction function is often integrated into the software in which the user enters text. This was first the case in word processors and is now the case on the Web, in forums and in email clients.
Language-specific issues
English is an exception in that most words used in writing have a single spelling that can be found in a standard dictionary, apart from certain jargon and modified words. In many languages, however, words frequently change their spelling depending on neighboring words. For example, in French the word "je" followed by a word beginning with a vowel is always written in its contracted form, as in "j'ai" or "j'irai". In German, compound nouns are often formed from other existing nouns. Some writing systems do not clearly separate words from one another, which requires word-segmentation algorithms. Each language other than English can thus present its own distinct challenges for spell checkers.
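Word segmentation for text written without separators can be illustrated with a small dictionary-driven sketch. The function name `segment` and the sample word list are hypothetical, and this simple dynamic-programming approach is only one of several possible techniques; real segmenters usually also rely on word frequencies.

```python
def segment(text, dictionary):
    """Split unspaced text into dictionary words, or return None if impossible."""
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            # Extend a known segmentation of text[:start] with a dictionary word.
            if best[start] is not None and piece in dictionary:
                best[end] = best[start] + [piece]
                break
    return best[len(text)]
```

With the word list `{"the", "cat", "cats", "sat", "at"}`, `segment("thecatsat", ...)` returns `["the", "cat", "sat"]`, and a string that cannot be covered by dictionary words yields `None`.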
Context sensitivity
Recent research has focused on developing algorithms able to recognize a misspelled word, even when that word is in the dictionary, by relying on the context of the surrounding words. This mitigates the harmful side effect of enlarging dictionaries: a larger dictionary recognizes more words, but it also lets more misspellings pass as valid words. The most common example of an error that such a system can detect is the homophone error, as in the following sentence:
Île its mow for they are there.
The most effective algorithm to date is Andrew Golding and Dan Roth's Winnow-based spelling correction algorithm, published in 1999, which can recognize approximately 96% of context-related errors, in addition to detecting non-word errors (words absent from the dictionary). The latest versions of grammar checkers are context-sensitive. Through the analysis of large corpora, they know the usual context of the most common words and can thus correct homophone errors (in French, for example, the words for "coward" and "releases", or for "wind" and "van") with good precision. This is the case for Antidote, which analyzed a corpus of 500 million words, and for Cordial, which analyzed a corpus of 1.2 billion words.
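The idea of learning the usual context of confusable words from a corpus can be sketched as follows. This is not the Winnow-based algorithm, nor the method used by Antidote or Cordial; it is a deliberately simple stand-in that counts neighboring words. The confusion set, the function names `train` and `correct`, and the window size are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical confusion set: words commonly mistaken for one another.
CONFUSION_SETS = [{"their", "there"}]

def neighbours(words, i, window):
    """Words within `window` positions of index i, excluding words[i] itself."""
    return words[max(0, i - window):i] + words[i + 1:i + 1 + window]

def train(corpus_sentences, confusion_sets=CONFUSION_SETS, window=2):
    """Count which words occur near each confusable word in the corpus."""
    targets = set().union(*confusion_sets)
    context_counts = defaultdict(Counter)
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word in targets:
                context_counts[word].update(neighbours(words, i, window))
    return context_counts

def correct(sentence, context_counts, confusion_sets=CONFUSION_SETS, window=2):
    """Replace a confusable word when another member of its set fits the context better."""
    words = sentence.lower().split()
    result = list(words)
    for i, word in enumerate(words):
        for cset in confusion_sets:
            if word not in cset:
                continue
            context = neighbours(words, i, window)

            def score(candidate):
                # How often the surrounding words were seen near this candidate.
                return sum(context_counts[candidate][n] for n in context)

            best = max(sorted(cset), key=score)
            # Only replace when the alternative scores strictly higher.
            if score(best) > score(word):
                result[i] = best
    return " ".join(result)
```

Trained on a handful of sentences such as "their car is red" and "there is a house", this sketch rewrites "there car is red" as "their car is red" while leaving "there are many cars" unchanged. A production system would need a far larger corpus and a more robust model.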