A thesaurus (in French, thésaurus or thesaurus) is a kind of hierarchically arranged dictionary: a standardized vocabulary built from generic terms and from narrower terms specific to a field. It provides definitions only incidentally; the relations between the terms and the choice of terms take precedence over their meanings.

Note on spelling: in French, thésaurus is a learned word ("mot savant") borrowed directly from Latin, and for that reason should in theory carry no accent. Dictionaries allow both spellings, thesaurus and thésaurus, but the gallicized form seems the most frequent in the literature. The Latin plural thesauri is sometimes used, but it is regarded as obsolete or, oddly, as an Anglicism (English uses the Latin plural). Consistency of course requires writing either un thésaurus, des thésaurus or un thesaurus, des thesauri. The French original of this article adopts the accented form.

An indexing tool and a search tool

A thesaurus is a structured set of terms chosen for their capacity to facilitate the description of a field and to harmonize communication and data processing about it. Each term, called a descriptor, is as unambiguous as possible and is preferred over its synonyms or near-synonyms, the non-descriptors, in all significant exchanges.

In practice, the thesaurus is a documentary indexing tool. Guided by a relevant thesaurus, one can represent any document by a rigorous selection of precise words, called keywords. It then becomes easy to apply any form of document management to it.

When the data is consulted and exploited, the thesaurus becomes a search instrument: knowing the vocabulary and the indexing rules, users can optimize their queries.
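The double role described above, indexing and searching, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the equivalence table USE, the set DESCRIPTORS and the helper names normalize and index_document are hypothetical and do not come from any particular thesaurus software.

```python
# Hypothetical equivalence table: non-descriptor -> preferred descriptor.
USE = {
    "author": "participant",
    "contributor": "participant",
    "sysop": "administrator",
}

# Hypothetical set of descriptors allowed for indexing.
DESCRIPTORS = {"reader", "participant", "administrator"}


def normalize(term):
    """Return the preferred descriptor for a term, or None if it is unknown."""
    key = term.strip().lower()
    if key in DESCRIPTORS:
        return key
    return USE.get(key)


def index_document(keywords):
    """Map free keywords to a sorted list of distinct descriptors."""
    found = {normalize(k) for k in keywords}
    found.discard(None)  # drop keywords the thesaurus does not know
    return sorted(found)
```

The same normalize function can be applied to the words of a query, so that a search on a non-descriptor is redirected to the descriptor actually used in indexing.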

Building a thesaurus

A thesaurus is worked out as a subset of the everyday vocabulary and of at least one specialized vocabulary. It is a controlled vocabulary, since it results from a long process of sorting the words, names and expressions used in an abstract way in a particular field. It is a pragmatic and continuous effort to rationalize the descriptive terms. A new thesaurus or a new version must generally go through a validation phase by the community concerned.

Automatic text-processing systems (automatic indexing) can extract the most frequent terms of a corpus and, to a certain extent, help bring out their semantic relations.
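A minimal sketch of the frequency-based extraction step might look as follows. It only counts single words; the stop list STOP_WORDS and the function name candidate_terms are assumptions made for this example, and a real system would add language-specific processing and multi-word term detection.

```python
import re
from collections import Counter

# Hypothetical stop list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def candidate_terms(corpus, top_n=5):
    """Return the top_n most frequent non-stop words across all documents."""
    counts = Counter()
    for text in corpus:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)
```

The resulting list is only a set of candidates: choosing descriptors among them and relating them to one another remains an editorial decision.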

To fit the field as closely as possible, the terms are inventoried, compared, linked and finally arranged hierarchically so as to account for the essential features of the field. This hierarchy is based on a typology: each term belongs to a category that situates it relative to all the other selected terms and thereby fixes its priority of use. The hierarchy of terms can differ completely from one thesaurus to another, and can even be applied inconsistently from one use of the same thesaurus to another.

Finally, starting from the highest level, which corresponds to the field of the thesaurus, one first finds the major subdivisions representing the components of the field (subdivisions often called microthesauri), then, for each subdivision, the hierarchy specific to the descriptors. A thesaurus can also cover several fields.

There always remains an arbitrary element in the hierarchy of a thesaurus, either in the choice of terms or in their hierarchical position.

Standards exist for the development of thesauri:

  • ISO 2788:1986: Guidelines for the establishment and development of monolingual thesauri.
  • ISO 5964:1985: Guidelines for the establishment and development of multilingual thesauri.
  • SKOS: a specification based on RDF, developed by the W3C, for publishing and using thesauri on the Semantic Web.
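As a rough illustration of how a thesaurus entry might be published in SKOS, the following Python sketch renders one concept as Turtle text. The properties skos:prefLabel, skos:altLabel and skos:broader belong to the SKOS vocabulary; the ex: prefix, the example.org address, the helper to_turtle and the sample labels are assumptions made for this illustration only.

```python
# Prefix declarations to place at the top of the Turtle file.
# The ex: namespace is a placeholder, not a real thesaurus address.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/thesaurus/> .\n"
)


def to_turtle(slug, pref_label, alt_labels=(), broader=None, lang="en"):
    """Render one concept as a Turtle statement using SKOS properties.

    Labels are inserted as-is; this sketch does not escape quotes.
    """
    predicates = ["a skos:Concept", f'skos:prefLabel "{pref_label}"@{lang}']
    for alt in alt_labels:
        # Non-descriptors become alternative labels of the concept.
        predicates.append(f'skos:altLabel "{alt}"@{lang}')
    if broader:
        # Link to the generic term one level up.
        predicates.append(f"skos:broader ex:{broader}")
    return f"ex:{slug} " + " ;\n    ".join(predicates) + " ."
```

Under this mapping, a descriptor corresponds to the preferred label of a concept and its non-descriptors to alternative labels.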

Hierarchical relations

The terms of a thesaurus are organized hierarchically (within microthesauri, which are often ordered alphabetically). This hierarchy makes it possible to adjust the precision of indexing or of querying. Indexing relies as far as possible on identifying specific terms (hence at the lowest possible level), whereas a search can, depending on the case, call on generic terms to increase the number of results.
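The effect of searching on a generic term can be sketched as a simple query expansion. The NARROWER table, the sample descriptors and the function names expand and search are hypothetical; the sketch assumes the hierarchy contains no cycles.

```python
# Hypothetical hierarchy: descriptor -> list of narrower descriptors.
NARROWER = {
    "participant": ["independent editor", "registered editor"],
    "elected participant": ["administrator", "representative"],
}


def expand(term):
    """Return the term followed by all of its narrower terms, recursively."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(expand(child))
    return result


def search(query_term, index):
    """Return sorted ids of documents indexed under the term or a narrower one."""
    wanted = set(expand(query_term))
    return sorted(doc for doc, terms in index.items() if wanted & set(terms))
```

Querying with a narrower term returns only the documents indexed at that level, while querying with its generic term also returns everything indexed under the terms below it.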

The relations between terms are of three types:

  • hierarchical relation in the strict sense (between descriptors): the basis of the hierarchy of the thesaurus;
  • equivalence relation (between descriptors and non-descriptors): the basis of univocity;
  • associative relation (between descriptors): semantic enrichment through related terms.
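One possible way to hold these three relation types in memory is a small record per descriptor. The class Descriptor, its field names and the two linking helpers are assumptions of this sketch, not a standard data model.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    label: str
    broader: list = field(default_factory=list)   # hierarchical relation (upward)
    narrower: list = field(default_factory=list)  # hierarchical relation (downward)
    used_for: list = field(default_factory=list)  # equivalence: non-descriptors
    related: list = field(default_factory=list)   # associative relation


def link_hierarchy(parent, child):
    """Record a hierarchical relation in both directions."""
    parent.narrower.append(child.label)
    child.broader.append(parent.label)


def link_related(a, b):
    """Record a symmetric associative relation between two descriptors."""
    a.related.append(b.label)
    b.related.append(a.label)
```

Non-descriptors are stored only as labels on the descriptor they point to, which reflects the fact that they are never used for indexing themselves.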

Any thesaurus contains at least three categories of terms: generic terms and narrower terms, which must be used as descriptors, and equivalent terms, which are treated as non-descriptors under the conventions of the thesaurus.

* Generic terms are generally marked by the initials TG; they designate the principal entities or concepts with reference to the other terms and to the field considered;
* Narrower terms are generally marked by the initials TS; they specify and identify the particular entities or concepts within the semantic field of a given generic term;
* Equivalent terms are generally marked by the initials EP, an abbreviation of Employé Pour ("used for"); they are variants of the narrower terms (synonyms or near-synonyms). They are equivalent in everyday language, but are treated as subsidiary in the use of the thesaurus. The term to be preferred to an Employé Pour term is indicated by the symbol EM or EMP, an abbreviation of Employer ("use").

Very commonly one also finds associated terms, identified by MT (associative relation: causality, location, temporal relations, composition, etc.). Being descriptors themselves, these related terms allow searchers to modify their query gradually or to widen it without resorting to generic terms.

Various other types of relations and complementary headings can be added to this basic structure to enrich the thesaurus or to improve its use. In particular, one can provide linguistic equivalents for multilingual thesauri, as well as bridges to other thesauri of the same field or of different fields.

Elementary example of a thesaurus

Consider the main headings of a microthesaurus on a collaborative computing system:

* Individuals >
* Software >
* Network >
* Resources >

The heading Individuals would consist, for example, of:

* Reader (TG);
* Participant (TG); Author (EP); Contributor (EP);
* Independent editor (TS); Anonymous (EP); IP address (metaphorical form, to be avoided);
* Registered editor (TS);
* Elected participant (TG);
* Administrator (TS); Sysop (usual term in the community);
* Administrative (TS);
* Representative (TS) (in charge of external relations);
* User (vague term: proscribed); Internet user (vague: proscribed).

The person responsible for any contribution could thus be specified by at least one descriptive term chosen from the five narrower terms or the three generic terms, as needed. The (EP) terms will in principle be avoided in indexing, but could be used later to single out one type of contribution or another without strictly using the terms of the initial description.
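The counts given above can be checked by transcribing the example as data. In this sketch the English labels are one possible rendering of the terms in the example, entries without a TG, TS or EP tag are left out, and the names INDIVIDUALS, by_tag and indexing_terms are hypothetical.

```python
# The example heading, transcribed as (label, tag) pairs.
# Labels are this sketch's English rendering of the terms in the example.
INDIVIDUALS = [
    ("Reader", "TG"),
    ("Participant", "TG"),
    ("Author", "EP"),
    ("Contributor", "EP"),
    ("Independent editor", "TS"),
    ("Anonymous", "EP"),
    ("Registered editor", "TS"),
    ("Elected participant", "TG"),
    ("Administrator", "TS"),
    ("Administrative", "TS"),
    ("Representative", "TS"),
]


def by_tag(entries, tag):
    """Return the labels carrying the given tag, in their original order."""
    return [label for label, t in entries if t == tag]


def indexing_terms(entries):
    """Return the descriptors (TG and TS) usable for indexing; EP terms are excluded."""
    return [label for label, t in entries if t in ("TG", "TS")]
```

Calling by_tag with "TS" and with "TG" yields five and three labels respectively, and indexing_terms leaves out the three EP entries.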

Modes of presentation

Whatever its medium, a thesaurus usually presents its terms in alphabetical order, as a first step before presenting the hierarchical relations. A user may thus be put off at first by the absence of a term from a list, whereas another way of consulting the thesaurus reveals that the term is indeed taken into account, but through its relation to one of the preferred terms. Presentations in the form of graphs and charts allow more complex explorations.

A thesaurus can usually be used or explored through several modes of presentation:

  • Alphabetical list(s) of terms: for a comprehensive approach or for finding a particular term;
  • Hierarchical list(s) of terms: for exploring a concept in depth;
  • List(s) of occurrences (permuted list): for checking the relevance of one element of an expression used as a descriptor.

In these lists one may find the symbol MT, indicating the microthesaurus to which the term belongs.
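The three list-based presentations can be derived from the same data, as the following sketch suggests. The function names and the sample terms used with them are assumptions of this illustration; the permuted list here simply files each term under every word it contains.

```python
def alphabetical(terms):
    """Alphabetical list: all terms sorted without regard to case."""
    return sorted(terms, key=str.lower)


def hierarchical(tree, root, depth=0):
    """Hierarchical list: the root followed by its narrower terms, indented."""
    lines = ["  " * depth + root]
    for child in sorted(tree.get(root, [])):
        lines.extend(hierarchical(tree, child, depth + 1))
    return lines


def permuted(terms):
    """Permuted list: (word, term) pairs, one entry per word of each term."""
    entries = [(word.lower(), term) for term in terms for word in term.split()]
    return sorted(entries)
```

With a permuted list, a user who looks up the word "editor" finds every multi-word descriptor that contains it, even when that word is not the first one.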

Optional elements of a thesaurus

Associated with the descriptors, one may find definitions (in cases of homonymy), notes to assist the user, links of any kind, etc.

External links

  • Thesaurus of ethics in the life sciences
  • Synomizer!: analysis of words and texts, with support for five languages.
  • Motbis Thesaurus (thesaurus for education)

© 2007-2008 speedlook.com; article text available under the terms of GFDL, from fr.wikipedia.org