A grammar is a formalism that makes it possible to define a syntax and thus a formal language, i.e. a set of words over a given alphabet.

The concept of formal grammar is used in particular in logic programming, compilation (syntactic analysis), computability theory, and natural language processing (particularly with regard to morphology and syntax).

Languages

A language is a set of words, which are simply sequences of symbols chosen from a set (generally finite) called the alphabet. Formally, if A is a set, one writes A^* for the free monoid on A, i.e. the set of finite sequences of elements of A, equipped with the operation of concatenation of two words. A language over the alphabet A is by definition a subset of A^*.
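The definition above can be sketched in Python, with words represented as strings, concatenation as `+`, and the empty string as the neutral element. The helper name `words_up_to` and the choice of the alphabet {a, b} are assumptions made for illustration; since A^* is infinite, only a finite slice of it is enumerated.

```python
from itertools import product

def words_up_to(alphabet, max_len):
    """Yield every word over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            yield "".join(letters)

A = {"a", "b"}

# A finite slice of A^*: the empty word, then the words of length 1 and 2.
prefix_of_a_star = list(words_up_to(A, 2))

# A language over A is any subset of A^*; here, the words that contain
# as many a's as b's (an arbitrary example language).
balanced = [w for w in prefix_of_a_star if w.count("a") == w.count("b")]
```

Any predicate on strings selects a language in this sense; a grammar is one way of specifying such a subset finitely.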

Often, the “symbols” considered when a language is defined by a formal grammar consist of several characters, so that they correspond rather to what are called words in everyday language. In the same way, the “words” of the language correspond rather to sentences or texts. When there is ambiguity, one speaks of letters or characters for the symbols of the alphabet used to encode information, and one reserves the word symbol for those of the abstract alphabet, which are the basic elements of the language.

For example:

  • A1 = {a, b, c, d, e} is an alphabet containing 5 symbols, traditionally called letters in this particular case;
  • A2 = {2, 5, @, $, &} is another alphabet containing 5 symbols;
  • A3 = {Det, Adj, Verb, Noun, Coord, Prep} is an alphabet of 6 symbols which can describe, for example, the syntactic structure of a sentence in a natural language.

Grammars

A formal grammar, or simply grammar, consists of a finite set of terminal symbols (which are the letters or the words of the language), a finite set of nonterminals, a set of productions whose left-hand and right-hand sides are words made of terminals and nonterminals, and an axiom (the start symbol). Applying a production consists in replacing its left-hand side by its right-hand side; the successive application of a number of productions is called a derivation. The language defined by a grammar is the set of words made only of terminal symbols that can be reached by derivation starting from the axiom.

Terminals are usually written as lowercase letters, nonterminals as capital letters, and the axiom as the letter S. Thus, the grammar defined by the terminals {a, b}, the nonterminal S, the production rules

S → aSb
S → ε (where ε denotes the empty word)

and the axiom S represents the language of the words of the form a^n b^n (a certain number of “a” (possibly 0, thanks to the rule S → ε), followed by the same number of “b”).
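As a minimal sketch, the derivation process for this grammar can be written out in Python. The function names `derive` and `in_language` are hypothetical; sentential forms are represented as plain strings in which the capital letter S is the only nonterminal.

```python
def derive(n):
    """Apply S -> aSb n times, then S -> ε, recording each sentential form."""
    form = "S"
    steps = [form]
    for _ in range(n):
        form = form.replace("S", "aSb", 1)  # production S -> aSb
        steps.append(form)
    form = form.replace("S", "", 1)         # production S -> ε
    steps.append(form)
    return steps

def in_language(word):
    """Check membership in { a^n b^n : n >= 0 } directly, without the grammar."""
    n = len(word) // 2
    return word == "a" * n + "b" * n
```

For instance, `derive(2)` returns the forms S, aSb, aaSbb and aabb; the last one contains only terminals and is therefore a word of the language.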

Chomsky hierarchy

See also: Chomsky hierarchy

When the linguist Noam Chomsky introduced the concept of formal grammar, he proposed a classification of grammars that is now called the Chomsky hierarchy. It consists of the following four levels, from the most restrictive to the broadest.

  • type 3 languages, or regular languages: these are the languages defined by a left-linear grammar (i.e. a grammar in which each right-hand side of a rule contains at most one nonterminal, placed at its start), a right-linear grammar (i.e. a grammar in which each right-hand side of a rule contains at most one nonterminal, placed at its end) or a regular expression; equivalently, the languages recognized by a finite-state machine.

  • type 2 languages, or context-free languages: these are the languages defined by a context-free grammar, or equivalently the languages recognizable by a nondeterministic pushdown automaton. Most programming languages, without being context-free languages strictly speaking, are close enough to them that the parsing techniques for context-free languages can be adapted to them.

  • type 1 languages, or context-sensitive languages: these are the languages defined by a context-sensitive grammar, or equivalently the languages recognizable by a nondeterministic Turing machine whose tape length is bounded by a fixed multiple of the length of the input word.

  • type 0 languages, or recursively enumerable languages. This class includes all the languages defined by a formal grammar. It is also the set of languages accepted by a Turing machine (which is allowed to loop forever on a word that is not in the language).

In addition to the four types of the Chomsky hierarchy, there are notable intermediate classes:

  • between 3 and 2: the deterministic context-free languages, recognizable by a deterministic pushdown automaton;
  • between 1 and 0: the recursive languages, i.e. those recognizable by a Turing machine that always halts (it must reject the words that are not in the language).

The six types above are strictly included in one another. Note that if, in type 1, “nondeterministic” is replaced by “deterministic”, one obtains a class contained in type 1, but it is not known whether it is strictly included in type 1 or equal to it.
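The gap between type 3 and type 2 can be illustrated with two small recognizers. The sketch below is only an illustration under stated assumptions: the state names, the transition table and the function names are hypothetical, and the two example languages (a*b* and a^n b^n) were chosen because the first needs only finitely many states while the second needs unbounded memory, here a stack.

```python
# Hypothetical finite-state machine for the regular (type 3) language a*b*.
TRANSITIONS = {
    ("in_as", "a"): "in_as",
    ("in_as", "b"): "in_bs",
    ("in_bs", "b"): "in_bs",
}
ACCEPTING = {"in_as", "in_bs"}

def dfa_accepts(word):
    """Run the finite-state machine; a missing transition means rejection."""
    state = "in_as"
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING

def stack_accepts(word):
    """Recognize a^n b^n (type 2) in the manner of a pushdown automaton."""
    stack = []
    seen_b = False
    for symbol in word:
        if symbol == "a" and not seen_b:
            stack.append("a")      # remember one pending 'a'
        elif symbol == "b" and stack:
            seen_b = True
            stack.pop()            # match this 'b' with a pending 'a'
        else:
            return False
    return not stack               # every 'a' must have been matched
```

The first recognizer accepts "aab" because it cannot count; the second rejects it because one "a" remains on the stack.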

Parsing

A parser for a formal language is a computer program that decides whether a word given as input belongs to the language, and possibly constructs a derivation of it. Systematic methods exist for writing parsing programs for languages of type 2 or 3 in the Chomsky hierarchy. Interpreters and compilers almost always include a lexical analysis phase, which consists in recognizing type 3 languages, followed by a syntactic analysis phase, which analyzes a type 2 language. Lexical analysis operates on a sequence of characters and produces a sequence of lexemes, which in turn serve as the elements of the alphabet during syntactic analysis.

Tools like Lex and yacc make it easier to write lexical analyzers and parsers, respectively, by automatically producing portions of programs from a specification of the language. Parser generators generally use a variant of Backus-Naur form, which is a notation for context-free grammars, while lexical analyzer generators use the lighter formalism of regular expressions.
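The lexical analysis phase can be sketched with Python's standard `re` module, in the spirit of what a lexical analyzer generator produces. The token classes (NUM, ID, OP, SKIP) and their regular expressions are assumptions chosen for illustration, not the specification of any particular tool.

```python
import re

# Assumed token classes; each pattern describes a regular (type 3) language.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"!=|>=|<=|:=|[-+*()<>=;]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(text):
    """Turn a sequence of characters into a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":   # whitespace separates lexemes only
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

The resulting list of lexemes is what a parser would then consume as its input alphabet.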

Examples of grammars

Arithmetic expressions

One can define arithmetic expressions in the following way:

exp ::= exp + exp | exp × exp | (exp) | num
num ::= 0num | 1num | 2num | 3num | 4num | 5num | 6num | 7num | 8num | 9num | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The nonterminals here are implicitly exp and num; the terminals are +, ×, (, ) and the digits. The axiom is exp.

The following derivation is an example of the use of this grammar.

exp → exp × exp → num × exp → 3 × exp → 3 × num → 3 × 1num → 3 × 18

Note that the definition given makes it possible to recognize arithmetic expressions, but not to analyze them: the precedence of the operators is not captured by this grammar.
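One common way to encode precedence is to stratify the grammar into one nonterminal per precedence level. The sketch below is an assumption-laden illustration, not part of the grammar given here: the nonterminal names `term` and `factor`, the function name `parse`, and the choice to evaluate while parsing are all hypothetical. It reads an expression without spaces and evaluates it with × binding tighter than +.

```python
def parse(text):
    """Parse and evaluate `text` with × binding tighter than +.

    Assumed stratified grammar:
        exp    ::= term { + term }
        term   ::= factor { × factor }
        factor ::= num | ( exp )
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def factor():
        nonlocal pos
        if peek() == "(":
            expect("(")
            value = exp()
            expect(")")
            return value
        start = pos
        while peek() is not None and peek() in "0123456789":
            pos += 1
        if start == pos:
            raise SyntaxError(f"expected a number at position {pos}")
        return int(text[start:pos])

    def term():
        value = factor()
        while peek() == "×":
            expect("×")
            value *= factor()
        return value

    def exp():
        value = term()
        while peek() == "+":
            expect("+")
            value += term()
        return value

    result = exp()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return result
```

With this stratification, `parse("2+3×4")` groups the product first, whereas the original grammar allows both groupings.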

Simple computer programming language

Defining a simple programming language is not very complicated. The grammar below recognizes a programming language resembling Pascal. Here is a sample program computing fact(10):

begin
  int a;
  int b;
  a:=10;
  b:=1;
  while (a>1) do
    b:=a*b;
    a:=a-1;
  od;
  print b;
end

program ::= begin listinstr end
listinstr ::= instr listinstr | instr
instr ::= int id ; | id := expr ; | print expr ; | while ( cond ) do listinstr od ;
expr ::= expr - expr1 | expr1
expr1 ::= expr1 * expr2 | expr2
expr2 ::= id | num | ( expr )
cond ::= expr condsymb expr
condsymb ::= > | < | >= | <= | != | =

the terminals being id, num, begin, end, int, print, while, (, ), do, od, ; and the comparison symbols. Note that with such a grammar (of type 2 in the Chomsky hierarchy) one cannot check that all the variables have been declared (that requires a grammar of type 1). Rules would also have to be added for num (as above) and for id.
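In practice, the declaration check is usually done by a separate pass over the program rather than by the grammar itself. The following is a rough sketch of such a pass, under several assumptions: the function name `undeclared_variables` is hypothetical, the loop keyword is taken to be spelled `do`, and the syntax is not verified at all (only identifiers are extracted, with a regular expression).

```python
import re

# Assumed keyword set of the small language; everything else is an identifier.
KEYWORDS = {"begin", "end", "int", "print", "while", "do", "od"}

def undeclared_variables(source):
    """Return identifiers used without a preceding `int id;`, in order of first use."""
    words = re.findall(r"[A-Za-z_]\w*", source)
    declared = set()
    problems = []
    previous = None
    for word in words:
        if word in KEYWORDS:
            previous = word
            continue
        if previous == "int":
            declared.add(word)          # `int id` declares id
        elif word not in declared and word not in problems:
            problems.append(word)       # use before any declaration
        previous = word
    return problems
```

The check depends on what was seen earlier in the program, which is exactly the kind of context a type 2 grammar cannot express.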

Classical propositional logic

The syntax of classical propositional logic, or propositional calculus, can be defined in the following way:

\phi ::= (\phi \lor \phi) | (\phi \land \phi) | (\phi \to \phi) | \neg \phi | \bot | \top | P | Q | \ldots

P, Q, … are the propositional variables (terminals).
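Because every binary connective is enclosed in parentheses, this grammar can be recognized by a very short recursive procedure. The sketch below makes some assumptions for illustration: formulas are written without spaces using the Unicode symbols ∨, ∧, →, ¬, ⊥ and ⊤, propositional variables are single capital letters, and the function names are hypothetical.

```python
BINARY = {"∨", "∧", "→"}
CONSTANTS = {"⊥", "⊤"}

def parse_formula(text, pos=0):
    """Return the position just after one formula starting at `pos`."""
    if pos >= len(text):
        raise SyntaxError("unexpected end of input")
    ch = text[pos]
    if ch == "(":
        # ( formula connective formula )
        pos = parse_formula(text, pos + 1)
        if pos >= len(text) or text[pos] not in BINARY:
            raise SyntaxError(f"expected a binary connective at {pos}")
        pos = parse_formula(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError(f"expected ')' at {pos}")
        return pos + 1
    if ch == "¬":
        return parse_formula(text, pos + 1)
    if ch in CONSTANTS or "A" <= ch <= "Z":
        return pos + 1
    raise SyntaxError(f"unexpected symbol {ch!r} at {pos}")

def is_formula(text):
    """True if the whole of `text` is exactly one well-formed formula."""
    try:
        return parse_formula(text) == len(text)
    except SyntaxError:
        return False
```

For example, `is_formula("(P∧¬Q)")` holds, while `is_formula("P∧Q")` does not, because the grammar requires the parentheses.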

L-System

An L-system (or Lindenmayer system) is a formal grammar in which, at each step, the productions are applied in parallel to all the symbols of the word.
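As a brief sketch, the rewriting step of an L-system can be written in a few lines of Python. The rule set A → AB, B → A is the classic textbook "algae" example and is not taken from this article; the function name `rewrite` is hypothetical. Unlike a derivation in an ordinary grammar, every symbol of the word is rewritten at each step.

```python
def rewrite(word, rules, steps):
    """Rewrite every symbol of `word` simultaneously, `steps` times.

    Symbols without a rule are copied unchanged.
    """
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word

# Assumed example rules (the classic "algae" system): A -> AB, B -> A.
ALGAE = {"A": "AB", "B": "A"}
```

Starting from "A", three steps give A, AB, ABA and then ABAAB.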

Concept of equivalence

Strong equivalence

Two grammars are said to be strongly equivalent if and only if
  1. They recognize exactly the same language.
  2. They use exactly the same syntax tree to analyze the same sentence.

Weak equivalence

Two grammars are said to be weakly equivalent if and only if
  1. They recognize exactly the same language.

Strongly equivalent grammars are thus always weakly equivalent as well.
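Equality of the generated languages can be tested experimentally up to a length bound, which gives evidence of agreement but never a proof. The sketch below rests on several assumptions: the function name `language_up_to` and the two example grammars G1 and G2 are hypothetical, nonterminals are single capital letters, and the search is only guaranteed to terminate for grammars, like these two, in which derivations cannot go on forever without producing terminals.

```python
from collections import deque

def language_up_to(rules, start, max_len):
    """Words of length <= max_len derivable from `start` by leftmost derivation."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        index = next((i for i, s in enumerate(form) if s.isupper()), None)
        if index is None:               # no nonterminal left: a word
            words.add(form)
            continue
        for rhs in rules[form[index]]:
            new = form[:index] + rhs + form[index + 1:]
            # Prune forms whose terminals alone already exceed the bound.
            terminals = sum(not s.isupper() for s in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

# Two assumed grammars for the words a^n b^n, with differently shaped trees.
G1 = {"S": ["aSb", ""]}
G2 = {"S": ["", "aT"], "T": ["Sb"]}

same_up_to_6 = language_up_to(G1, "S", 6) == language_up_to(G2, "S", 6)
```

Here the two grammars agree on all words of length at most 6, yet G2 builds its trees through the extra nonterminal T, so they assign different syntax trees to the same word.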


© 2007-2008 speedlook.com; article text available under the terms of GFDL, from fr.wikipedia.org