The algorithm of Viterbi , of Andrew Viterbi, makes it possible to correct the errors which have occurred during a transmission through a disturbed channel (to a certain extent).
Its use is based on the knowledge of the disturbed channel (probability that information was modified in another) and makes it possible to simplify radically the complexity of the research of the most probable message of origin (of exponential, this complexity becomes linear).
The purpose of this algorithm is to find the sequence of states most probable having produced the measured sequence.
That is to say a message m diffused through a known disturbed channel (for example, which removes all the accents of a text) and the message O observed at exit of the channel.
To find the message of origin, one seeks, starting from O the most probable message. The rough method consists in testing, for each letter of O the most probable letter in m by putting forth the assumption that the disturbed channel did not add nor not removed information.
One obtains a tree of important size, if N is the size of the message and has the number of possible modifications for each unit (each letter for example), the complexity of the treatment is of (what makes the problem incalculable for great quantities of data).
The algorithm of Viterbi proposes to simplify the tree progressively its construction. Indeed, during his unfolding one finds oneself quickly with branches proposing same substitutions, but with different probabilities: it is not necessary to unroll those of weaker probability since they cannot be candidates any more to describe the most probable message. This simple principle is known under the name of dynamic Programming.
In a more general way, the algorithm of Viterbi is used in the cases or the subjacent process is modélisable as a discrete Chaîne of Markov in finished states. The problem is then, being given a sequence of observation , to find the sequence of states for which the probability a posteriori is maximum. The algorithm of Viterbi gives the solution of this problem of estimate per maximum a posteriori.
The most current application is probably the restoration of digital transmissions, deteriorated through an unreliable channel (hertzian transmission for example) while resting on the Distance of Hamming in order to emphasize weakest metric between the various probable values. In general, this method applies to many problems based on statistical evaluations of the relevance of the results (abundantly used in automatic Traitement of the languages for example, cf following example).
One of the headlights applications of the algorithm is also the estimate of the sequence of states (hidden) most probable having been generated by a Modèle of Markov hidden (problem of decoding). The Voice recognition and the Bio-data processing abundantly use the model of Markov hidden and are thus also fields where the algorithm of Viterbi finds an application.
Let us imagine a disturbed channel which removes all the accents of a text. Starting from the message m : “University” one observes the message O : “University”.
Several messages could lead to this observation:
One can know the probability of each letter while being interested in his probability of appearance in the French language and one can refine this probability by seeking the probability that a letter appears whereas it is preceded by another (and so on), one speaks about unigrammes, of bigrammes, trigrams…
One builds the following tree (with the bigrammes, P (X) corresponds to the probability of seeing appearing X, P (Y/X) is the probability of having Y knowing that one had X -- cf image).
The probability of a branch is the product of the probability of all its nodes. With each node, one considers the probability of the unigramme (P (X)) and of the bigramme (P (X/Y)). In practice these probabilities are balanced and standardized and one obtains for each node the value with ( N corresponding to the degree of desired N-gram), since one is interested in a probability, therefore lain between zero and one (for the example, the values and seem optimal).
While looking at the tree, one realizes quickly that the probabilities of the third stage (for the character observed “I”) are identical and depend only on the preceding stage. In the same way, the under-trees of these nodes will be identical, it is thus useless to unroll the under-trees under the branches of weaker probability since one is sure that the total probability of the generated branches will be weaker .
For example, let us suppose that we obtained like probabilities on the third floor of our tree:
P (PLAIN) = 0.8
The under-trees in lower part of the I of LINKED and in lower part of the I of ÙNI are strictly equivalent, in the same way for the under-trees in lower part of the I of LINKED and the I of ÙNÎ. There will be thus the following probabilities:
P (PLAIN…) = 0.8 * P (Under-Tree (Ni)) (1)
and
P (PLAIN…) = 0.15 * P (Under-Tree (Ni)) (2)
Whatever the probability of each under-tree, one realizes that some will be weaker than others quickly, for example the branch (4) will be lower than the branch (2) and connects it (3) lower than the branch (1) . One can thus prune the tree by removing them and continue calculation on the remaining branches (suppression of half of the possibilities to each stage).
| Random links: | Ignacio Fernández Lobbe | Willie Moretti | Gonadotrope | Christine Keshen | Valley of Bada |