Bayesian inference is the logical procedure used to calculate or revise the probability of a hypothesis. This procedure is governed by strict rules for combining probabilities, from which Bayes' theorem is derived. From the Bayesian point of view, a probability is not interpreted as the limit of a frequency, but rather as the numerical translation of a state of knowledge (the degree of confidence granted to a hypothesis, for example; see the Cox-Jaynes theorem).

On this subject, Jaynes used with his students the metaphor of a robot reasoning by inductive logic. A link to one of his writings can be found in the article Artificial intelligence.

Manipulating probabilities: logical notation and rules

Bayesian inference is founded on the manipulation of probabilistic statements. These statements must be clear and concise in order to avoid any confusion. Bayesian inference is particularly useful in problems of induction. Bayesian methods differ from so-called standard methods by the systematic application of formal rules for transforming probabilities. Before describing these rules, let us become familiar with the notation used.

Probability notation

Take the example of a woman who wants to know whether she is pregnant. We first define a hypothesis E: she is pregnant, whose probability p(E) we seek. Computing this probability obviously relies on the analysis of a pregnancy test. Suppose that studies have shown that, for pregnant women, the test is positive 9 times out of 10. For women who are not pregnant, the test is negative 19 times out of 20. If we define the hypotheses:
  • T_P: the test is positive,
  • T_N: the test is negative,
the preceding results can be interpreted probabilistically:

The probability of the hypothesis T_P, given that the woman is pregnant, is 0.9.

In the language of probability, this statement is written p(T_P | E) = 0.9. In the same way, p(T_N | \bar{E}) = 0.95 means that the probability that the test is negative for a woman who is not pregnant (\bar{E}) is 0.95. Note that we follow here the convention that a statement or hypothesis that is certainly true has probability 1. Conversely, a statement that is certainly false has probability 0.

In addition to the conditional operator |, the logical operators AND and OR have their own notation. The joint probability of two hypotheses is denoted by the sign \cap. The expression p(E \cap T_P) thus denotes the probability of being pregnant AND obtaining a positive test. For the logical operator OR, the sign \cup is generally used. The expression p(E \cup \bar{E}) thus denotes the probability that the woman is pregnant or not. By the preceding convention, this probability must be 1, since it is impossible to be in any state other than pregnant or not pregnant.

Rules of probability logic

Only two rules are needed to combine probabilities, and the whole theory of Bayesian analysis is built on them. These are the addition rule and the multiplication rule.

The addition rule: p(A \cup B | C) = p(A | C) + p(B | C) - p(A \cap B | C)

The multiplication rule: p(A \cap B) = p(A | B) p(B) = p(B | A) p(A)

Bayes' theorem follows directly by exploiting the symmetry of the multiplication rule: p(A | B) = \frac{p(B | A) p(A)}{p(B)}.
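
The two rules and the resulting theorem can be checked numerically on a small example. The sketch below is only an illustration: the joint distribution over two statements A and B is made up for the purpose, the names `prob` and `cond` are hypothetical helpers, and the background condition C of the addition rule is taken to be always true.

```python
# Hypothetical joint distribution over two binary statements A and B.
# The four values are illustrative assumptions; they only need to sum to 1.
joint = {
    (True, True): 0.20,
    (True, False): 0.10,
    (False, True): 0.30,
    (False, False): 0.40,
}

def prob(event):
    """Probability of the outcomes (a, b) for which event(a, b) is true."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

def cond(event, given):
    """Conditional probability p(event | given) = p(event AND given) / p(given)."""
    return prob(lambda a, b: event(a, b) and given(a, b)) / prob(given)

A = lambda a, b: a
B = lambda a, b: b

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda a, b: a and b)
p_a_or_b = prob(lambda a, b: a or b)

# Addition rule: p(A or B) = p(A) + p(B) - p(A and B)
addition_rhs = p_a + p_b - p_a_and_b

# Multiplication rule, written in both orders
mult_ab = cond(A, B) * p_b
mult_ba = cond(B, A) * p_a

# Bayes' theorem: p(A | B) = p(B | A) p(A) / p(B)
bayes_rhs = cond(B, A) * p_a / p_b

print("addition rule holds:      ", abs(p_a_or_b - addition_rhs) < 1e-12)
print("multiplication rule holds:", abs(mult_ab - mult_ba) < 1e-12)
print("Bayes' theorem holds:     ", abs(cond(A, B) - bayes_rhs) < 1e-12)
```

Any other set of four non-negative values summing to 1 could be substituted in `joint`; the three checks should still pass, provided p(A) and p(B) are both non-zero.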

Bayes' theorem makes it possible to invert probabilities. That is, if the consequences of a cause are known, observing the effects makes it possible to work back to the causes.

In the preceding case of the pregnant woman, once the test result is known, the probability that the woman is pregnant can be computed with Bayes' theorem. For a positive test, p(E | T_P) = \frac{p(T_P | E) p(E)}{p(T_P)}. Note that inverting the probability introduces the term p(E), the a priori probability of being pregnant, often called the prior. The prior describes the probability of the hypothesis independently of the test result. A woman who uses contraception would choose a very low p(E), since she has no reason to believe she is pregnant. On the other hand, a woman who has recently had unprotected sex and suffers from frequent vomiting would adopt a higher prior. The test result is thus weighted, or moderated, by this independent estimate of the probability of being pregnant.
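
To see how strongly the prior moderates the test result, the computation can be sketched as follows. The test figures 0.9 and 0.95 come from the text; the three prior values and the function name `posterior_pregnant` are assumptions chosen for illustration. The denominator p(T_P) is expanded as p(T_P | E) p(E) + p(T_P | \bar{E}) p(\bar{E}), with p(T_P | \bar{E}) = 1 - 0.95.

```python
def posterior_pregnant(prior, sensitivity=0.9, specificity=0.95):
    """Return p(E | T_P) from Bayes' theorem.

    sensitivity = p(T_P | E) and specificity = p(T_N | not E), as in the text.
    prior = p(E), the probability of pregnancy before seeing the test.
    """
    false_positive_rate = 1.0 - specificity  # p(T_P | not E)
    # p(T_P), summed over the two exclusive cases E and not E
    p_tp = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_tp

# Illustrative priors (assumptions): very low, undecided, high.
for prior in (0.01, 0.5, 0.9):
    post = posterior_pregnant(prior)
    print(f"prior p(E) = {prior:.2f} -> posterior p(E | T_P) = {post:.3f}")
```

Under these assumptions, a positive test raises a prior of 0.01 only to about 0.15, while a prior of 0.5 becomes about 0.95: the same test result leads to very different conclusions depending on the prior.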

It is this a priori estimate that standard statistical methods systematically ignore.

Evidence notation

This notation is often attributed to I. J. Good. He, however, credited it to Alan Turing and, independently, to other researchers including Jeffreys.

In practice, when a probability is very close to 0 or 1, it changes only when one observes elements that are themselves considered very improbable. To make this concrete, one often works in decibels (dB), with the following equivalence:

Ev(p) = 10 \log_{10} \frac{p}{1-p}.

An evidence of -40 dB corresponds to a probability of 10^-4, and so on. The interest of this notation, besides avoiding long strings of decimals near 0 and 1, is that it also allows Bayes' rule to be written in additive form: the same weight of evidence is needed to move an event from a plausibility of -40 dB (10^-4) to -30 dB (10^-3) as to move it from -10 dB (about 0.1) to 0 dB (0.5), which was not obvious in the probability representation. The following table presents some equivalences:
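
Equivalences of this kind can be recomputed directly from the definition of Ev(p). The sketch below is a minimal illustration; the function names `evidence_db` and `probability_from_db` and the list of probabilities are assumptions, not part of the original notation.

```python
import math

def evidence_db(p):
    """Ev(p) = 10 * log10(p / (1 - p)), in decibels; requires 0 < p < 1."""
    return 10.0 * math.log10(p / (1.0 - p))

def probability_from_db(ev):
    """Inverse of evidence_db: recover the probability from a value in dB."""
    odds = 10.0 ** (ev / 10.0)
    return odds / (1.0 + odds)

# Illustrative probabilities (assumed values) and their evidence in dB.
for p in (0.0001, 0.001, 0.1, 0.5, 0.9, 0.999):
    print(f"p = {p:<7} Ev = {evidence_db(p):7.2f} dB")
```

Because Ev is a logarithm of the odds p / (1 - p), values near 0 dB correspond to probabilities near 0.5, and each step of 10 dB multiplies the odds by 10.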

Ev is an abbreviation for weight of evidence, sometimes rendered in French by the word évidence; the wording closest to the original English expression would be the literal "weight of testimony", but by an amusing coincidence "évidence" turns out to be very apt in French for this precise use.

Shortly after Jeffreys's publications, it was discovered that Alan Turing had already worked on this question, calling the corresponding quantities log-odds in his personal work.

Comparison with classical statistics

Difference in spirit

One difference between Bayesian inference and classical statistics, also called frequentist, was pointed out by Myron Tribus:

  • Bayesian methods use impersonal methods to update personal probabilities, also called subjective (a probability is in fact always subjective once its foundations are analyzed),

  • statistical methods use personal methods to process impersonal frequencies.

Bayesians thus choose to model their expectations at the start of the process (even if it means revising this first judgment progressively in the light of experience as observations come in), whereas classical statisticians fix an arbitrary method and hypothesis a priori and only then process the data (which at least had the merit of greatly reducing the computations).

Because Bayesian methods do not require a hypothesis to be fixed beforehand, they opened the way to automatic data mining; with them it is no longer necessary to rely on prior human intuition to devise hypotheses before being able to start work.

When to use one or the other?

The two approaches are complementary: classical statistics is generally preferable when information is abundant and cheap to collect, the Bayesian approach when it is scarce and/or expensive to gather. When data are very abundant, the results of the two methods are asymptotically the same, the Bayesian one simply being more expensive to compute. On the other hand, the Bayesian approach can handle cases where classical statistics would not have enough data for its limit theorems to apply.
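
The asymptotic agreement can be illustrated on the simplest possible case, estimating a proportion. The sketch below is an assumption-laden illustration, not a method taken from the text: it compares the observed frequency with the posterior mean obtained from a uniform Beta(1, 1) prior, and the data (70 % successes) are made up.

```python
def frequency_estimate(successes, trials):
    """Classical estimate: the observed frequency (undefined for 0 trials)."""
    return successes / trials

def bayes_estimate(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior.

    Beta(1, 1) is the uniform prior; choosing it is an assumption of this sketch.
    """
    return (successes + prior_a) / (trials + prior_a + prior_b)

# Same observed proportion (70 %), with more and more data.
for trials in (10, 100, 10_000):
    successes = 7 * trials // 10
    freq = frequency_estimate(successes, trials)
    bayes = bayes_estimate(successes, trials)
    print(f"n = {trials:>6}: frequency = {freq:.4f}, Bayesian = {bayes:.4f}")
```

With no data at all, `bayes_estimate(0, 0)` still returns the prior mean 0.5, whereas `frequency_estimate(0, 0)` raises a `ZeroDivisionError`; as the number of trials grows, the two estimates become practically indistinguishable.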

The Bayesian psi-test (used to determine the plausibility of a distribution given a set of observations) converges asymptotically to the χ² of classical statistics as the number of observations becomes large. The apparently arbitrary choice of a Euclidean distance in the χ² is thus perfectly justified a posteriori by Bayesian reasoning (source: Myron Tribus, op. cit.)

Example of Bayesian inference: which box does this cookie come from?

Imagine two boxes of cookies.

  • One, A, contains 30 chocolate cookies and 10 plain ones.

  • The other, B, contains 20 of each.

With eyes closed, one picks a box at random, then a cookie at random from that box. It turns out to be a chocolate one. Which box is it more likely to have come from, and with what probability? Intuitively, one suspects that box A is more likely to be the right one, but by how much?

The exact answer is given by Bayes' theorem:

Let H_A denote the proposition "the cookie comes from box A" and H_B the proposition "the cookie comes from box B".

If, when one is blindfolded, the boxes are distinguished only by their names, we have P(H_A) = P(H_B), and their sum is 1, since a box has certainly been chosen; each proposition therefore has a probability of 0.5.

Let D denote the event described by the sentence "the cookie is a chocolate one". Knowing the contents of the boxes, we know that:

  • P(D | H_A) = 30/40 = 0.75
  • P(D | H_B) = 20/40 = 0.5.

Note: "P(A | B)" reads "the probability of A given B".

Solution using probability notation

Bayes' formula thus gives us:

\begin{matrix} P(H_A | D) &=& \frac{P(H_A) \cdot P(D | H_A)}{P(H_A) \cdot P(D | H_A) + P(H_B) \cdot P(D | H_B)} \\ \\ &=& \frac{0.5 \times 0.75}{0.5 \times 0.75 + 0.5 \times 0.5} \\ \\ &=& 0.6 \end{matrix}

Before looking at the cookie, our probability of having chosen box A was P(H_A), that is, 0.5.

After looking at it, we revise this probability to P(H_A | D), which is 0.6.
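
The same calculation can be written as a short sketch that works for any number of mutually exclusive hypotheses. The function name `posterior` and the dictionary layout are assumptions made for illustration; the priors and likelihoods are those of the two boxes.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive, exhaustive hypotheses.

    priors[h] is P(h); likelihoods[h] is P(D | h) for the observed data D.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(D), summed over all hypotheses
    return {h: joint[h] / total for h in joint}

priors = {"A": 0.5, "B": 0.5}                # a box is picked at random
likelihoods = {"A": 30 / 40, "B": 20 / 40}   # P(chocolate | box)

result = posterior(priors, likelihoods)
for box, p in result.items():
    print(f"P(H_{box} | D) = {p:.2f}")
```

The posterior probabilities of the two boxes sum to 1, as they must, since the cookie necessarily comes from one of them.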

Solution using evidence notation
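
In evidence notation, Bayes' rule becomes an addition: the posterior evidence for H_A is its prior evidence plus 10 \log_{10} of the ratio P(D | H_A) / P(D | H_B), since here H_B is the negation of H_A. The sketch below works the cookie example this way; the function and variable names are assumptions made for illustration.

```python
import math

def evidence_db(p):
    """Ev(p) = 10 * log10(p / (1 - p)), in decibels; requires 0 < p < 1."""
    return 10.0 * math.log10(p / (1.0 - p))

def probability_from_db(ev):
    """Inverse of evidence_db: recover the probability from a value in dB."""
    odds = 10.0 ** (ev / 10.0)
    return odds / (1.0 + odds)

# Prior evidence for box A: P(H_A) = 0.5, i.e. 0 dB.
prior_ev = evidence_db(0.5)

# Weight of evidence brought by the chocolate cookie, in dB:
# 10 * log10( P(D | H_A) / P(D | H_B) ), valid because H_B is "not H_A".
weight = 10.0 * math.log10(0.75 / 0.5)

# Additive form of Bayes' rule.
posterior_ev = prior_ev + weight
posterior_p = probability_from_db(posterior_ev)

print(f"prior evidence:     {prior_ev:.2f} dB")
print(f"weight of evidence: {weight:.2f} dB")
print(f"posterior evidence: {posterior_ev:.2f} dB, i.e. P(H_A | D) = {posterior_p:.2f}")
```

The chocolate cookie contributes a little under 2 dB in favour of box A, and converting the posterior evidence back to a probability recovers the value 0.6 found with the probability notation.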


References

Teaching the tool

  • Bernardo, J. and Smith, A.F.M. (1994) Bayesian Theory. John Wiley, New York (the reference for the formal approach to Bayesian theory via loss functions and decision theory)
  • Tribus, Myron (1974) Décisions rationnelles dans l'incertain, transl. Jacques Pézier, Masson (out of print, but available at the Bibliothèque publique d'information)
  • Robert, C.P. (1992) L'Analyse statistique bayésienne. Economica, Paris
  • Documentation and programs to download
  • Robert, C.P. (1994) The Bayesian Choice: A Decision-Theoretic Motivation. Springer-Verlag, New York (first edition, in French: L'Analyse statistique bayésienne, Economica, Paris, 1992, but with less careful typography and hence lower legibility; translated into French in 2006 by Springer-Verlag, Paris)
  • Jaynes, E.T. (2003) Probability Theory: The Logic of Science (in English)

Using the tool

  • David Bellot (2002) Bayesian inference in practice
  • Good, I.J. (1963) Speculations Concerning the First Ultraintelligent Machine (see also Artificial intelligence)
  • Work of the ERIS at the University of Rouen
  • Tribus, Myron (1974) Décisions rationnelles dans l'incertain (out of print, but freely available for consultation at the Beaubourg library; includes many examples and BASIC programs that solve them in practice)

Works on applications are rarer for the following reason: Bayesian methods are used where information is expensive to obtain (oil prospecting, drug research…). In both of these cases it is private companies (oil companies, pharmaceutical laboratories…) that fund the work, and they have no interest in handing their competitors information that cost their shareholders dearly.

It should however be noted that Bayesian analyses of concrete problems appear in most issues of the major statistics journals, such as the Journal of the Royal Statistical Society, Journal of the American Statistical Association, Biometrika, Technometrics or Statistics in Medicine.

Inference engines

  • ProBT®, in a commercial version from ProBAYES and free for research and teaching on the site Bayesian-Programming.org

Historical appendix

The use of prior probabilities drew some recurring criticisms of Bayesian methods when they were introduced. The following four points then had to be recalled systematically:
  1. The effect of the prior distribution fades as observations are taken into account.
  2. There exist impersonal laws, such as entropy maximization or group invariance, that indicate the single possible distribution without adding information specific to the experimenter.
  3. Prior probabilities are often used unconsciously in other methods (Wald's criterion, minimax criterion…).
  4. As with any other model, the effects of different prior choices can be examined head-on.
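
The first point can be illustrated with the two cookie boxes described earlier in the article. The sketch below rests on assumptions that are not in the text: cookies are drawn repeatedly from the same box and put back each time, the sequence of observations is made up, and two observers start from opposite priors (0.1 and 0.9) for box A.

```python
def update(prior_a, chocolate):
    """One Bayes update of P(H_A) after a draw with replacement."""
    like_a = 0.75 if chocolate else 0.25  # box A: 30 chocolate, 10 plain
    like_b = 0.5                          # box B: 20 of each
    numerator = prior_a * like_a
    return numerator / (numerator + (1.0 - prior_a) * like_b)

def run(prior_a, draws):
    """Apply the updates in sequence, starting from prior_a."""
    p = prior_a
    for chocolate in draws:
        p = update(p, chocolate)
    return p

# Hypothetical observations: 3 chocolate cookies then 1 plain, repeated.
pattern = [True, True, True, False]

for cycles in (0, 1, 5, 20):
    draws = pattern * cycles
    sceptic = run(0.1, draws)    # observer who initially doubts box A
    believer = run(0.9, draws)   # observer who initially favours box A
    print(f"{len(draws):>3} draws: sceptic = {sceptic:.4f}, "
          f"believer = {believer:.4f}, gap = {believer - sceptic:.4f}")
```

With these made-up data, the gap between the two observers starts at 0.8 and shrinks steadily as draws accumulate, which is the sense in which the effect of the prior fades.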

These methods have now become common practice.
