Bayes' theorem is a basic result in probability theory. It comes from the work of the Reverend Thomas Bayes and was later found independently by Laplace. In his only article, Bayes sought to determine what would now be called the posterior distribution of the probability p of a binomial distribution. His work was edited and presented posthumously (1763) by his friend Richard Price in An Essay towards solving a Problem in the Doctrine of Chances. Bayes's results were taken up and extended by the French mathematician Laplace in a 1774 memoir, and Laplace was apparently unaware of Bayes's work.

The main result obtained by Bayes (Proposition 9 of the essay) is the following. Assume a uniform distribution for the binomial parameter p and an observation m of a binomial distribution B(n + m, p), where m is the number of successes observed and n the number of failures observed. Then the probability that p lies between a and b, given m, is:

\frac{\displaystyle\int_a^b C_{n+m}^m \, p^m (1-p)^n \, dp}{\displaystyle\int_0^1 C_{n+m}^m \, p^m (1-p)^n \, dp}
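When m and n are whole numbers, this ratio can be evaluated exactly. After normalization the integrand is a beta density, and its integral from 0 to x equals a binomial tail sum. The sketch below uses that identity to compute the probability that p lies between a and b, then cross-checks the result with a midpoint-rule integration of the two integrals. The factor C_{n+m}^m cancels between numerator and denominator, so the code leaves it out. The function names and the sample values m = 7, n = 3 are illustrative and do not come from the essay.

```python
from math import comb


def beta_cdf_int(x: float, m: int, n: int) -> float:
    """P(p <= x) when p has density proportional to p^m (1 - p)^n on [0, 1].

    For integer m, n this is the regularized incomplete beta I_x(m + 1, n + 1),
    which equals P(Binomial(m + n + 1, x) >= m + 1).
    """
    total = m + n + 1
    return sum(comb(total, j) * x**j * (1 - x) ** (total - j)
               for j in range(m + 1, total + 1))


def prop9_probability(a: float, b: float, m: int, n: int) -> float:
    """Probability that a < p < b after m successes and n failures, uniform prior."""
    return beta_cdf_int(b, m, n) - beta_cdf_int(a, m, n)


def prop9_numeric(a: float, b: float, m: int, n: int, steps: int = 100_000) -> float:
    """The same ratio of integrals, evaluated with the midpoint rule."""
    def integral(lo: float, hi: float) -> float:
        h = (hi - lo) / steps
        acc = 0.0
        for i in range(steps):
            p = lo + (i + 0.5) * h
            acc += p**m * (1 - p) ** n
        return acc * h
    return integral(a, b) / integral(0.0, 1.0)


print(prop9_probability(0.5, 1.0, m=7, n=3))  # 0.88671875
print(prop9_numeric(0.5, 1.0, m=7, n=3))
```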

His preliminary results, in particular Propositions 3, 4 and 5, imply the result now called Bayes' theorem (stated below), but Bayes does not seem to have focused on or emphasized this result.

What is "Bayesian" (in the current sense of the word) about Proposition 9 is that Bayes presented it as a probability on the parameter p. In other words, inference can determine not only probabilities from the observed outcomes of an experiment, but also the parameters governing those probabilities, and the same kind of analytical calculation handles both. By contrast, under a frequentist interpretation there can be no probability distribution for the parameter p. Consequently, one can reason about p only through non-probabilistic inference.

Bayes' theorem in statistics

Bayes' theorem is used in statistical inference to update estimates of a probability or an unknown parameter from observations and from the probability laws of those observations. There is a discrete version and a continuous version of the theorem.

  • The Bayesian school uses probabilities as a means of expressing a degree of knowledge numerically. The mathematical theory of probability does not require probabilities to be tied to frequencies; frequencies are only one particular application of the theory, which follows from the law of large numbers. Accordingly, Bayes' theorem can be applied to any proposition, whatever the nature of the variables and independently of any ontological consideration.

  • The frequentist school relies on the long-run properties of the law of the observations. It does not place a probability law on the parameters, which are regarded as unknown but fixed.

In probability theory, Bayes' theorem relates conditional probabilities. Given two events A and B, Bayes' theorem determines the probability of A given B, provided the following are known:

  • the probability of A;
  • the probability of B;
  • the probability of B given A.

This elementary theorem (originally called the theorem of the "probability of causes") has a wide range of applications.

To derive Bayes' theorem, one starts from the definition of conditional probability:

P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A)

where P(A ∩ B) denotes the probability that A and B both occur. Dividing both sides by P(B) gives:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}

This is Bayes' theorem.
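The derivation can be checked on a small example. The sketch below uses a hypothetical joint distribution of two events A and B, with numbers invented for illustration. Using exact fractions, it computes P(A | B) twice: once from the definition P(A ∩ B)/P(B), and once from Bayes' formula.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(A = a, B = b); they must sum to 1.
joint = {
    (True, True): Fraction(3, 20),
    (True, False): Fraction(5, 20),
    (False, True): Fraction(2, 20),
    (False, False): Fraction(10, 20),
}
assert sum(joint.values()) == 1

p_a = joint[(True, True)] + joint[(True, False)]   # P(A), by summing over B
p_b = joint[(True, True)] + joint[(False, True)]   # P(B), by summing over A
p_a_and_b = joint[(True, True)]                    # P(A ∩ B)
p_b_given_a = p_a_and_b / p_a                      # P(B | A)

by_definition = p_a_and_b / p_b                    # P(A | B) = P(A ∩ B) / P(B)
by_bayes = p_b_given_a * p_a / p_b                 # P(A | B) = P(B | A) P(A) / P(B)

print(by_definition, by_bayes)  # 3/5 3/5
```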

Each term in Bayes' theorem has a conventional name.

The term P(A) is the prior probability of A. It is "prior" in the sense that it precedes any information about B. P(A) is also called the marginal probability of A. The term P(A | B) is called the posterior probability of A given B (or of A under condition B). It is "posterior" in the sense that it depends directly on B. The term P(B | A), for a known B, is called the likelihood function of A. Likewise, the term P(B) is called the marginal or prior probability of B.

Other forms of Bayes' theorem

Bayes' theorem is sometimes refined by noting that

P(B) = P(A \cap B) + P(A^c \cap B) = P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)

in order to rewrite the theorem as follows:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}

where A^c is the complement of A. More generally, if {A_i} is a partition of the set of possible outcomes,

P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_j P(B \mid A_j)\,P(A_j)}

for every A_i in the partition.
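The partition form translates directly into a few lines of code. The sketch below takes the priors P(A_i) and the likelihoods P(B | A_i), computes P(B) as the sum in the denominator, and returns the posteriors. The function name and the three-hypothesis example values are hypothetical.

```python
from typing import Sequence


def posterior(priors: Sequence[float], likelihoods: Sequence[float]) -> list[float]:
    """P(A_i | B) for a partition {A_i}, given P(A_i) and P(B | A_i)."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors over a partition must sum to 1")
    joint = [p * lik for p, lik in zip(priors, likelihoods)]  # P(B | A_j) P(A_j)
    evidence = sum(joint)                                      # P(B), the denominator
    if evidence == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return [j / evidence for j in joint]


# Three hypotheses with hypothetical priors and likelihoods.
print(posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.8]))
```

The check on the priors reflects the requirement that {A_i} be a partition. Without that requirement, the sum in the denominator would not equal P(B).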

See also the law of total probability.

I. J. Good's approach

I. J. Good took up an idea of Alan Turing: probabilities become easier to handle if, instead of reasoning about a probability p, one works with the quantity

Ev(p) = \ln\frac{p}{1-p} \quad\text{or}\quad Ev(p) = \log\frac{p}{1-p}

which he calls weight of evidence, a term also rendered as "weight of testimony" or "value of plausibility". The following points are worth remembering:

  • evidence can range from minus infinity to plus infinity;
  • for convenience, one often works in decibels (dB): 10 log10(p/(1 − p));
  • observing a phenomenon produces a change in evidence, the evidence shift, and the size of this shift does not depend on the user's prior probabilities. An observation therefore brings objective information that is the same for all observers, something the usual statement of Bayes' rule does not bring out.

In reliability calculations, where one must handle very large probabilities (1 − ε) alongside very small ones (ε), working in terms of evidence gives a much clearer picture of safety classes: an evidence of −70 dB corresponds to a probability of about 10^-7, and so on. One can also keep the same number of decimal places in all circumstances without handling exponents, which makes calculations easier to read.
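Here is a minimal sketch of Good's transform in decibels, matching the convention in the list above. The update step relies on the standard identity that posterior log-odds equal prior log-odds plus the logarithm of the likelihood ratio P(D | H)/P(D | not H). That identity is why the evidence shift is the same whatever the prior. The function names are illustrative, and the likelihoods 0.75 and 0.5 are example values.

```python
import math


def evidence_db(p: float) -> float:
    """Good's weight of evidence in decibels: 10 * log10(p / (1 - p))."""
    return 10 * math.log10(p / (1 - p))


def prob_from_db(ev: float) -> float:
    """Inverse transform: recover p from an evidence value in dB."""
    odds = 10 ** (ev / 10)
    return odds / (1 + odds)


def update_db(prior_db: float, p_d_given_h: float, p_d_given_not_h: float) -> float:
    """Observing D adds 10 * log10 of the likelihood ratio, whatever the prior."""
    return prior_db + 10 * math.log10(p_d_given_h / p_d_given_not_h)


print(round(evidence_db(1e-7), 3))                            # close to -70 dB
print(prob_from_db(update_db(evidence_db(0.5), 0.75, 0.5)))   # posterior, about 0.6
```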

Bayes' theorem for probability densities

There is also a version of the theorem for continuous distributions. It follows directly from the joint density of the observations and the parameters, which is the product of the likelihood and the prior density of the parameters, by applying the definitions of conditional laws and conditional densities.

The continuous form of Bayes' theorem can be read as follows: the posterior distribution is obtained by multiplying the prior distribution by the likelihood and then normalizing, since the result must be a probability density. In Bayesian calculations it is therefore common practice to work with proportionality signs rather than equalities. This keeps expressions simpler, because the missing constants can be recovered (in principle) by integration. Moreover, Monte Carlo and MCMC simulation techniques do not require these normalizing constants at all.
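Simulation methods need the posterior only up to a constant, as the sketch below illustrates with a minimal random-walk Metropolis sampler (a standard MCMC method, chosen here for illustration). It targets p^7 (1 − p)^3 on [0, 1], which is a uniform prior times a binomial likelihood with 7 successes and 3 failures, and it never computes the normalizing integral. The step size, seed, burn-in and sample count are arbitrary choices.

```python
import math
import random


def log_unnormalized_posterior(p: float, successes: int, failures: int) -> float:
    """log(prior * likelihood) up to an additive constant, uniform prior on (0, 1)."""
    if not 0.0 < p < 1.0:
        return -math.inf                      # outside the prior's support
    return successes * math.log(p) + failures * math.log(1.0 - p)


def metropolis(successes: int, failures: int, n_samples: int = 50_000,
               burn_in: int = 1_000, step: float = 0.1, seed: int = 0) -> list[float]:
    """Random-walk Metropolis; only differences of log densities are ever used."""
    rng = random.Random(seed)
    p = 0.5
    current = log_unnormalized_posterior(p, successes, failures)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = p + rng.gauss(0.0, step)   # symmetric proposal
        candidate = log_unnormalized_posterior(proposal, successes, failures)
        # Acceptance ratio: the unknown normalizing constant cancels here.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


draws = metropolis(successes=7, failures=3)
print(sum(draws) / len(draws))   # should be near the exact posterior mean 8/12
```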

The best-known example is the following. Suppose one observes K serial numbers of devices, the largest of which is S, and the devices are assumed to be numbered from 1. What is the best estimate of the number N of existing devices? One can show that the best simple estimator is N = S × (K − 1)/(K − 2). More importantly, the precision of this estimate improves very quickly, even for small values of K.
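The sketch below carries out the Bayesian computation behind this example under two assumptions that the text does not spell out. First, the prior on N is flat over N ≥ S (improper, so the code truncates it at a large bound). Second, the K serial numbers are drawn uniformly without replacement, so the likelihood of observing a maximum of S is C(S − 1, K − 1)/C(N, K). Under these assumptions the posterior mean works out to (S − 1)(K − 1)/(K − 2), which for realistic S is very close to the simple formula S(K − 1)/(K − 2) quoted above. The function names and the values S = 60, K = 4 are illustrative.

```python
from math import comb


def posterior_mean_total(s: int, k: int, n_max: int = 200_000) -> float:
    """Posterior mean of N given K serial numbers whose maximum is S.

    Assumes a flat prior on N >= S (truncated at n_max) and draws without
    replacement, so P(max = S | N) = C(S-1, K-1) / C(N, K). The factor
    C(S-1, K-1) does not depend on N and cancels after normalization.
    Convergence in n_max is slow for K = 3.
    """
    if k < 3 or s < k:
        raise ValueError("need K >= 3 (finite posterior mean) and S >= K")
    weight_sum = 0.0
    weighted_n = 0.0
    for n in range(s, n_max + 1):
        w = 1.0 / comb(n, k)          # unnormalized posterior weight of N = n
        weight_sum += w
        weighted_n += n * w
    return weighted_n / weight_sum


def simple_estimate(s: int, k: int) -> float:
    """The simple estimator quoted in the text: S * (K - 1) / (K - 2)."""
    return s * (k - 1) / (k - 2)


S, K = 60, 4
print(posterior_mean_total(S, K))   # close to (S - 1)(K - 1)/(K - 2) = 88.5
print(simple_estimate(S, K))        # 90.0
```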

Another example: suppose that an unknown proportion p of voters vote "Yes", with p ∈ [0, 1]. A sample of N voters is drawn from the population, of whom X voted "Yes". The likelihood function is then:

L(p) = \binom{N}{X}\, p^X (1-p)^{N-X}

Multiplying this by the prior probability density of p and normalizing gives the posterior probability distribution of p, which incorporates the information from the new survey data. In particular, if the prior on p is uniform on the interval [0, 1], then the posterior has the form of a beta distribution:

f(p \mid X) = \frac{p^X (1-p)^{N-X}}{\mathrm{B}(X+1,\, N-X+1)}

where B is the beta function, the constant being different from that of the likelihood function.
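The recipe "multiply by the prior, then normalize" can also be carried out numerically on a grid and compared with the closed-form beta result. The sketch assumes a uniform prior, as in the text. The survey counts (N = 20, X = 13) are invented for illustration. With a uniform prior the posterior is Beta(X + 1, N − X + 1), whose mean is (X + 1)/(N + 2) and whose normalizing constant 1/B(X + 1, N − X + 1) equals (N + 1) C(N, X).

```python
from math import comb


def grid_posterior(x: int, n: int, prior=lambda p: 1.0, points: int = 10_000):
    """Posterior density of p on a midpoint grid: prior * likelihood, then normalize."""
    h = 1.0 / points
    grid = [(i + 0.5) * h for i in range(points)]
    unnormalized = [prior(p) * p**x * (1 - p) ** (n - x) for p in grid]
    z = sum(unnormalized) * h                 # numerical normalizing constant
    return grid, [u / z for u in unnormalized], h


N, X = 20, 13                                 # hypothetical survey: 13 "Yes" out of 20
grid, density, h = grid_posterior(X, N)
grid_mean = sum(p * d for p, d in zip(grid, density)) * h
beta_mean = (X + 1) / (N + 2)                 # mean of Beta(X + 1, N - X + 1)

# Closed-form density at one grid point, using 1/B(X+1, N-X+1) = (N + 1) * C(N, X)
p0 = grid[6000]
exact = (N + 1) * comb(N, X) * p0**X * (1 - p0) ** (N - X)
print(grid_mean, beta_mean)
print(density[6000], exact)
```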

The beta distribution turns up very regularly in estimation problems of this kind. Computing the change in entropy between the old and the new distribution makes it possible to quantify exactly, in bits, the information obtained.
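As a numerical illustration of the entropy remark, the sketch below computes the differential entropy, in bits, of the uniform prior (0 bits) and of a beta posterior by midpoint integration. The decrease between the two is one way of reading "information obtained". Differential entropy is used here as an illustrative choice of measure, and the counts are hypothetical.

```python
import math


def beta_entropy_bits(a: int, b: int, points: int = 100_000) -> float:
    """Differential entropy, in bits, of Beta(a, b) for integers a, b >= 1 (midpoint rule)."""
    # 1 / B(a, b) = (a + b - 1)! / ((a - 1)! (b - 1)!) for integer parameters
    c = math.factorial(a + b - 1) / (math.factorial(a - 1) * math.factorial(b - 1))
    h = 1.0 / points
    total = 0.0
    for i in range(points):
        p = (i + 0.5) * h
        f = c * p ** (a - 1) * (1 - p) ** (b - 1)
        if f > 0.0:
            total -= f * math.log2(f) * h     # accumulate -f log2 f
    return total


prior_bits = beta_entropy_bits(1, 1)          # uniform prior on [0, 1]: 0 bits
posterior_bits = beta_entropy_bits(14, 8)     # hypothetical: 13 "Yes" out of 20
print(prior_bits, posterior_bits, prior_bits - posterior_bits)
```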

See also the article Experimental design and the so-called "gangster penguin" problem.

Bayesian inference

The rules of the mathematical theory of probability apply to probabilities as such, not only to their interpretation as relative frequencies of random events. One may choose to apply them to degrees of belief in certain propositions. These degrees of belief are then refined in the light of experience by applying Bayes' theorem.

The Cox-Jaynes theorem now provides a sound justification for this approach, which for a long time rested only on intuitive and empirical foundations.

Examples

Which urn did the ball come from?

As an example, imagine two urns filled with balls. The first contains ten (10) black balls and thirty (30) white ones; the second contains twenty (20) of each. One of the urns is chosen at random, with no particular preference, and a ball is drawn at random from it. The ball is white. What is the probability that this ball was drawn from the first urn, given that it is white?

Intuitively, this seems more likely than the alternative, so the probability should be greater than 50%. Bayes' theorem gives the exact answer.

Let H1 be the hypothesis "the ball is drawn from the first urn" and H2 the hypothesis "the ball is drawn from the second urn". Since the urn is chosen with no particular preference, P(H1) = P(H2). Moreover, since the ball was certainly drawn from one of the two urns, the two probabilities sum to 1, so each equals 50%.

Let D denote the observation "a white ball is drawn". Since a ball is drawn at random from the chosen urn, the probability of D under hypothesis H1 is:

P(D \mid H_1) = \frac{30}{40} = 75\,\%

Similarly, under H2:

P(D \mid H_2) = \frac{20}{40} = 50\,\%

The discrete form of Bayes' theorem therefore gives:

\begin{aligned}
P(H_1 \mid D) &= \frac{P(H_1) \cdot P(D \mid H_1)}{P(H_1) \cdot P(D \mid H_1) + P(H_2) \cdot P(D \mid H_2)} \\
&= \frac{50\,\% \cdot 75\,\%}{50\,\% \cdot 75\,\% + 50\,\% \cdot 50\,\%} \\
&= 60\,\%
\end{aligned}

Before the colour of the ball is observed, the probability of having chosen the first urn is the prior probability P(H1), namely 50%. After observing the ball, we revise our judgment and use P(H1 | D), which is 60%.
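The same answer can be obtained exactly with fractions, or approximately by simulating the experiment many times and keeping only the draws that produced a white ball. In the sketch below, the seed and the number of trials are arbitrary choices, and the names are illustrative.

```python
import random
from fractions import Fraction

# Urn contents from the example: (white balls, black balls)
URNS = {"H1": (30, 10), "H2": (20, 20)}


def exact_posterior_first_urn() -> Fraction:
    prior = Fraction(1, 2)                                        # each urn equally likely
    like = {h: Fraction(w, w + b) for h, (w, b) in URNS.items()}  # P(D | H)
    return prior * like["H1"] / (prior * like["H1"] + prior * like["H2"])


def simulate(trials: int = 200_000, seed: int = 1) -> float:
    """Share of white draws that came from the first urn."""
    rng = random.Random(seed)
    white_total = 0
    white_from_first = 0
    for _ in range(trials):
        urn = rng.choice(["H1", "H2"])
        w, b = URNS[urn]
        if rng.randrange(w + b) < w:          # the drawn ball is white
            white_total += 1
            if urn == "H1":
                white_from_first += 1
    return white_from_first / white_total


print(exact_posterior_first_urn())   # 3/5
print(simulate())                     # close to 0.6
```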

Contradictory forecasts

  • Weather station A forecasts good weather for tomorrow.
  • Another station, B, forecasts rain instead.
  • It is known that in the past A was wrong 25% of the time, and B 30% of the time.
  • It is also known that, on average, 40% of days have good weather and 60% have rain.

Whom should one believe, and with what probability?
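The text leaves the question open. One possible answer follows, under two assumptions that the text does not state. First, each station's error rate is the same whatever the actual weather. Second, the two stations' errors are independent given the actual weather. With exact fractions, Bayes' theorem then gives:

```python
from fractions import Fraction

p_good = Fraction(40, 100)         # climatological prior for good weather
err_a = Fraction(25, 100)          # station A is wrong 25% of the time
err_b = Fraction(30, 100)          # station B is wrong 30% of the time

# A forecasts good weather and B forecasts rain.
# Assumption: errors are independent given the true weather and do not depend on it.
joint_good = p_good * (1 - err_a) * err_b          # good weather: A right, B wrong
joint_rain = (1 - p_good) * err_a * (1 - err_b)    # rain: A wrong, B right

post_good = joint_good / (joint_good + joint_rain)
print(post_good, 1 - post_good)   # 6/13 7/13
```

Under these assumptions, B's forecast of rain is slightly more credible, with probability 7/13 ≈ 0.54.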

This Bayesian approach is used by poison control centres to determine, as quickly and as accurately as possible, the type of poisoning a patient is most probably suffering from.

Social, legal and political aspects

A problem regularly raised by the Bayesian approach is the following. If the probability of a behaviour (delinquency, for example) depends strongly on certain social, cultural or hereditary factors, then:
  • on the one hand, one may wonder whether this implies a partial reduction in the moral, if not legal, responsibility of offenders. Equivalently, it would imply an increase in the responsibility of society, which did not recognize or could not neutralize these factors;
  • on the other hand, one may wish to use this information to guide a prevention policy as effectively as possible. It then remains to be seen whether the public interest or public morality will tolerate this de facto discrimination between citizens (even if it were positive).
These issues are explored in the film Minority Report.

Medical false positives

Let us start with an (overly) simplified problem, whose only merit is to introduce the real problem.

The “simplified” problem

  • A medical test for a rare disease is considered 99% reliable.
  • This disease affects one person in 100,000 in the population.
  • You take the test, and it comes back positive.

Don't panic. As stated, this simplified (one might say caricatured) problem tells us that out of a million people, ten thousand (1%) will test positive, whereas only 10 (one in 100,000) actually have the disease. Fortunately, this "99% reliable" test, when positive, gives 999 false alarms out of 1,000.
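The counts in the paragraph above can be reproduced as follows. Reading "99% reliable" as 99% sensitivity and 99% specificity is an assumption, since the text does not define the phrase.

```python
population = 1_000_000
prevalence = 1 / 100_000      # one person in 100,000 has the disease
accuracy = 0.99               # assumed: 99% sensitivity and 99% specificity

ill = population * prevalence                          # about 10 people
true_positives = ill * accuracy                        # about 9.9
false_positives = (population - ill) * (1 - accuracy)  # about 10,000
share_false = false_positives / (true_positives + false_positives)
print(round(false_positives), round(share_false, 4))   # 10000 0.999
```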

With that example to fix ideas, let us move on to the real case.

The real problem

False positives are a difficulty inherent in all tests: no test is perfect. Sometimes the result will be wrongly positive, which is sometimes called the type I error or alpha risk.

For example, when a person is tested for a disease, there is a risk, generally small, that the result is positive even though the person does not have the disease. The problem is then not only to measure this risk in absolute terms (before the test is performed); one must also determine the probability that a positive test is wrong. We will show how, in the case of a very rare disease, an otherwise highly reliable test can produce a large majority of false positives.

Let us imagine an extremely reliable test:

  • if a patient has the disease, the test detects it (is positive) almost systematically: 99% of the time, i.e. with probability 0.99;
  • if a patient is healthy, the test is correct (negative) in 95% of cases, i.e. with probability 0.95.

Suppose that the disease affects only one person in a thousand, i.e. with probability 0.001. That may seem small, but for a fatal disease it is considerable. We now have all the information needed to determine the probability that a positive test is wrong.

Let A denote the event "the patient has the disease" and B the event "the test is positive". The second form of Bayes' theorem in the discrete case then gives:

\begin{aligned}
P(A \mid B) &= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} \\
&\approx 0.019
\end{aligned}

Given that the test is positive, the probability that the patient is healthy is therefore approximately 1 − 0.019 = 0.981. Because there are so few patients,

  • practically all patients have a positive test, but
  • practically all positive tests also come from healthy people.

If the treatment is very burdensome, expensive or dangerous for a healthy person, it may then be inappropriate to treat every patient who tests positive without a confirmatory test. Such a test will probably be more precise and more expensive, the first test having served only to rule out the most obvious cases.

Nevertheless, the first test has succeeded in isolating a population about twenty times smaller that contains practically all the patients. By carrying out further tests, one can hope to improve reliability. Bayes' theorem shows that when the proportion of patients is low, the risk of being wrongly declared positive has a very strong impact on reliability.
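The figures above can be recomputed in a few lines. The sketch also shows how a second test could raise the probability of disease. That second test is hypothetical: it is assumed to have the same characteristics as the first and to err independently of it. It is applied only to people who tested positive, and the first posterior serves as its prior. The function name is illustrative.

```python
def positive_predictive_value(prior: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test), by Bayes' theorem."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


prevalence, sens, spec = 0.001, 0.99, 0.95
p_positive = sens * prevalence + (1 - spec) * (1 - prevalence)   # share flagged by test 1
after_one = positive_predictive_value(prevalence, sens, spec)
# Hypothetical second test: same characteristics, errors independent of the first;
# the posterior after the first test becomes the new prior.
after_two = positive_predictive_value(after_one, sens, spec)
print(1 / p_positive, after_one, after_two)
```

With these numbers, about one person in 19.6 is flagged by the first test. The probability of disease rises to about 0.019 after one positive result and to roughly 0.28 after a second positive result, under the independence assumption.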

References

Various versions of the original essay, in English

  • T. Bayes (1763), "An Essay towards solving a Problem in the Doctrine of Chances", Philosophical Transactions of the Royal Society of London, 53.

  • T. Bayes (1763/1958), "Studies in the History of Probability and Statistics: IX. Thomas Bayes's Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45: 296–315 (Bayes's essay in modernized notation).

  • T. Bayes, "An Essay towards solving a Problem in the Doctrine of Chances" (Bayes's essay in the original notation).

Commentary in English

  • G. A. Barnard (1958), "Studies in the History of Probability and Statistics: IX. Thomas Bayes's Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45: 293–295 (biographical remarks).

  • D. Covarrubias, "An Essay Towards Solving a Problem in the Doctrine of Chances" (an outline and exposition of Bayes's essay).

  • S. M. Stigler (1982), "Thomas Bayes's Bayesian Inference", Journal of the Royal Statistical Society, Series A, 145: 250–258 (Stigler argues for a revised interpretation of the essay; recommended).

Other references

  • P. S. Laplace (1774), "Mémoire sur la probabilité des causes par les événements", Savants étrangers 6: 621–656; also Œuvres 8: 27–65.

  • P. S. Laplace (1774/1986), "Memoir on the Probability of the Causes of Events", Statistical Science, 1(3): 364–378.

  • S. M. Stigler (1986), "Laplace's 1774 Memoir on Inverse Probability", Statistical Science, 1(3): 359–378.

  • Jeff Miller, Earliest Known Uses of Some of the Words of Mathematics (B) (very informative; recommended).

  • A. Papoulis (1984), Probability, Random Variables, and Stochastic Processes, second edition. New York: McGraw-Hill.

  • S. E. Fienberg (2005), "When did Bayesian Inference become 'Bayesian'?", Bayesian Analysis, pp. 1–40. http://ba.stat.cmu.edu/journal/2006/vol01/issue01/fienberg.pdf


© 2007-2008 speedlook.com; article text available under the terms of GFDL, from fr.wikipedia.org