Bayes' theorem is a fundamental result in probability theory, arising from the work of the Reverend Thomas Bayes and later found independently by Laplace. In his only paper, Bayes sought to determine what would now be called the posterior distribution of the parameter p of a binomial distribution. His work was published and presented posthumously (1763) by his friend Richard Price, in an essay on solving a problem in the theory of chance (An Essay towards solving a Problem in the Doctrine of Chances). Bayes's results were taken up and extended by the French mathematician Laplace in a memoir of 1774; Laplace was apparently unaware of Bayes's work.
The principal result obtained by Bayes (Proposition 9 of the essay) is the following: given a uniform distribution on the binomial parameter p and an observation from a binomial distribution, where m is the number of successes observed and n the number of failures observed, the probability that p lies between a and b given this observation is:
$$P(a < p < b \mid m, n) = \frac{\int_a^b \binom{m+n}{m}\, p^{m} (1-p)^{n}\, dp}{\int_0^1 \binom{m+n}{m}\, p^{m} (1-p)^{n}\, dp}$$
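As a rough numerical illustration of Proposition 9, the probability that p lies between a and b is the integral of p^m (1-p)^n over [a, b] divided by its integral over [0, 1] (the binomial coefficient cancels). The sketch below approximates both integrals with a midpoint rule; the function names, the step count, and the sample numbers are illustrative assumptions, not part of Bayes's essay.

```python
def _integrate(f, lo, hi, steps):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def prop9_probability(m, n, a, b, steps=20_000):
    """Approximate P(a < p < b | m successes, n failures) under a uniform prior on p."""
    def unnormalized(p):
        # Posterior density up to a constant: p^m (1 - p)^n.
        return p ** m * (1 - p) ** n

    return _integrate(unnormalized, a, b, steps) / _integrate(unnormalized, 0.0, 1.0, steps)


# Hypothetical observation: 7 successes and 3 failures.
print(prop9_probability(7, 3, 0.5, 0.9))
```

With no observations at all (m = n = 0) the function simply returns b - a, which is the uniform prior itself.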
His preliminary results, in particular Propositions 3, 4 and 5, imply the result now called Bayes' theorem (stated below), but Bayes does not seem to have focused on or emphasized this result.
What is "Bayesian" (in the current sense of the word) about Proposition 9 is that Bayes presented it as a probability on the parameter p. In other words, one can determine not only probabilities from observations of the outcomes of an experiment, but also the parameters governing those probabilities. The same type of analytical calculation makes both inferences possible. By contrast, under a frequentist interpretation there can be no probability distribution on the parameter p, and consequently one can reason about p only through non-probabilistic inference.
Bayes' theorem is used in statistical inference to update estimates of a probability or of any parameter, starting from observations and from the probability laws of those observations. There is a discrete version and a continuous version of the theorem.
The Bayesian school uses probabilities as a means of expressing a degree of knowledge numerically (the mathematical theory of probability in no way requires associating probabilities with frequencies, which are only one particular application of it, resulting from the law of large numbers). From this standpoint, Bayes' theorem can be applied to any proposition, whatever the nature of the variables and independently of any ontological consideration.
The frequentist school uses the long-run properties of the distribution of the observations and does not consider a distribution on the parameters, which are unknown but fixed.
In probability theory, Bayes' theorem relates conditional probabilities: given two events A and B, it determines the probability of A given B, provided the probabilities of A, of B, and of B given A are known:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
This elementary theorem (originally called the theorem "of the probability of causes") has considerable applications.
To derive Bayes' theorem, one starts from one of the definitions of conditional probability:
$$P(A \mid B)\, P(B) = P(A \cap B) = P(B \mid A)\, P(A)$$
where P(A ∩ B) denotes the probability that A and B both occur. Dividing both sides by P(B), one obtains:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
that is, Bayes' theorem.
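The formula P(A | B) = P(B | A) P(A) / P(B) translates directly into a few lines of code. The following is a minimal sketch; the function name and the numbers in the usage line are hypothetical and chosen only for illustration.

```python
def posterior(prior_a, likelihood_b_given_a, marginal_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if marginal_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return likelihood_b_given_a * prior_a / marginal_b


# Hypothetical inputs: P(A) = 0.3, P(B | A) = 0.8, P(B) = 0.5.
print(posterior(0.3, 0.8, 0.5))
```

The guard on P(B) reflects the division step in the derivation: the theorem is only defined when the conditioning event has nonzero probability.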
Each term in Bayes' theorem has a conventional name.
The term P(A) is the prior probability of A. It is "prior" in the sense that it precedes any information about B. P(A) is also called the marginal probability of A. The term P(A | B) is called the posterior probability of A given B (or of A conditional on B). It is "posterior" in the sense that it depends directly on B. The term P(B | A), for a known B, is called the likelihood function of A. Likewise, the term P(B) is called the marginal or prior probability of B.
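Bayes' theorem is sometimes refined by noting that
$$P(B) = P(A \cap B) + P(A^{c} \cap B) = P(B \mid A)\, P(A) + P(B \mid A^{c})\, P(A^{c})$$
in order to rewrite the theorem as follows:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid A^{c})\, P(A^{c})}$$
where $A^{c}$ is the complement of A. More generally, if $\{A_i\}$ is a partition of the sample space,
$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_j P(B \mid A_j)\, P(A_j)}$$
for every $A_i$ in the partition.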
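The partition form can be sketched as follows: the denominator is obtained by summing P(B | A_j) P(A_j) over the partition, so only the priors and the likelihoods are needed. The function name and argument layout are illustrative assumptions.

```python
def posterior_over_partition(priors, likelihoods):
    """Return [P(A_i | B)] from the lists [P(A_i)] and [P(B | A_i)]."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B | A_i) * P(A_i)
    total = sum(joint)  # P(B), by the law of total probability
    if total <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return [j / total for j in joint]


# Two-event partition {A, complement of A} with hypothetical numbers.
print(posterior_over_partition([0.5, 0.5], [0.75, 0.5]))
```

With a two-element partition this reduces to the form written with the complement of A.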
See also the law of total probability.
I. J. Good takes up an idea of Alan Turing: probabilities become easier to handle if, instead of reasoning on a probability p, one works with a quantity constructed as follows:
$$\mathrm{Ev}(p) = \ln\frac{p}{1-p} \quad\text{or}\quad \mathrm{Ev}(p) = 10\,\log_{10}\frac{p}{1-p} \ \text{(in decibels)}$$
which he calls weight of evidence, a term that has been rendered in various ways: "weight of testimony", "plausibility value", etc. What is worth retaining is this:
In reliability calculations, where one must handle both very large probabilities (1-ε) and very small ones (ε), working in terms of evidence gives a much clearer picture of safety classes: an evidence of -70 dB corresponds to a probability of 10^-7, and so on. One can also keep the same number of decimal places throughout and avoid handling exponents, which improves the readability of the calculations.
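The correspondence between evidence and probability can be sketched in code. The sketch assumes that evidence is expressed in decibels as 10 log10(p / (1 - p)), which is the convention under which -70 dB corresponds to a probability of about 10^-7; the function names are illustrative.

```python
import math


def evidence_db(p):
    """Evidence of a probability p in decibels: 10 * log10 of the odds p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return 10 * math.log10(p / (1 - p))


def probability_from_db(ev):
    """Inverse mapping: recover p from an evidence value in decibels."""
    odds = 10 ** (ev / 10)
    return odds / (1 + odds)


print(evidence_db(1e-7))         # very close to -70 dB
print(probability_from_db(-70))  # very close to 1e-7
```

A probability of 0.5 maps to an evidence of 0 dB, and evidence is symmetric: p and 1 - p give opposite values.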
There is also a version of the theorem for continuous distributions. It follows simply from writing the joint density of the observations and the parameters as the product of the likelihood and the prior density on the parameters, and then applying the definition of conditional distributions and densities.
The continuous form of Bayes' theorem can be interpreted as saying that the posterior distribution is obtained by multiplying the prior distribution by the likelihood and then normalizing (since the result must be a probability density). In Bayesian computation, it is therefore common practice to work with proportionality signs rather than equalities, which reduces the complexity of the expressions, since the missing constants can be recovered by integration (in theory). Moreover, simulation techniques of the Monte Carlo and MCMC type do not need these normalizing constants.
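In symbols, the continuous form described above can be written as follows. The notation is an assumption introduced here for illustration: θ denotes the parameters, x the observations, π(θ) the prior density, and f(x | θ) the likelihood.

```latex
f(\theta \mid x)
  = \frac{f(x \mid \theta)\,\pi(\theta)}
         {\int f(x \mid \theta')\,\pi(\theta')\,d\theta'}
\qquad\text{or, up to the normalizing constant,}\qquad
f(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)
```

The denominator does not depend on θ, which is why it can be dropped and restored later by integration.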
The best-known example is the following: if one observes K serial numbers of devices, the largest of which is S, and the devices are assumed to be numbered from 1, what is the best estimate of the number N of existing devices? It can be shown that the best simple estimator is N = S × (K-1)/(K-2), and above all that the precision of this estimate improves very quickly, even for small values of K.
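The estimator quoted above is easy to evaluate. The sketch below simply applies the formula N = S × (K-1)/(K-2) as stated in the text; the function name and the sample serial numbers are hypothetical.

```python
def estimate_total(serials):
    """Estimate the number of devices from observed serial numbers.

    Applies N = S * (K - 1) / (K - 2), where K is the number of observed
    serial numbers and S is the largest one.
    """
    k = len(serials)
    if k < 3:
        raise ValueError("the formula needs at least 3 observations (K > 2)")
    s = max(serials)
    return s * (k - 1) / (k - 2)


# Four hypothetical serial numbers, the largest being 60.
print(estimate_total([3, 17, 42, 60]))  # 60 * 3 / 2 = 90.0
```

The correction factor (K-1)/(K-2) approaches 1 as K grows, so the estimate moves toward the largest observed serial number.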
Another possible example: suppose that an unknown proportion p of voters votes "Yes", with 0 ≤ p ≤ 1. A sample of N voters is drawn from the population, of whom X voted "Yes". The likelihood function is therefore:
$$L(p) = C\, p^{X} (1-p)^{N-X}$$
Multiplying this by the prior probability density of p and normalizing gives the posterior probability distribution of p, which incorporates the information from the new survey data. Thus, if the prior distribution of p is uniform on the interval [0, 1], the posterior has the form of a beta distribution:
$$f(p \mid X) = C'\, p^{X} (1-p)^{N-X}$$
the constant C' being different from the constant C of the likelihood function.
The beta distribution appears with great regularity in these estimation problems. Computing the change in entropy between the old and the new distribution makes it possible to quantify exactly, in bits, the information gained.
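The survey example can be checked numerically. The sketch below evaluates p^X (1-p)^(N-X) on a grid, normalizes it, and compares the posterior mean with the known mean (X+1)/(N+2) of the beta distribution with parameters X+1 and N-X+1. It also computes one possible reading of the "change in entropy": the differential entropy of the uniform prior (zero) minus that of the posterior. The grid size, the survey numbers, and that reading of the entropy remark are assumptions.

```python
import math


def grid_posterior(x, n, points=10_000):
    """Posterior density of p on a midpoint grid, for x 'Yes' out of n, uniform prior."""
    width = 1.0 / points
    grid = [(i + 0.5) * width for i in range(points)]
    unnormalized = [p ** x * (1 - p) ** (n - x) for p in grid]
    norm = sum(unnormalized) * width  # plays the role of the missing constant
    return grid, [u / norm for u in unnormalized], width


def posterior_mean(grid, density, width):
    """Mean of p under the gridded posterior density."""
    return sum(p * f for p, f in zip(grid, density)) * width


def information_gain_bits(density, width):
    """Entropy of the uniform prior (0 bits) minus differential entropy of the posterior."""
    posterior_entropy = -sum(f * math.log2(f) for f in density if f > 0) * width
    return 0.0 - posterior_entropy


# Hypothetical survey: 60 "Yes" answers among 100 voters.
grid, density, width = grid_posterior(60, 100)
print(posterior_mean(grid, density, width), (60 + 1) / (100 + 2))
print(information_gain_bits(density, width))
```

Only the unnormalized product is ever evaluated; the normalizing constant is recovered by summation, as the proportionality argument suggests.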
See also the article Experimental design and the problem known as the multi-armed bandit.
The rules of the mathematical theory of probability apply to probabilities as such, not only to their use as relative frequencies of random events. One may decide to apply them to degrees of belief in certain propositions. These degrees of belief are refined in the light of experience by applying Bayes' theorem.
The Cox-Jaynes theorem now provides a solid justification for this approach, which for a long time had only intuitive and empirical foundations.
As an example, imagine two urns filled with balls. The first contains ten (10) black balls and thirty (30) white ones; the second has twenty (20) of each. One of the urns is chosen at random, with no particular preference, and a ball is drawn at random from that urn. The ball is white. What is the probability that this ball was drawn from the first urn, given that it is white?
Intuitively, it is clear that this is more likely than the opposite, so the probability should be more than 50%. The exact answer comes from Bayes' theorem.
Let H1 be the hypothesis "The ball is drawn from the first urn" and H2 the hypothesis "The ball is drawn from the second urn". Since the urn is chosen with no particular preference, P(H1) = P(H2); moreover, since the ball was certainly drawn from one of the two urns, the two probabilities sum to 1, so each is 50%.
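Let D denote the information given: "A white ball is drawn." Since the ball is drawn at random from one of the urns, the probability of D under hypothesis H1 is:
$$P(D \mid H_1) = \frac{30}{40} = 75\,\%$$
Likewise, under H2,
$$P(D \mid H_2) = \frac{20}{40} = 50\,\%$$
Bayes' formula in the discrete case therefore gives:
$$P(H_1 \mid D) = \frac{P(D \mid H_1)\, P(H_1)}{P(D \mid H_1)\, P(H_1) + P(D \mid H_2)\, P(H_2)} = \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.5 \times 0.5} = 0.6 = 60\,\%$$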
Before the color of the ball is observed, the probability of having chosen the first urn is a prior probability, P(H1), namely 50%. After observing the ball, we revise our judgment and consider P(H1 | D), namely 60%.
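The urn calculation can be reproduced exactly with rational arithmetic. The sketch below uses the ball counts given in the example; the variable names are illustrative.

```python
from fractions import Fraction

# Prior: each urn is chosen with probability 1/2.
prior = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}

# Probability of drawing a white ball from each urn.
white_given = {"H1": Fraction(30, 40), "H2": Fraction(20, 40)}

# P(D): total probability of drawing a white ball.
evidence = sum(prior[h] * white_given[h] for h in prior)

# Posterior P(H | D) for each hypothesis.
posterior = {h: prior[h] * white_given[h] / evidence for h in prior}

print(posterior["H1"])  # 3/5
```

Using `Fraction` avoids rounding, so the result is exactly 3/5, that is, 60%.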
Whom should one believe, and with what probability?
This Bayesian approach is used by poison control centres to identify, as quickly and as precisely as possible, the type of poisoning from which a patient is probably suffering.
No panic. In the form given, this simplified (one might say caricatured) problem tells us that out of a million people, ten thousand (1%) will be flagged as affected, whereas only 10 (one in 100,000) actually have the disease. This 99% reliable test, when positive, therefore gives (fortunately) 999 false alarms out of 1000.
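The counts above can be reproduced with a short calculation. The sketch assumes one reading of "99% reliable": the test detects 99% of sick people and wrongly flags 1% of healthy people. The function and parameter names are hypothetical.

```python
def screening_counts(population, prevalence, sensitivity, false_positive_rate):
    """Return the expected numbers of true and false positives in a population."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


# One million people, one in 100,000 sick, test assumed 99% reliable both ways.
tp, fp = screening_counts(1_000_000, 1e-5, 0.99, 0.01)
share_false = fp / (tp + fp)
print(round(tp + fp))         # roughly ten thousand positives
print(round(share_false, 3))  # roughly 0.999, i.e. 999 false alarms out of 1000
```

Almost all positives come from the very large healthy group, which is why the false alarms dominate.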
That was only meant to fix ideas; let us now turn to a realistic case.
For example, when a person is tested to find out whether they are infected with a disease, there is a generally small risk that the result is positive even though the patient has not contracted the disease. The problem is then not to measure this risk in absolute terms (before carrying out the test); one still has to determine the probability that a positive test is wrong. We will show how, in the case of a very rare disease, an otherwise highly reliable test can lead to a clear majority of false positives.
Let us imagine an extremely reliable test:
Suppose that the disease affects only one person in a thousand, that is, with a probability of 0.001. That may seem small, but for a fatal disease it is considerable. We now have all the information needed to determine the probability that a positive test is wrong.
Let A denote the event "the patient has contracted the disease" and B the event "the test is positive". The second form of Bayes' theorem in the discrete case then gives:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid A^{c})\, P(A^{c})} \approx 0.019$$
Given that the test is positive, the probability that the patient is healthy is therefore approximately (1 − 0.019) = 0.981. Because of the very small number of sick patients, most positive results are false positives.
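The figure of 0.019 can be reproduced with a short calculation. The characteristics of the test are not stated in the surviving text, so the sketch below assumes a sensitivity of 0.99 and a false positive rate of 0.05; these values are hypothetical, chosen because together with the prevalence of 0.001 they give a result close to 0.019.

```python
def prob_sick_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return (P(sick | positive), P(positive)) using the two-event form of Bayes' theorem."""
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive, p_positive


# Prevalence from the text; sensitivity and false positive rate are assumptions.
p_sick, p_positive = prob_sick_given_positive(0.001, 0.99, 0.05)
print(round(p_sick, 3))      # about 0.019
print(round(1 - p_sick, 3))  # about 0.981: probability of being healthy despite a positive test
print(round(1 / p_positive)) # positives form a group about twenty times smaller than the population
```

Under these assumptions roughly one person in twenty tests positive, and that group contains nearly all of the sick patients.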
If the treatment is very burdensome, expensive, or dangerous for a healthy patient, it may then be inappropriate to treat all patients who test positive without a risk assessment or a further test (which will no doubt be more precise and more expensive, the first test having served only to rule out the most obvious cases).
With the first test, one has nevertheless managed to isolate a population twenty times smaller that contains practically all the sick patients. By carrying out further tests, one can hope to improve the reliability of the diagnosis. Bayes' theorem shows that when the probability of being sick is low, the risk of being wrongly declared positive has a very strong impact on reliability.
T. Bayes (1763), "An Essay towards solving a Problem in the Doctrine of Chances", Philosophical Transactions of the Royal Society of London, 53.
T. Bayes (1763/1958), "Studies in the History of Probability and Statistics: IX. Thomas Bayes's Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45:296-315 (Bayes's essay in modernized notation).
T. Bayes, "An Essay towards solving a Problem in the Doctrine of Chances" (Bayes's essay in the original notation).
G.A. Barnard (1958), "Studies in the History of Probability and Statistics: IX. Thomas Bayes's Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45:293-295 (biographical remarks).
D. Covarrubias, "An Essay Towards Solving a Problem in the Doctrine of Chances" (an outline and exposition of Bayes's essay).
S.M. Stigler (1982), "Thomas Bayes' Bayesian Inference", Journal of the Royal Statistical Society, Series A, 145:250-258 (Stigler argues for a revised interpretation of the essay; recommended).
P.S. Laplace (1774), "Mémoire sur la probabilité des causes par les événements", Savants étrangers 6:621-656, also Œuvres 8:27-65.
P.S. Laplace (1774/1986), "Memoir on the Probability of the Causes of Events", Statistical Science, 1(3):364-378.
S.M. Stigler (1986), "Laplace's 1774 memoir on inverse probability", Statistical Science, 1(3):359-378.
Jeff Miller, Earliest Known Uses of Some of the Words of Mathematics (B) (very informative; recommended).
A. Papoulis (1984), Probability, Random Variables, and Stochastic Processes, second edition. New York: McGraw-Hill.
S.E. Fienberg (2005), "When did Bayesian Inference become 'Bayesian'?", Bayesian Analysis, pp. 1-40. http://ba.stat.cmu.edu/journal/2006/vol01/issue01/fienberg.pdf