The Cox-Jaynes theorem (1946), due in its original version to the physicist Richard Cox, is a codification of the learning process starting from a certain set of postulates. At the end of the argument, this codification turns out to coincide with that of probability, whose historical origin is quite different.

It thus yields a “logical” interpretation of probabilities that is independent of the frequency interpretation. It also provides a rational basis for the logical mechanism of induction, and hence for machine learning. What is more, the theorem shows that, under the conditions of the postulates, any other form of knowledge representation is biased. It is therefore an extremely strong result. (Source: Myron Tribus, Décisions rationnelles dans l'incertain, Masson, 1974.)

Cox's results had reached only a small audience before E.T. Jaynes rediscovered the theorem and drew from it a series of implications for Bayesian methods, and Irving John Good did the same for artificial intelligence.

Problems with the validity of inductive reasoning before Cox

Bertrand Russell's reservations

In the chapter “Is science superstitious?” of his work Science and Religion, Bertrand Russell states the problem posed by induction, and goes so far as to call it a scandal:
  • On what grounds can we generalize that what has been verified in a limited number of cases will also hold in the cases that were not tested?
  • On what grounds can we suppose, even for what has been measured, that what was true yesterday will still be true tomorrow?

Hempel's paradox

See Hempel's paradox, also known as the paradox of indoor ornithology.

“Desiderata” (axioms)

Cox sets out the desiderata for a robot that would reason according to an inductive logic:

Degrees of plausibility are represented by real numbers

  • It must always be possible to say which of two plausibilities is the larger, which suggests a quantitative representation, and a numerical form seems convenient.
  • An integer representation would pose a problem of discretization noise, since no plausibility could fit between two plausibilities represented by successive integers.
  • Rational numbers would certainly be suitable, but while not all reals are rational, all rationals are reals.

The convention adopted, arbitrarily, is that larger plausibilities are represented by larger numbers.

The rules of inference must not contradict common-sense inference

In other words, what appears obvious to us must not be contradicted by the model (unlike what happens with the Condorcet paradox).

Example:

  • if A is preferable to B,
  • and B is preferable to C,
  • then, all other things being equal and in the absence of B, A must be preferred to C.

The formulas for the next five sections are all given here:

  • Cox-Jaynes (pdf)

Consistency rule

If a conclusion can be reached by more than one route, then all of these routes must give the same result.

This rule excludes “multiple heuristics” from consideration, since they may contradict one another (as the Wald and minimax criteria in game theory sometimes do, for example).
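As a small illustration of the consistency rule, the sketch below evaluates the plausibility of a conjunction by two different routes, using the ordinary product rule of probability. The table `joint` and the helper names are assumptions made for the example and do not come from the source.

```python
from fractions import Fraction

# Hypothetical joint plausibilities for two propositions A and B
# (illustrative values only, keyed by (A is true, B is true)).
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}

def marginal(index, value):
    """Plausibility that the proposition at position `index` has truth value `value`."""
    return sum(p for key, p in joint.items() if key[index] == value)

def conditional(index, value, given_index, given_value):
    """Plausibility of one proposition given the truth value of the other."""
    numerator = sum(
        p for key, p in joint.items()
        if key[index] == value and key[given_index] == given_value
    )
    return numerator / marginal(given_index, given_value)

# Route 1: P(A and B) = P(A) * P(B | A)
route_1 = marginal(0, True) * conditional(1, True, 0, True)
# Route 2: P(A and B) = P(B) * P(A | B)
route_2 = marginal(1, True) * conditional(0, True, 1, True)
```

Exact fractions are used so that the two routes can be compared for strict equality rather than up to rounding.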

Honesty rule

The robot must always take into account all of the information provided to it. It must not deliberately ignore part of that information and base its conclusions on the rest. In other words, the robot must be completely non-ideological and neutral in point of view.

Reproducibility rule

The robot represents equivalent states of knowledge by equivalent plausibilities. If two problems are identical up to a simple relabelling of propositions, the robot must assign the same plausibilities in both cases.

This means in particular that propositions are considered a priori equally plausible when they are distinguished only by their names, which hardly ever happens except in very particular cases, such as a coin or a die that has passed tests showing it is not loaded.
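A minimal sketch of this rule, assuming the only thing known about the propositions is their labels: each of n mutually exclusive propositions then receives plausibility 1/n, and relabelling them changes nothing. The function name `indifferent_assignment` is hypothetical.

```python
from fractions import Fraction

def indifferent_assignment(labels):
    """Assign equal plausibility to propositions known only by their labels."""
    labels = list(labels)
    return {label: Fraction(1, len(labels)) for label in labels}

# The six faces of a die that is not loaded, under two different labellings.
die = indifferent_assignment(range(1, 7))
relabelled = indifferent_assignment("abcdef")
```

Both dictionaries contain the same six values; only the names of the propositions differ.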

Quantitative rules (internal composition laws)

The sum rule

Without going into the equations, the idea is that when two plausibilities for the same state are combined, the combined plausibility is necessarily equal to or greater than the larger of the two.

The product rule

This is the opposite case: when two conditions must both hold for a state to exist, the plausibility of that state can be no greater than the smaller of the two plausibilities.
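The two qualitative bounds stated for the sum and product rules can be checked numerically against the usual probability formulas, P(A or B) = P(A) + P(B) - P(A and B) and P(A and B) = P(A) P(B | A). This is only a sketch: the function name `check_bounds` and the randomly sampled inputs are assumptions made for the example.

```python
import random

def check_bounds(p_a, p_b_given_a, p_b_given_not_a):
    """Return (P(A), P(B), P(A and B), P(A or B)) from three input plausibilities."""
    p_ab = p_a * p_b_given_a                    # product rule
    p_b = p_ab + (1 - p_a) * p_b_given_not_a    # law of total probability
    p_a_or_b = p_a + p_b - p_ab                 # sum rule
    return p_a, p_b, p_ab, p_a_or_b

# Sample many hypothetical situations with a fixed seed for repeatability.
rng = random.Random(0)
results = [
    check_bounds(rng.random(), rng.random(), rng.random())
    for _ in range(1000)
]
```

In every sampled case the disjunction is at least as plausible as the more plausible of A and B, and the conjunction is at most as plausible as the less plausible of the two.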

External links

  • plausible reasoning
  • quantitative rules

Results

Example

I. J. Good's notation (weight of evidence)

Alan Turing had pointed out in his time that probabilities become much easier to handle when a probability p ranging from 0 to 1 is replaced by the expression ln(p / (1 - p)), which ranges from minus infinity to plus infinity. In particular, in this form, information brought in through Bayes' rule results in the addition of a single algebraic quantity to this expression (which Turing named log-odds), whatever the prior probability before the observation.
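The additive behaviour of log-odds can be verified directly. In the sketch below, the likelihood values 0.8 and 0.2 and all function names are hypothetical; the point is only that the same quantity, the log of the likelihood ratio, is added whatever the prior.

```python
import math

def log_odds(p):
    """Log-odds ln(p / (1 - p)) of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def bayes_posterior(prior, lik_h, lik_not_h):
    """Posterior P(H | E) from Bayes' rule, given P(E | H) and P(E | not H)."""
    return prior * lik_h / (prior * lik_h + (1 - prior) * lik_not_h)

# Hypothetical observation: P(E | H) = 0.8 and P(E | not H) = 0.2.
LIK_H, LIK_NOT_H = 0.8, 0.2
shift = math.log(LIK_H / LIK_NOT_H)  # the single quantity added by the observation

# (log-odds before, log-odds after) for several different priors.
updates = [
    (log_odds(prior), log_odds(bayes_posterior(prior, LIK_H, LIK_NOT_H)))
    for prior in (0.01, 0.5, 0.9)
]
```

For each prior, the difference between the second and first entries of the pair equals `shift` up to floating-point rounding.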

In decibels (dB)

Irving John Good took up this idea, but to make these new quantities easier to work with, he:
  • used a decimal rather than a natural logarithm, so that the order of magnitude of the associated probability can be read off directly;
  • adopted a factor of 10 to avoid the complication of handling decimal quantities, where an accuracy of 1% was enough.

He named the corresponding measure, W = 10 log10(p / (1 - p)), the weight of evidence, because it made it possible to “weigh” the testimony of the facts against expectations, expressed by “subjective” probabilities prior to the observation, independently of those expectations.
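A short sketch of the formula W = 10 log10(p / (1 - p)) and its inverse follows. The function names are hypothetical, chosen for the example.

```python
import math

def weight_of_evidence_db(p):
    """Weight of evidence in decibels: W = 10 * log10(p / (1 - p))."""
    return 10 * math.log10(p / (1 - p))

def probability_from_db(w):
    """Inverse mapping: the probability whose weight of evidence is w decibels."""
    return 1 / (1 + 10 ** (-w / 10))

even_odds = weight_of_evidence_db(0.5)        # odds of 1 to 1
ten_to_one = weight_of_evidence_db(10 / 11)   # odds of 10 to 1
round_trip = probability_from_db(weight_of_evidence_db(0.99))
```

Even odds correspond to 0 dB, and each factor of 10 in the odds adds 10 dB, which is why the order of magnitude can be read directly from W.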

In bits

Evidence is sometimes also expressed in bits, in particular in tests of the validity of scaling laws. When a law such as Zipf's law or Mandelbrot's law fits the data better than another law that requires no prior sorting, account must be taken of the fact that the sorting itself contributed information on the order of N log2 N bits, and that it alone may be responsible for the better fit. If the gain in evidence brought by the sorting amounts to fewer bits than the sorting cost, then the information contributed by considering a scaling law is in fact nil.
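The comparison described above can be sketched as follows, assuming the cost of sorting is modelled as the number of bits needed to single out one ordering among N!, that is log2(N!), whose leading term is N log2 N. The function names and this exact cost model are assumptions made for the example.

```python
import math

def sorting_information_bits(n):
    """Bits needed to single out one ordering among the n! possible ones: log2(n!)."""
    return math.lgamma(n + 1) / math.log(2)

def sorting_pays_off(evidence_gain_bits, n):
    """True if the gain in evidence exceeds the information spent on sorting n items."""
    return evidence_gain_bits > sorting_information_bits(n)

cost_1000 = sorting_information_bits(1000)
upper_estimate_1000 = 1000 * math.log2(1000)  # the N log2 N order of magnitude
```

Under this model, a scaling law fitted to 1000 sorted items would need to gain more than `cost_1000` bits of evidence before the improvement could be credited to the law rather than to the sorting.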

Consequences of the theorem

Unification of Boolean algebra and probability theory

Note that Boolean algebra is isomorphic to probability theory restricted to the values 0 and 1 alone.
  • Logical AND = product of probabilities
  • Logical OR = sup of two probabilities
  • Logical NOT = complement of a probability (p -> 1-p)
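The three correspondences in the list above can be checked exhaustively on the values 0 and 1. The function names below are hypothetical.

```python
from itertools import product

def p_and(p, q):
    """Logical AND as a product of probabilities."""
    return p * q

def p_or(p, q):
    """Logical OR as the sup (maximum) of two probabilities."""
    return max(p, q)

def p_not(p):
    """Logical NOT as the complement of a probability."""
    return 1 - p

# Truth table over the two admissible values: (p, q, AND, OR, NOT p).
truth_table = [
    (p, q, p_and(p, q), p_or(p, q), p_not(p))
    for p, q in product((0, 1), repeat=2)
]
```

Each row of `truth_table` matches the corresponding row of the Boolean truth tables for AND, OR and NOT.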

This observation led, in the 1970s, to the invention of stochastic computers, promoted by the company Alsthom (which was spelled with an H at the time), which were intended to combine the low cost of logic gates with the processing power of analog computers. A few were built at the time.
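The source does not describe how these machines worked. A common textbook illustration of stochastic computing, given here only as a sketch under that caveat, encodes a probability as the fraction of 1s in a random bitstream, so that a single AND gate applied to two independent streams approximately multiplies the encoded values. All names and parameters below are hypothetical.

```python
import random

def bitstream(p, n, rng):
    """Encode probability p as n random bits, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 100_000
a = bitstream(0.6, n, rng)
b = bitstream(0.5, n, rng)

# One AND gate per bit pair: the output stream encodes roughly 0.6 * 0.5.
product_stream = [x & y for x, y in zip(a, b)]
estimate = decode(product_stream)
```

The estimate is only approximate; its accuracy improves with the length of the streams, which is the price paid for using such simple hardware.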

Abandonment of the “frequentist” paradigm


Rational foundations of machine learning


Important limitations of the theorem

An apparent paradox

Each discipline has its favorite measurements: while thermal engineering deals mainly with temperatures, thermodynamics is more concerned with quantities of heat, or even entropy. Electrostatics is more interested in voltages than in currents, whereas the reverse holds for low-current electronics, and in electrical engineering one tends to reason in terms of power. Depending on their discipline of origin, experimenters will tend to make their estimates in the units they are used to.

In the case of an electrical circuit, for example, a specialist in electrical engineering will estimate the dissipated power (R I²), while a low-current specialist will prefer to estimate the current itself (I). Although the long-run convergence of the estimates is ensured in both cases, it will not proceed in the same way, even with identical prior distributions, because the expectation of a square is not determined by the square of the expectation. This is the main stumbling block of Bayesian methods.
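A minimal numerical illustration of this point, with a hypothetical resistance and a hypothetical three-valued distribution over the current: estimating the power directly does not give the same answer as estimating the current and then squaring it.

```python
from fractions import Fraction

R = 2  # hypothetical resistance, in ohms

# Hypothetical state of knowledge: the current is 1, 2 or 3 A, equally plausible.
currents = [1, 2, 3]
weights = [Fraction(1, 3)] * 3

# The low-current specialist estimates the current itself.
mean_current = sum(w * i for w, i in zip(weights, currents))

# The electrical engineer estimates the dissipated power R * I**2 directly.
mean_power = sum(w * R * i**2 for w, i in zip(weights, currents))

# Power computed afterwards from the estimated current.
power_from_mean_current = R * mean_current**2
```

Here `mean_power` is 28/3 W while `power_from_mean_current` is 8 W; the gap is R times the variance of the current, so the two estimates agree only when the current is known exactly.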

The role of language (formatting)

Independently of the prior probabilities we assign to events, our estimates are also partly “formatted” by language and by the professional bias that goes with it. Concretely, this is a reminder that there is not one but two sources of arbitrariness in Bayesian methods: one of measurement, which taints the chosen prior probabilities, and one of method, which corresponds to our representation of the problem. On the other hand, arbitrariness is limited to these two elements, and Bayesian methods are otherwise completely impersonal.

See also

Internal links

External links

  • http://www-laplace.imag.fr/Jaynes/prob.html
