Quantile

Quantiles are points taken at regular vertical intervals of the cumulative distribution function of a random variable. The motivation for q-quantiles is to divide ordered data into q subsets of essentially equal size; the quantiles are the data values marking the boundaries between consecutive subsets. The p-th q-quantile is the value x such that the probability that the random variable is less than x is at most p/q and the probability that the random variable is less than or equal to x is at least p/q. There are q − 1 q-quantiles, with p an integer satisfying 0 < p < q.

Certain quantiles have special names:

  • the 100-quantiles are called centiles, or percentiles (a common anglicism);
  • the 10-quantiles are called deciles;
  • the 5-quantiles are called quintiles;
  • the 4-quantiles are called quartiles.

Some computer programs define the minimum and the maximum as, respectively, the quantile of order 0 and the quantile of order 100. Such terminology, however, goes beyond the traditional definitions of statistics. For an infinite population, the p-th q-quantile is the data value at which the cumulative distribution function equals p/q. For a finite number N of draws, sorted in ascending order, compute Np/q. If it is not an integer, round it up to the next integer and take the draw at that position as an approximate value. If it is an integer, any value from that draw up to the next draw can be chosen as the quantile; by convention (an entirely arbitrary one) the average of these two values is taken.
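The finite-sample rule above can be sketched in Python as follows. This is only an illustrative sketch: the function name `finite_quantile` and the sample data are assumptions made for the example, not part of the original text.

```python
def finite_quantile(draws, p, q):
    """p-th q-quantile of a finite list of draws (round-up / average rule).

    Hypothetical helper written for illustration only.
    """
    if not draws:
        raise ValueError("at least one draw is required")
    if not 0 < p < q:
        raise ValueError("p must be an integer with 0 < p < q")
    xs = sorted(draws)          # the rule assumes ascending order
    n = len(xs)
    # Exact integer arithmetic decides whether N*p/q is an integer.
    whole, remainder = divmod(n * p, q)
    if remainder != 0:
        # Not an integer: round up to position whole + 1 (1-based),
        # which is index `whole` in 0-based Python indexing.
        return xs[whole]
    # Integer: average the draw at 1-based position `whole` and the next one.
    return (xs[whole - 1] + xs[whole]) / 2


sample = [3, 6, 7, 8, 8, 10, 13, 15, 16, 20]
first_quartile = finite_quantile(sample, 1, 4)   # 10*1/4 = 2.5, so 3rd draw
median_value = finite_quantile(sample, 1, 2)     # 10*1/2 = 5, average 5th and 6th
third_quartile = finite_quantile(sample, 3, 4)   # 10*3/4 = 7.5, so 8th draw
```

With this sample, the first quartile is the third draw (7), the median is the average of the fifth and sixth draws (9.0), and the third quartile is the eighth draw (15).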

More formally: the p-th q-quantile of the distribution of the random variable X can be defined as the value(s) x such that:

P(X \leq x) \geq \frac{p}{q} \quad \mathrm{and} \quad P(X \geq x) \geq \frac{q-p}{q}.

If, instead of taking p and q as integers, the p-quantile is based on a real number p with 0 < p < 1, this becomes: the p-quantile of the distribution of the random variable X can be defined as the value(s) x such that:

P(X \leq x) \geq p \quad \mathrm{and} \quad P(X \geq x) \geq 1-p.
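As a small illustration of why the definition speaks of "value(s)", the two inequalities can be checked directly for a discrete distribution. The sketch below is an assumption-laden example: the helper `is_p_quantile` and the dictionary representation of the distribution are hypothetical, and the fair six-sided die is chosen only for convenience.

```python
from fractions import Fraction


def is_p_quantile(pmf, x, p):
    """Check P(X <= x) >= p and P(X >= x) >= 1 - p.

    pmf maps each possible value to its probability (hypothetical
    representation chosen for this example).
    """
    prob_le = sum(w for v, w in pmf.items() if v <= x)
    prob_ge = sum(w for v, w in pmf.items() if v >= x)
    return prob_le >= p and prob_ge >= 1 - p


# Fair six-sided die, with exact fractions to avoid rounding issues.
die = {face: Fraction(1, 6) for face in range(1, 7)}
half = Fraction(1, 2)

# Every value from 3 to 4 satisfies both inequalities for p = 1/2.
medians = [x for x in (2, 3, Fraction(7, 2), 4, 5) if is_p_quantile(die, x, half)]
```

Here 3, 3.5 and 4 all qualify as medians of the die, while 2 and 5 do not, which is why a convention is needed to single out one value.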

Standardized test results are commonly misinterpreted: people often say that a score is "in the 80th centile", as if the 80th centile were an interval in which one could be placed, which is not the case. One can be placed at a centile or between two centiles, but not in a centile.

If a distribution is symmetric, then the median equals the mean (when the mean exists), but this is not the case in general.

Quantiles are useful measures because they are less sensitive to long-tailed distributions and to outliers. For example, for a random variable that follows an exponential distribution, any particular sample of this random variable has roughly a 63% chance of being lower than the mean. This is due to the long tail of the exponential distribution in the positive values, which is absent in the negative values.
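The 63% figure can be recovered with a short calculation. The rate parameter \lambda is notation introduced here for the illustration and does not appear in the original text.

```latex
% Assumption: X follows an exponential distribution with rate \lambda > 0
% (\lambda is a symbol introduced only for this sketch).
\begin{align*}
F(x) &= P(X \leq x) = 1 - e^{-\lambda x}, \qquad x \geq 0, \\
E[X] &= \frac{1}{\lambda}, \\
P\left(X < \frac{1}{\lambda}\right) &= 1 - e^{-\lambda \cdot \frac{1}{\lambda}} = 1 - e^{-1} \approx 0.632, \\
\text{median} &= \frac{\ln 2}{\lambda} \approx \frac{0.693}{\lambda} < \frac{1}{\lambda}.
\end{align*}
```

The last line shows the same effect from the other side: the median sits below the mean, because the long right tail pulls the mean upward.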

Empirically, if the data you are analyzing are not distributed as you expect, or if some other source of outliers influences the value of the mean, then quantiles are much more useful statistics than the mean or other moment-based statistics.

Robust regression is closely related to this subject. It uses the sum of the absolute values of the errors instead of the sum of the squared errors. The connection is that, among the estimators associated with a distribution, the mean is the one that minimizes the expected squared error, while the median minimizes the expected absolute error. Robust regression shares the property of being relatively insensitive to large deviations caused by a few outlying observations.
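The link between the mean, the median and the two error criteria can be checked numerically on a small sample. The following is a minimal sketch under stated assumptions: the data, the integer candidate grid and the helper names are all invented for the example, and a brute-force search stands in for a proper optimizer.

```python
from statistics import mean, median


def squared_loss(data, c):
    """Sum of squared errors when c is used to summarize the data."""
    return sum((x - c) ** 2 for x in data)


def absolute_loss(data, c):
    """Sum of absolute errors when c is used to summarize the data."""
    return sum(abs(x - c) for x in data)


# Hypothetical sample with one outlying observation (100).
data = [1, 2, 3, 4, 100]

# Brute-force search over integer candidates 0..100 (illustration only).
candidates = range(0, 101)
best_squared = min(candidates, key=lambda c: squared_loss(data, c))
best_absolute = min(candidates, key=lambda c: absolute_loss(data, c))
```

On this sample the squared-error minimizer coincides with the mean (22), which the outlier drags far from the bulk of the data, while the absolute-error minimizer coincides with the median (3).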

The quantiles of a random variable are generally preserved under increasing transformations. For example, if m is the median of a random variable X, then 2m is the median of 2X, unless an arbitrary choice was made from a range of values to specify a particular quantile. Quantiles can also be used when only ordinal data are available.
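Both the preservation property and its caveat can be seen on tiny samples. This sketch uses `statistics.median` from the Python standard library, which averages the two middle values for an even-sized sample; the sample values and the `cube` helper are assumptions chosen for illustration.

```python
from statistics import median


def cube(v):
    """An increasing, non-linear transformation (chosen for illustration)."""
    return v ** 3


# Odd-sized sample: the median is an actual data value.
odd = [5, 1, 4, 2, 3]
m_odd = median(odd)                                # 3
doubled_median = median([2 * v for v in odd])      # equals 2 * m_odd
cubed_median = median([cube(v) for v in odd])      # equals cube(m_odd)

# Even-sized sample: the median is an arbitrary pick (the average)
# from the range between the two middle values.
even = [1, 2, 3, 4]
m_even = median(even)                              # 2.5
cubed_median_even = median([cube(v) for v in even])  # (8 + 27) / 2
```

For the odd-sized sample the median follows the transformation exactly. For the even-sized one, the averaged median of the cubes (17.5) differs from the cube of the averaged median (15.625), which is the exception described above.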

Calculation of the quantiles

There are various methods for estimating quantiles.

Let N be the number of non-missing values of the sampled population, and let x_1, x_2, \ldots, x_N be the ordered values of the same population, such that x_1 is the smallest value, and so on. For the k-th q-quantile, we have p = k/q.

  • Empirical distribution function: \begin{cases} x_j, & g=0 \\ x_{j+1}, & g>0 \end{cases} where j is the integer part of N \cdot p and g is the fractional part.
  • Empirical distribution function with averaging: \begin{cases} \frac{1}{2}(x_j+x_{j+1}), & g=0 \\ x_{j+1}, & g>0 \end{cases} where j is the integer part of N \cdot p and g is the fractional part.
  • Weighted average: x_{j+1} + g \cdot (x_{j+2} - x_{j+1}) where j is the integer part of (N-1) \cdot p and g is the fractional part. This method is used, for example, by the PERCENTILE function of Microsoft Excel.
  • Observation numbered closest to (N-1) \cdot p + 1: \begin{cases} x_j, & g<0.5 \\ x_{j+1}, & g \geq 0.5 \end{cases} where j is the integer part of (N-1) \cdot p + 1 and g is the fractional part.
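The four estimators listed above can be compared side by side with a short Python sketch. The function name `quantile_estimates`, the dictionary keys and the sample data are assumptions introduced for the example. Exact fractions are used for p = k/q so that the test g = 0 is not disturbed by floating-point rounding, and the sketch assumes at least two values and 0 < k < q.

```python
from fractions import Fraction
from math import floor


def quantile_estimates(values, k, q):
    """Estimate the k-th q-quantile with the four methods listed above.

    Hypothetical helper for illustration; requires len(values) >= 2.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2 or not 0 < k < q:
        raise ValueError("need at least two values and 0 < k < q")
    p = Fraction(k, q)

    def x(i):
        """1-based access to the ordered values, matching x_1 ... x_N."""
        return xs[i - 1]

    def split(position):
        """Return (integer part j, fractional part g) of a position."""
        j = floor(position)
        return j, position - j

    # Methods 1 and 2: position N * p.
    j, g = split(n * p)
    edf = x(j) if g == 0 else x(j + 1)
    edf_avg = (x(j) + x(j + 1)) / 2 if g == 0 else x(j + 1)

    # Method 3: weighted average, position (N - 1) * p.
    j, g = split((n - 1) * p)
    weighted = x(j + 1) + float(g) * (x(j + 2) - x(j + 1))

    # Method 4: observation numbered closest to (N - 1) * p + 1.
    j, g = split((n - 1) * p + 1)
    nearest = x(j) if g < Fraction(1, 2) else x(j + 1)

    return {"edf": edf, "edf_avg": edf_avg, "weighted": weighted, "nearest": nearest}


sample = [3, 6, 7, 8, 8, 10, 13, 15, 16, 20]
quartile_1 = quantile_estimates(sample, 1, 4)
median_all = quantile_estimates(sample, 1, 2)
```

On this sample the first quartile is 7 for three of the methods and 7.25 for the weighted average, while the median comes out as 8, 9.0, 9.0 and 10 respectively, which shows how much the choice of method can matter on small data sets.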
