Standard deviation

In mathematics, the standard deviation is a non-negative real quantity, possibly infinite, used in probability theory to characterize the spread of a random variable around its mean. In particular, the mean and the standard deviation entirely characterize the one-dimensional Gaussian laws, so they are used to parameterize them. More generally, the standard deviation, through its square, called the variance, makes it possible to characterize Gaussian laws in higher dimension. These considerations matter in practice, in particular when applying the central limit theorem.

In statistics, the standard deviation is instead defined for a finite set of numerical data interpreted as realizations of a random variable. It is then used to set up tests; in other words, it makes it possible to decide whether a hypothesized probability is plausible given the observed values, with a certain margin of error. The standard deviation is also used in linear regression problems.

Standard deviations have many applications, in surveys as well as in physics or biology. In practice they make it possible to summarize the numerical results of a repeated experiment. In finance the standard deviation is a measure of the volatility of a security.

Utility

The standard deviation is used to measure the dispersion of a data set, for example the distribution of the grades of a class. In this case, the smaller the standard deviation, the more homogeneous the class. Conversely, one may want the standard deviation to be as large as possible, to avoid grades that are too tightly clustered (the classic example of the teacher who only grades from 8 to 13).

With grades from 0 to 20, the minimum standard deviation is 0 (if all the students have the same grade), and it goes up to about 10 if half the class has 0/20 and the other half 20/20.
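As a quick check of these extremes, here is a minimal Python sketch using `statistics.pstdev` from the standard library (the standard deviation computed by dividing by the number of values). The three class lists are made-up illustrations, not data from this article.

```python
from statistics import pstdev

# Hypothetical classes of 10 students, graded out of 20.
same_grade = [12] * 10                                   # everyone has the same grade
split_class = [0] * 5 + [20] * 5                         # half at 0/20, half at 20/20
narrow_grader = [8, 9, 10, 10, 11, 11, 12, 12, 13, 13]   # teacher grading from 8 to 13

sd_same = pstdev(same_grade)
sd_split = pstdev(split_class)
sd_narrow = pstdev(narrow_grader)

print(sd_same, sd_split, sd_narrow)
```

The first class gives the minimum, the second the maximum for this grading scale, and the narrow grader falls in between.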

Definition

In the modern formulation of probability, following the work of Henri Lebesgue, a random variable X is a real-valued or vector-valued map depending on a parameter ω drawn according to a probability law P. Although understanding the formalism requires measure theory, its use remains simple. The map X itself does not play a fundamental role; only its law, the image of P under X, denoted P_X, matters. It is a measure on R or R^n. Two quantities are associated with it:

  • Its mean, denoted E(X), also called the expectation;
  • Its standard deviation, generally denoted \sigma_X, defined as the square root of the expectation of (X - E(X))^2:
\sigma_X^2 = E((X - E(X))^2) = E(X^2) - E(X)^2.
Here, the square on the right-hand side implicitly denotes the squared Euclidean norm if X is vector-valued.

This identity specializes to a large number of particular cases, among them:

Discrete probability

If the variable X takes a finite number of real values x_1,…, x_n, with respective probabilities p_1,…, p_n (under the condition \sum_{i=1}^n p_i = 1), the standard deviation is given by:
\sigma = \sqrt{\sum_{i=1}^n p_i (x_i - \overline{X})^2} = \sqrt{\left(\sum_{i=1}^n p_i x_i^2\right) - \overline{X}^2}, where: \overline{X} = \sum_{i=1}^n p_i x_i.

In particular, if the law of X is uniform on a finite set of values, one has:

\sigma_X = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \overline{X})^2} = \sqrt{\frac{1}{n} \left(\sum_{i=1}^n x_i^2\right) - \overline{X}^2}, where: \overline{X} = \frac{1}{n} \sum_{i=1}^n x_i.

These formulas extend immediately to higher dimension by replacing the square with the squared Euclidean norm.
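The two equivalent forms of the discrete formula can be compared numerically. The following Python sketch is only an illustration: the helper name `discrete_std` and the choice of a fair six-sided die are assumptions made for the example.

```python
from math import sqrt, isclose

def discrete_std(values, probs):
    """Standard deviation of a discrete law with values x_i and probabilities p_i.

    Returns both forms of the formula so they can be compared."""
    if not isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for x, p in zip(values, probs))
    # First form: sqrt(sum of p_i * (x_i - mean)^2)
    centered = sqrt(sum(p * (x - mean) ** 2 for x, p in zip(values, probs)))
    # Second form: sqrt((sum of p_i * x_i^2) - mean^2)
    raw = sqrt(sum(p * x * x for x, p in zip(values, probs)) - mean ** 2)
    return centered, raw

# Fair six-sided die: uniform law on the values 1..6.
faces = [1, 2, 3, 4, 5, 6]
centered, raw = discrete_std(faces, [1 / 6] * 6)
print(centered, raw)
```

Both values agree up to floating-point rounding. In floating-point arithmetic the first (centered) form is generally the safer one when the mean is large compared with the spread.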

Absolutely continuous probability

The law P_X is said to be absolutely continuous when the probability that X belongs to the interval [a, b] is:

P_X([a, b]) = P(X \in [a, b]) = \int_a^b f(x) dx
where f is a locally integrable function for Lebesgue measure, for example but not necessarily a continuous function. This function f is called the density of the law P_X. It is integrable over R, with integral 1; the standard deviation is finite when x^2 f(x) is also integrable.

The standard deviation of X is defined by:

\sigma_X = \sqrt{\int_{R} x^2 f(x) dx - {\left(\int_{R} x f(x) dx\right)}^2}.
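For a law with a density, the standard deviation is the square root of the integral of x^2 f(x) minus the square of the integral of x f(x). As a rough numerical illustration, the sketch below approximates both integrals with a midpoint rule. The function name `std_from_density`, the number of steps and the uniform density on [0, 1] are assumptions chosen for the example; this is not a general-purpose integrator.

```python
from math import sqrt

def std_from_density(f, lo, hi, steps=100_000):
    """Approximate the standard deviation of a law with density f supported on [lo, hi].

    Uses the midpoint rule for the integrals of x*f(x) and x^2*f(x)."""
    h = (hi - lo) / steps
    m1 = 0.0  # approximates the integral of x * f(x)
    m2 = 0.0  # approximates the integral of x^2 * f(x)
    for k in range(steps):
        x = lo + (k + 0.5) * h   # midpoint of the k-th sub-interval
        w = f(x) * h             # probability mass of the sub-interval
        m1 += x * w
        m2 += x * x * w
    return sqrt(m2 - m1 ** 2)

# Uniform density on [0, 1]; the exact standard deviation is 1/sqrt(12).
sigma_uniform = std_from_density(lambda x: 1.0, 0.0, 1.0)
print(sigma_uniform, 1 / sqrt(12))
```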

Examples of standard deviation

The following table gives the standard deviations of the most commonly encountered laws:

Consider two values a and b. The mean \overline{X}, also denoted m, is of course half the sum:

m = (a + b)/2.
The variance σ² is simply
σ² = ((a - m)² + (b - m)²)/2,
which one may wish to express in a simpler and more elegant way: simpler in that only a and b appear, more elegant in that only symmetric expressions in a and b appear, namely the sum (a + b) and the product ab, denoted S and P.

The mean is

m = S/2.
The square of the mean is
m² = S²/4.
Moreover:
(a - m) = (a - b)/2
(b - m) = (b - a)/2,
and since these quantities are to be squared, it makes no difference whether one works with (a - b) or with (b - a).

Let us transform this square using a classical algebraic identity:

(a - b)² = (a + b)² - 4ab = S² - 4P
whence the first expression of the variance:
σ² = ((a - m)² + (b - m)²)/2 = (1/2)·[(1/2)²·(S² - 4P) + (1/2)²·(S² - 4P)] = (S² - 4P)/4.
The second expression is
\frac{1}{n} \left(\sum_{i=1}^n x_i^2\right) - \overline{X}^2
which here, with n = 2, equals
(a² + b²)/2 - m².
Since
a² + b² = (a + b)² - 2ab = S² - 2P
we can rewrite it as follows:
\frac{1}{n} \left(\sum_{i=1}^n x_i^2\right) - \overline{X}^2 = \frac{1}{2}(a^2 + b^2) - m^2
which can also be written
(S² - 2P)/2 - S²/4 = S²/2 - S²/4 - 2P/2 = S²/4 - P
which is indeed equal to (S² - 4P)/4, QED.
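The two-value computation can be checked with exact arithmetic. In the Python sketch below, the function name `two_value_variances` and the sample values 3 and 11 are arbitrary choices for illustration.

```python
from fractions import Fraction

def two_value_variances(a, b):
    """Variance of the two equally weighted values a and b, computed three ways."""
    a, b = Fraction(a), Fraction(b)
    m = (a + b) / 2                               # mean
    S, P = a + b, a * b                           # sum and product
    centered = ((a - m) ** 2 + (b - m) ** 2) / 2  # mean of squared deviations
    raw = (a ** 2 + b ** 2) / 2 - m ** 2          # mean of squares minus squared mean
    symmetric = (S ** 2 - 4 * P) / 4              # expression in S and P only
    return centered, raw, symmetric

centered, raw, symmetric = two_value_variances(3, 11)
print(centered, raw, symmetric)   # prints "16 16 16"
```

Since (S² - 4P)/4 = ((a - b)/2)², the standard deviation of two values is simply half the distance between them, here 4.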

In survey theory

When estimating the dispersion around the mean of a statistical characteristic in a large population from a sample of size n, one uses the following value for the standard deviation:

s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{X})^2}.
One may notice that
s = \sigma \sqrt{\frac{n}{n-1}}, where \sigma is the standard deviation of the sample computed with the factor \frac{1}{n}.
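The relation between the two versions can be checked with Python's standard library, where `statistics.pstdev` divides by n and `statistics.stdev` divides by n - 1. The sample values below are hypothetical.

```python
from math import sqrt, isclose
from statistics import pstdev, stdev

sample = [4.1, 5.0, 5.6, 6.2, 7.3]   # hypothetical measurements
n = len(sample)

sigma = pstdev(sample)   # square root of (1/n) * sum of squared deviations
s = stdev(sample)        # square root of (1/(n-1)) * sum of squared deviations

# The two printed values agree up to rounding.
print(s, sigma * sqrt(n / (n - 1)))
```

The corrected value s is always slightly larger than sigma, and the gap shrinks as n grows.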

Why n - 1?

The question generally asked is: why n - 1? The reason one divides by n - 1 instead of n is a good example of the constant interaction between statistics and probability.

The survey of n individuals corresponds to a series of n independent random variables x_i with expectation E(X) and variance V(X).

The mean \overline{X} of the sample is a random variable with expectation E(X) and variance \frac{1}{n} \cdot V(X) (the mean of n random variables fluctuates less than a single random variable).
The variance v of the sample is a random variable whose expectation we want to compute.
v = \left(\frac{1}{n} \sum x_i^2\right) - \overline{X}^2.
x_i^2 is a random variable with expectation E(x_i^2) = E(x_i)^2 + V(x_i), thus equal to E(X)^2 + V(X).
\frac{1}{n} \sum x_i^2 is a random variable with expectation E(X)^2 + V(X).
\overline{X}^2 is a random variable with expectation E(\overline{X})^2 + V(\overline{X}) = E(X)^2 + \frac{1}{n} V(X).
Thus E(v) = E(X)^2 + V(X) - E(X)^2 - \frac{1}{n} V(X) = \frac{n-1}{n} V(X).
The variance v of the sample thus fluctuates around \frac{n-1}{n} V(X) and not around V(X) as one might have expected.
To obtain an estimate of V(X), one must therefore take \frac{n}{n-1} v. One says that v is a biased estimator.
And to obtain an estimate of the standard deviation \sigma(X), one must take \sigma \sqrt{\frac{n}{n-1}}, where \sigma = \sqrt{v} is the standard deviation of the sample.
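The bias factor (n - 1)/n can be verified exactly on a small case by enumerating every possible sample. The sketch below is an illustration under assumptions: the population [1, 2, 3, 4] with equal probabilities, the sample size 3 and the helper name `expected_sample_variance` are all chosen for the example.

```python
from fractions import Fraction
from itertools import product

def expected_sample_variance(population, n):
    """Exact expectation of v = (1/n) * sum(x_i^2) - mean^2 over all equally
    likely samples of size n drawn independently from `population`."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        mean = Fraction(sum(sample), n)
        v = Fraction(sum(x * x for x in sample), n) - mean ** 2
        total += v
        count += 1
    return total / count

population = [1, 2, 3, 4]   # illustrative values, each with probability 1/4
n = 3

# Expectation and variance V(X) of one draw from the population.
mu = Fraction(sum(population), len(population))
var_x = Fraction(sum(x * x for x in population), len(population)) - mu ** 2

e_v = expected_sample_variance(population, n)
print(e_v, Fraction(n - 1, n) * var_x)   # both equal 5/6
```

Multiplying `e_v` by n/(n - 1) recovers V(X) = 5/4 exactly, which is the correction described above.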

Qualitative aspect

The standard deviation characterizes the width of the distribution. It is expressed mathematically as the square root of the variance, which measures the spread of the values around the center of the curve.

Standard deviation \sigma = square root of the variance

  • The standard deviation is the most commonly used measure of dispersion, or spread, in statistics when the mean is used as the measure of central tendency. It thus measures dispersion around the mean. Because of its close link with the mean, the standard deviation can be strongly affected when the mean is a poor measure of central tendency.

  • Unlike the range and the quartiles, the variance combines all the values of a data set to obtain the measure of dispersion. The variance (symbolized by s²) and the standard deviation (the square root of the variance, symbolized by s) are the most commonly used measures of dispersion.

The variance is defined as the arithmetic mean of the squares of the differences between the observed values and the mean. It is a measure of the degree of dispersion of a data set. It is calculated as the mean of the squared deviations of each number from the mean of the data set.

Distribution of the population

When the studied variable is Gaussian (distributed along a bell-shaped curve), the standard deviation makes it possible to determine how the population is distributed around the mean value.

For example, if by convention the standard deviation corresponds to a difference of 15 IQ points, this means that approximately 2/3 of the population of an age group have an IQ between 85 and 115. See also on this subject the confidence interval of a Gaussian (normal) distribution.
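The "approximately 2/3" figure can be recovered from the Gaussian cumulative distribution function. This short Python sketch uses `statistics.NormalDist` (available since Python 3.8) and assumes the usual convention of a mean IQ of 100, which the text implies but does not state.

```python
from statistics import NormalDist

# Assumption: IQ scores follow a Gaussian law with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Share of the population within one standard deviation of the mean.
share_within_one_sd = iq.cdf(115) - iq.cdf(85)
print(round(share_within_one_sd, 4))
```

The result is about 0.68, slightly more than 2/3, and it does not depend on the particular mean and standard deviation chosen.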

Interpretation of a high standard deviation

Generally, the more widely the values are spread, the higher the standard deviation. Imagine, for example, that we must compare two different sets of exam results from 30 students; the marks on the first exam range from 31% to 98% and those on the second from 82% to 93%. Given these ranges, the standard deviation would be larger for the results of the first exam.

However, it is not always easy to judge how large the standard deviation must be for the data to be considered widely dispersed.
The significance of the standard deviation also depends on the magnitude of the mean value of the data set. When you measure something in millions, having measurements close to the mean value does not carry the same significance as when you measure the weight of two people.
For example, if after measuring the annual revenues of two large companies you find a difference of 100,000 euros, the difference is regarded as not very significant, whereas if you measure the weight of two people and find a difference of 30 kilograms, the difference is regarded as very significant.
For this reason it is sometimes useful to work with the relative standard deviation (the standard deviation divided by the mean).
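The contrast between the two situations becomes visible once the standard deviation is divided by the mean. In the Python sketch below, the helper `relative_std` and all the figures are hypothetical, chosen only to match the orders of magnitude mentioned above (a gap of 100,000 euros between large revenues, a gap of 30 kilograms between two people).

```python
from statistics import mean, pstdev

def relative_std(data):
    """Relative standard deviation: standard deviation divided by the mean."""
    m = mean(data)
    if m == 0:
        raise ValueError("relative standard deviation is undefined for a zero mean")
    return pstdev(data) / abs(m)

revenues_eur = [250_000_000, 250_100_000]   # hypothetical revenues, 100,000 euros apart
weights_kg = [60, 90]                       # hypothetical weights, 30 kg apart

rsd_revenues = relative_std(revenues_eur)
rsd_weights = relative_std(weights_kg)
print(rsd_revenues, rsd_weights)
```

With these numbers the relative standard deviation of the revenues is a small fraction of a percent, while that of the weights is 20%, which matches the intuition that the second difference is the significant one.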

The square of the standard deviation is called the variance: V(X) = \sigma^2
