Continuous elementary statistics

In a statistical survey, when the statistical character (the variable being studied) can take a great many values (height, area, wages…), it is treated as continuous.

Data processing

When a survey produces so many results that a sorted list of the values would be unreadable, we accept some loss of information and group the data into intervals called classes. The values must then be distributed roughly evenly within each class; if they are not, the classes should be refined into smaller ones. The classes need not all have the same width, but open-ended classes of the form "more than…" should be avoided, because they prevent any further processing (histogram, mean…). We then count how many values of the character fall into each interval [x_i; x_{i+1}[; this number is called the count (absolute frequency) of the class.
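The counting step can be sketched in Python as follows. This is a minimal illustration, assuming half-open classes [x_i; x_{i+1}[ given by a sorted list of boundaries; the function name `class_counts` and the income values are hypothetical and are not the article's data.

```python
from bisect import bisect_right

def class_counts(values, bounds):
    """Count the values falling in each half-open class [bounds[i]; bounds[i+1][.

    `bounds` must be sorted in increasing order (hypothetical helper).
    """
    counts = [0] * (len(bounds) - 1)
    for v in values:
        # bisect_right returns the number of bounds <= v, so subtracting 1
        # gives the index of the class whose lower bound is the largest bound <= v.
        i = bisect_right(bounds, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
        else:
            # v is below the first bound or >= the last bound (the last class is open on the right)
            raise ValueError(f"value {v} lies outside [{bounds[0]}; {bounds[-1]}[")
    return counts

# Hypothetical incomes in thousands of euros
incomes = [4, 12, 15, 18, 22, 25, 27, 33, 41, 49]
bounds = [0, 10, 20, 30, 50]
print(class_counts(incomes, bounds))  # [1, 3, 3, 3]
```

A value equal to a boundary, such as 10, is counted in the class that starts at that boundary, which matches the half-open notation [x_i; x_{i+1}[.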

Example of a statistical table with classes: distribution of annual incomes, in thousands of euros, in a population of 4370 people.

The counts here are too large to give a simple picture of the distribution, so we prefer to work with percentages or relative frequencies. This amounts to scaling the population down to 100 (for percentages) or to 1 (for frequencies).
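Converting counts to percentages or frequencies only requires dividing each count by the total. The sketch below assumes hypothetical class counts (not the article's table) and hypothetical function names.

```python
def to_frequencies(counts):
    """Relative frequency of each class: its count divided by the total count (sums to 1)."""
    total = sum(counts)
    return [n / total for n in counts]

def to_percentages(counts):
    """Percentage of each class: its frequency scaled to a population of 100."""
    total = sum(counts)
    return [100 * n / total for n in counts]

counts = [10, 30, 40, 20]  # hypothetical class counts
print(to_percentages(counts))  # [10.0, 30.0, 40.0, 20.0]
```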

Average

Since the values in each class are assumed to be evenly distributed, the midpoint of a class can be taken as representative of that class. The n_i individuals of the class are therefore replaced by n_i individuals whose character takes the value m_i = \frac{x_i + x_{i+1}}{2}. The mean is then calculated as for a discrete variable:

The mean annual income of this sample is therefore 98344/4370 ≈ 22.5 thousand euros, i.e. about 22500 euros per year.

The formula used here is: \overline{X} = \frac{\sum_{i=1}^{N} n_i m_i}{\sum_{i=1}^{N} n_i}
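The grouped mean can be sketched directly from this formula: each class is replaced by its midpoint m_i, weighted by its count n_i. The boundaries and counts below are hypothetical, and `grouped_mean` is an illustrative name.

```python
def grouped_mean(bounds, counts):
    """Mean of grouped data: sum(n_i * m_i) / sum(n_i), with m_i the class midpoints."""
    midpoints = [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]
    total = sum(counts)
    return sum(n * m for n, m in zip(counts, midpoints)) / total

# Hypothetical classes [0;10[, [10;20[, [20;30[, [30;50[ (thousands of euros)
print(grouped_mean([0, 10, 20, 30, 50], [10, 30, 40, 20]))  # 23.0
```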

The mean is one of the measures of location (central tendency).

Charts

Histogram

A bar (stick) chart is not appropriate for representing this survey graphically: the wider a class, the larger its count is likely to be. Each class is therefore represented by a rectangle whose base is the width of the class and whose area is proportional to its count or frequency. This diagram is called a histogram.

Example: suppose that 1% is represented by 1 square unit.

All that remains is to draw the histogram:

Note: if all classes have the same width, the heights of the rectangles are proportional to the counts or frequencies.
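Because it is the area, not the height, that must be proportional to the percentage, each height equals the percentage divided by the class width. The sketch below assumes the scale of the example (1% = 1 square unit) and one horizontal unit per thousand euros; the data and the function name `histogram_heights` are hypothetical.

```python
def histogram_heights(bounds, percentages, units_per_percent=1.0):
    """Rectangle heights such that area = units_per_percent * percentage.

    height = area / width, where width = bounds[i+1] - bounds[i].
    """
    return [units_per_percent * p / (b - a)
            for a, b, p in zip(bounds, bounds[1:], percentages)]

bounds = [0, 10, 20, 30, 50]            # hypothetical class boundaries
percentages = [10.0, 30.0, 40.0, 20.0]  # hypothetical class percentages
print(histogram_heights(bounds, percentages))  # [1.0, 3.0, 4.0, 1.0]
```

The last class [30;50[ holds twice the percentage of the first class but is twice as wide, so both rectangles have the same height; with equal widths the heights would simply be proportional to the percentages, as the note states.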

Cumulative frequency polygon

Since the values within each class are assumed to be evenly distributed, we can assume that the cumulative percentage increases linearly across each class. We then draw the polygon of increasing cumulative percentages, which lets us read, for any value x, the percentage of the population whose character is less than x.
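The cumulative table and the reading of the polygon can be sketched as follows: the cumulative percentages are known at the class boundaries, and between two boundaries the value is obtained by linear interpolation, which is exactly the "linear increase within each class" assumption. The data and function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def cumulative_percentages(percentages):
    """Increasing cumulative percentages at each class boundary, starting at 0."""
    return [0.0] + list(accumulate(percentages))

def percent_below(x, bounds, cum):
    """Read the increasing cumulative polygon at x (linear within each class)."""
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return cum[-1]
    i = bisect_right(bounds, x) - 1  # x lies in the class [bounds[i]; bounds[i+1][
    a, b = bounds[i], bounds[i + 1]
    return cum[i] + (cum[i + 1] - cum[i]) * (x - a) / (b - a)

bounds = [0, 10, 20, 30, 50]                            # hypothetical boundaries
cum = cumulative_percentages([10.0, 30.0, 40.0, 20.0])  # [0.0, 10.0, 40.0, 80.0, 100.0]
print(percent_below(25, bounds, cum))  # 60.0
print(percent_below(40, bounds, cum))  # 90.0
```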

First, fill in the table of cumulative percentages:

All that remains is to draw the polygon:

The polygon of decreasing cumulative percentages is built in the same way.
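For the decreasing polygon, the percentages are accumulated from the highest class downwards, giving at each boundary the percentage of the population whose character is at least that value. A minimal sketch with hypothetical data and a hypothetical function name:

```python
from itertools import accumulate

def decreasing_cumulative(percentages):
    """Decreasing cumulative percentages at each class boundary, ending at 0."""
    # Accumulate from the top class down, then restore increasing order of x.
    from_top = list(accumulate(reversed(percentages)))
    return list(reversed(from_top)) + [0.0]

print(decreasing_cumulative([10.0, 30.0, 40.0, 20.0]))  # [100.0, 90.0, 60.0, 20.0, 0.0]
```

At every boundary, the increasing and decreasing cumulative percentages add up to 100.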

Variance and standard deviation

The formulas established for discrete variables remain valid, provided x_i is replaced by m_i, the midpoint of the class:
  • V = \sum_{i=1}^{N} f_i (m_i - \overline{X})^2, where f_i is the frequency of the class, m_i its midpoint and \overline{X} the mean;
  • \sigma = \sqrt{V}
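These formulas can be sketched in Python, assuming relative frequencies f_i that sum to 1 (so that the mean is \sum f_i m_i, consistent with the count-weighted formula above). The data and the function name `grouped_variance` are hypothetical.

```python
import math

def grouped_variance(bounds, frequencies):
    """V = sum f_i (m_i - mean)^2, with m_i the class midpoints.

    Assumes the relative frequencies f_i sum to 1.
    """
    midpoints = [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]
    mean = sum(f * m for f, m in zip(frequencies, midpoints))
    return sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))

v = grouped_variance([0, 10, 20, 30, 50], [0.1, 0.3, 0.4, 0.2])
print(f"V = {v:.2f}, sigma = {math.sqrt(v):.2f}")  # V = 111.00, sigma = 10.54
```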

The standard deviation is one of the measures of dispersion.

See also

  • Statistics

    • Descriptive statistics
