Method of least squares

The method of least squares, developed independently by Gauss and Legendre, makes it possible to compare experimental data, which are generally affected by measurement errors, with a mathematical model intended to describe these data.

This model can take various forms. It may consist of conservation laws that the measured quantities must satisfy. The method of least squares then makes it possible to minimize the impact of experimental errors by “adding information” to the measurement process.

In the most common case, the theoretical model is a family of functions f(x; \theta) of one or more dummy variables x, indexed by one or more unknown parameters \theta. The method of least squares makes it possible to select, among these functions, the one that best reproduces the experimental data. This is referred to as fitting (or adjustment) by the method of least squares. If the parameters \theta have a physical meaning, the fitting procedure also gives an indirect estimate of the value of these parameters.

The method consists of a prescription (initially empirical): the function f(x; \theta) that “best” describes the data is the one that minimizes the sum of the squared deviations between the measurements and the predictions of f(x; \theta). If, for example, we have N measurements (y_i)_{i=1,\ldots,N}, the “optimal” parameters \theta in the least-squares sense are those that minimize the quantity:

S(\theta) = \sum_{i=1}^N \left(y_i - f(x_i; \theta)\right)^2 = \sum_{i=1}^N r_i^2(\theta)

where the r_i(\theta) = y_i - f(x_i; \theta) are the residuals of the model, i.e. the differences between the measured points y_i and the model f(x; \theta). S(\theta) can be regarded as a measure of the distance between the experimental data and the theoretical model that predicts these data. The least-squares prescription requires this distance to be minimal.
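As an illustration, the quantity S(\theta) can be evaluated directly for any candidate set of parameters. The following is a minimal sketch in plain Python; the straight-line model, the function names and the data values are made up for the example and are not part of the method itself.

```python
def sum_of_squares(model, theta, xs, ys):
    """Return S(theta) = sum_i (y_i - f(x_i; theta))^2."""
    return sum((y - model(x, theta)) ** 2 for x, y in zip(xs, ys))


def line(x, theta):
    # Hypothetical model f(x; theta) = slope * x + intercept
    slope, intercept = theta
    return slope * x + intercept


# Made-up measurements, roughly following y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.5]

s_good = sum_of_squares(line, (2.0, 1.0), xs, ys)  # parameters close to the data
s_bad = sum_of_squares(line, (1.0, 0.0), xs, ys)   # parameters far from the data
print(s_good, s_bad)
```

The parameter set that lies closer to the data gives the smaller value of S, which is the sense in which S measures a distance between data and model.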

If, as is generally the case, an estimate of the standard deviation \sigma_i of the noise affecting each measurement y_i is available, it is used to “weight” the contribution of that measurement to the \chi^2. The smaller the uncertainty of a measurement, the more weight it carries:

\chi^2(\theta) = \sum_{i=1}^N \left(\frac{y_i - f(x_i; \theta)}{\sigma_i}\right)^2 = \sum_{i=1}^N w_i \left(y_i - f(x_i; \theta)\right)^2

The quantities w_i = 1/\sigma_i^2, the inverses of the variances of the measurements, are called the weights of the measurements. The quantity above is called chi-squared. Its name comes from the statistical distribution that it follows when the measurement errors affecting the y_i are normally distributed (which is very common). In this case, the method of least squares also makes it possible to estimate quantitatively how well the model agrees with the measurements, provided that a reliable estimate of the errors \sigma_i is available. If the error model is non-Gaussian, it is generally necessary to resort to the method of maximum likelihood, of which the method of least squares is a special case.
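The weighted sum can be sketched in the same way. The example below is only illustrative: the constant model and the numerical values are assumptions chosen so that the arithmetic is easy to follow by hand.

```python
def chi_squared(model, theta, xs, ys, sigmas):
    """Return chi^2(theta) = sum_i w_i (y_i - f(x_i; theta))^2, with w_i = 1/sigma_i^2."""
    total = 0.0
    for x, y, sigma in zip(xs, ys, sigmas):
        weight = 1.0 / sigma ** 2
        total += weight * (y - model(x, theta)) ** 2
    return total


def constant(x, theta):
    # Hypothetical model: the same value theta for every x
    return theta


# Made-up measurements with different uncertainties
xs = [0.0, 1.0, 2.0]
ys = [10.0, 12.0, 11.0]
sigmas = [1.0, 2.0, 0.5]

chi2 = chi_squared(constant, 11.0, xs, ys, sigmas)
print(chi2)
```

Here the second measurement deviates from the model as much as the first, but contributes four times less to the total because its uncertainty is twice as large.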

Its extreme simplicity means that this method is very commonly used today in the applied sciences. A common application is the smoothing of experimental data by an empirical function (linear function, polynomials or splines). However, its most important use is probably the measurement of physical quantities from experimental data. In many cases, the quantity one seeks to measure is not observable and appears only indirectly, as a parameter \theta of a theoretical model f(x; \theta). In this case, it can be shown that the method of least squares makes it possible to construct an estimator of \theta that satisfies certain optimality conditions. In particular, when the model f(x; \theta) is linear in \theta, the Gauss-Markov theorem guarantees that the method of least squares yields the least dispersed unbiased estimator (among estimators that are linear in the measurements). When the model is a non-linear function of the parameters \theta, the estimator is generally biased. Moreover, in all cases, the estimators obtained are extremely sensitive to outliers: this is expressed by saying that they are not robust. Several techniques nevertheless make it possible to “robustify” the method.

History

On New Year's Day 1801, the Italian astronomer Giuseppe Piazzi discovered the asteroid Ceres. He was then able to follow its trajectory for 40 days. During that year, several scientists tried to predict its trajectory on the basis of Piazzi's observations (at that time, solving Kepler's nonlinear equations of kinematics was a very difficult problem). Most of the predictions were wrong; the only calculation precise enough to allow Zach, a German astronomer, to locate Ceres again at the end of the year was that of Carl Friedrich Gauss, then 24 years old (he had already developed the fundamental concepts in 1795, when he was 18). But his method of least squares was not published until 1809, when it appeared in volume 2 of his work on celestial mechanics, Theoria Motus Corporum Coelestium in sectionibus conicis solem ambientium. The French mathematician Adrien-Marie Legendre independently developed the same method in 1805.

In 1829, Gauss was able to give the reasons for the effectiveness of this method: the method of least squares is in fact optimal with respect to many criteria. This argument is now known as the Gauss-Markov theorem.

Formalism

Two simple examples

Mean of a series of independent measurements

The simplest example of a least-squares fit is probably the calculation of the mean m of a set of independent measurements (y_i)_{i=1,\ldots,N} affected by Gaussian errors. The least-squares prescription amounts to minimizing the quantity:

\chi^2(m) = \sum_{i=1}^N \left(\frac{y_i - m}{\sigma_i}\right)^2 = \sum_{i=1}^N w_i \left(y_i - m\right)^2

where the w_i = 1/\sigma_i^2 are the weights of the measurements y_i.

This quantity is a positive definite quadratic form. Its minimum is found by differentiation: {\rm grad} \chi^2(m) = 0. This gives the classical formula:

m = \frac{\sum_{i=1}^N w_i y_i}{\sum_{i=1}^N w_i}

In other words, the least-squares estimator of the mean m of a series of measurements affected by (known) Gaussian errors is their weighted mean, i.e. their empirical mean in which each measurement is weighted by the inverse of the square of its uncertainty. The Gauss-Markov theorem guarantees that this is the best unbiased estimator of m.
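The weighted mean is short enough to be written out in full. The sketch below uses plain Python and two made-up measurements; the function name is an arbitrary choice for the illustration.

```python
def weighted_mean(ys, sigmas):
    """Least-squares estimate of the mean: sum(w_i y_i) / sum(w_i), with w_i = 1/sigma_i^2."""
    weights = [1.0 / sigma ** 2 for sigma in sigmas]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Made-up example: the first measurement is twice as precise as the second
ys = [10.0, 12.0]
sigmas = [1.0, 2.0]

m = weighted_mean(ys, sigmas)
print(m)  # lies closer to 10 than to 12, since the first point has four times the weight
```

With equal uncertainties the weights cancel and the function reduces to the ordinary arithmetic mean.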

The estimated mean m fluctuates with the series of measurements y_i performed. Since each measurement is affected by a random error, the mean of a first series of N measurements will understandably differ from the mean of a second series of N measurements, even if both are carried out under identical conditions. It is important to be able to quantify the size of such fluctuations, because it determines the precision with which the mean m is determined. Each measurement y_i can be regarded as a realization of a random variable Y_i, with mean \overline{y_i} and standard deviation \sigma_i. The estimator of the mean obtained by the method of least squares, a linear combination of random variables, is itself a random variable:

M = \frac{\sum_{i=1}^N w_i Y_i}{\sum_{i=1}^N w_i}

The standard deviation of the fluctuations of M is given by (linear combination of independent random variables):

\sigma(M) = \left(\sum_{i=1}^N \frac{1}{\sigma_i^2}\right)^{-1/2} = \left(\sum_{i=1}^N w_i\right)^{-1/2}

Unsurprisingly, the precision of the mean of a series of N measurements is thus determined by the number of measurements and by the precision of each of them. If every measurement has the same uncertainty \sigma_i = \sigma, the preceding formula simplifies to:

\sigma(M) = \frac{\sigma}{\sqrt{N}}

The precision of the mean thus improves as the square root of the number of measurements. For example, doubling the precision requires four times as much data; multiplying it by 10 requires 100 times as much data.
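The square-root behaviour can be checked numerically from the general formula for \sigma(M). This is a small sketch with an arbitrary, made-up single-measurement uncertainty of 0.2.

```python
import math


def sigma_of_mean(sigmas):
    """Standard deviation of the weighted mean: (sum_i 1/sigma_i^2)^(-1/2)."""
    return sum(1.0 / sigma ** 2 for sigma in sigmas) ** -0.5


sigma = 0.2  # made-up uncertainty, identical for every measurement
for n in (1, 4, 100):
    general = sigma_of_mean([sigma] * n)   # general formula
    simplified = sigma / math.sqrt(n)      # equal-uncertainty special case
    print(n, general, simplified)
```

Going from 1 to 4 measurements halves the uncertainty of the mean, and going from 1 to 100 divides it by 10, in line with the statement above.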

Linear regression

Another example is the fit of a linear law of the type y = \alpha x + \beta to independent measurements, as a function of a known parameter x. This type of situation arises, for example, when one wants to calibrate a simple measuring device (ammeter, thermometer) whose response is linear. y is then the instrumental reading (deflection of a needle, number of ADC counts, ...) and x the physical quantity that the device is supposed to measure, generally better known if a reliable calibration source is used. The method of least squares then makes it possible to measure the calibration law of the device, to estimate how well this law agrees with the calibration measurements (i.e. in this case, the linearity of the device) and to propagate the calibration errors to future measurements made with the calibrated device. Note that, in general, the errors (and correlations) on both the measurements y_i and the measurements x_i must be taken into account. We will treat this case in the following section.

For this type of model, the least-squares prescription reads:

\chi^2(\alpha, \beta) = \sum_{i=1}^N \left(\frac{y_i - \alpha x_i - \beta}{\sigma_i}\right)^2 = \sum_{i=1}^N w_i \left(y_i - \alpha x_i - \beta\right)^2

The minimum of this expression is reached for {\rm grad} \chi^2 = 0, which gives:

\begin{pmatrix} \sum w_i x_i^2 & \sum w_i x_i \\ \sum w_i x_i & \sum w_i \end{pmatrix} \times \begin{pmatrix} \alpha_{min} \\ \beta_{min} \end{pmatrix} = \begin{pmatrix} \sum w_i x_i y_i \\ \sum w_i y_i \end{pmatrix}

Determining the “optimal” parameters \alpha and \beta (in the least-squares sense) is thus reduced to solving a system of linear equations. This is a very useful property, related to the fact that the model itself is linear. This is called a linear fit or linear regression. In the general case, determining the minimum of the \chi^2 is a more complicated problem, and generally an expensive one in computing time (cf. following sections).
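The 2 x 2 system above is small enough to be solved in closed form. The following sketch accumulates the five weighted sums and applies Cramer's rule; the function name and the calibration data are made up for the illustration.

```python
def fit_line(xs, ys, sigmas):
    """Weighted least-squares fit of y = alpha * x + beta; returns (alpha, beta)."""
    sxx = sx = s1 = sxy = sy = 0.0
    for x, y, sigma in zip(xs, ys, sigmas):
        w = 1.0 / sigma ** 2
        sxx += w * x * x   # sum w_i x_i^2
        sx += w * x        # sum w_i x_i
        s1 += w            # sum w_i
        sxy += w * x * y   # sum w_i x_i y_i
        sy += w * y        # sum w_i y_i
    det = sxx * s1 - sx * sx
    if det == 0.0:
        raise ValueError("at least two distinct x values are needed")
    # Cramer's rule applied to the 2 x 2 normal equations
    alpha = (s1 * sxy - sx * sy) / det
    beta = (sxx * sy - sx * sxy) / det
    return alpha, beta


# Made-up, noise-free calibration data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
sigmas = [0.1, 0.1, 0.1, 0.1]

alpha, beta = fit_line(xs, ys, sigmas)
print(alpha, beta)
```

Because the data lie exactly on a line, the fit recovers the slope and intercept up to rounding; with noisy data the same code returns the values that minimize the \chi^2.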

The values of the parameters \alpha_{min} and \beta_{min} depend on the measurements y_i obtained. Since these measurements are affected by errors, it is clear that if the N calibration measurements are repeated M times, and the fit described above is performed at the end of each series, one will obtain M numerically different values of \alpha_{min} and \beta_{min}. The parameters of the fit can thus be regarded as random variables, whose distribution is a function of the fitted model and of the distribution of the y_i.

It can be shown that the dispersion affecting the values of \alpha_{min} and \beta_{min} depends on the number of measurement points, N, and on the dispersion affecting the measurements (the less precise the measurements, the more \alpha_{min} and \beta_{min} fluctuate). Moreover, \alpha_{min} and \beta_{min} are generally not independent variables. They are generally correlated, and their correlation depends on the fitted model (we have assumed the y_i to be independent).
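These statements can be made concrete with a standard result that the text does not state explicitly and that is assumed here: for independent measurements with known \sigma_i, the covariance matrix of (\alpha_{min}, \beta_{min}) is the inverse of the 2 x 2 matrix of the normal equations. The sketch below applies it to made-up x values; all names are hypothetical.

```python
import math


def line_fit_covariance(xs, sigmas):
    """Return (var_alpha, var_beta, cov_alpha_beta, correlation) for a straight-line fit.

    Assumes independent y_i with known sigmas, so that the covariance matrix
    is the inverse of [[sum w x^2, sum w x], [sum w x, sum w]].
    """
    sxx = sx = s1 = 0.0
    for x, sigma in zip(xs, sigmas):
        w = 1.0 / sigma ** 2
        sxx += w * x * x
        sx += w * x
        s1 += w
    det = sxx * s1 - sx * sx
    var_alpha = s1 / det
    var_beta = sxx / det
    cov = -sx / det
    corr = cov / math.sqrt(var_alpha * var_beta)
    return var_alpha, var_beta, cov, corr


# Made-up calibration points
xs = [0.0, 1.0, 2.0, 3.0]
print(line_fit_covariance(xs, [0.1] * 4))   # slope and intercept are anti-correlated
print(line_fit_covariance(xs, [0.2] * 4))   # doubling sigma quadruples the variances

# Same points with x measured from their centre: the correlation vanishes
centred = [x - 1.5 for x in xs]
print(line_fit_covariance(centred, [0.1] * 4))
```

The last call illustrates that the correlation depends on how the model is written: shifting the origin of x to the centre of the data decorrelates slope and intercept.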

Fitting an arbitrary linear model

A model f(x; \theta) is linear if its dependence on \theta is linear. Such a model is written:

f(x; \theta) = \sum_{k=1}^n \theta_k \phi_k(x)

where the \phi_k are n arbitrary functions of the variable x. Such a case is very common in practice: the two models studied above are linear. More generally, any polynomial model is linear, with \phi_k(x) = x^k. Finally, a great many models used in the applied sciences are expansions on classical function bases (splines, Fourier bases, wavelet bases, etc.).

If we have N measurements (x_i, y_i, \sigma_i), the \chi^2 can be written in the form:

\chi^2(\mathbf{\theta}) = \sum_{i=1}^N \frac{1}{\sigma_i^2} \left(\sum_{k=1}^n \theta_k \phi_k(x_i) - y_i\right)^2

We can exploit the linearity of the model to express the \chi^2 in a simpler matrix form. Indeed, defining:

\mathbf{J} = \begin{pmatrix} \phi_1(x_1) & \ldots & \phi_n(x_1) \\ \vdots & & \vdots \\ \phi_1(x_N) & \ldots & \phi_n(x_N) \end{pmatrix}, \quad \mathbf{\theta} = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} \quad {\rm and} \quad \mathbf{W} = \begin{pmatrix} \frac{1}{\sigma_1^2} & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & \frac{1}{\sigma_N^2} \end{pmatrix} = \begin{pmatrix} w_1 & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & w_N \end{pmatrix}

it can easily be shown that the \chi^2 can be written in the form:

\chi^2(\mathbf{\theta}) = (\mathbf{J \theta} - \mathbf{y})^T \mathbf{W} (\mathbf{J \theta} - \mathbf{y})

The matrix J is called the Jacobian matrix of the problem. It is a rectangular matrix of dimension N \times n, with generally N \gg n. It contains the values of the basis functions \phi_k at each measurement point. The diagonal matrix W is called the weight matrix. It is the inverse of the covariance matrix of the y_i. It can be shown that if the y_i are correlated, the relation above remains valid; W is simply no longer diagonal, because the covariances between the y_i are no longer zero.

Differentiating the relation above with respect to each \theta_k, one obtains:

{\rm grad} \, \chi^2(\mathbf{\theta}) = 2 \, \mathbf{J}^T \mathbf{W J \theta} - 2 \, \mathbf{J}^T \mathbf{W y}

and the minimum of the \chi^2 is therefore reached for \theta_{min} equal to:

\theta_{min} = \left(\mathbf{J}^T \mathbf{W J}\right)^{-1} \mathbf{J}^T \mathbf{W y}

We recover the remarkable property of linear problems: the optimal model can be obtained in a single operation, namely solving an n \times n system.
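The matrix solution can be sketched without any external library by building the normal equations and solving them by Gaussian elimination. This is an illustrative implementation under simple assumptions (diagonal weight matrix, small n, well-conditioned problem); the function names, the quadratic basis and the data are made up for the example. For large or ill-conditioned problems a dedicated linear-algebra routine would normally be preferred.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular normal matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_linear_model(basis, xs, ys, sigmas):
    """Return theta_min = (J^T W J)^-1 J^T W y for a diagonal weight matrix W."""
    n = len(basis)
    jac = [[phi(x) for phi in basis] for x in xs]   # N x n matrix J
    weights = [1.0 / s ** 2 for s in sigmas]        # diagonal of W
    normal = [[sum(w * row[j] * row[k] for w, row in zip(weights, jac))
               for k in range(n)] for j in range(n)]           # J^T W J
    rhs = [sum(w * row[j] * y for w, row, y in zip(weights, jac, ys))
           for j in range(n)]                                   # J^T W y
    return solve_linear_system(normal, rhs)


# Made-up example: quadratic polynomial with basis functions 1, x, x^2
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 - 2.0 * x + 0.5 * x * x for x in xs]  # noise-free data
sigmas = [0.1] * len(xs)

theta = fit_linear_model(basis, xs, ys, sigmas)
print(theta)
```

Any other set of basis functions (Fourier terms, splines, ...) can be passed in the same way, since only the construction of J depends on the basis.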

Fitting non-linear models

In many cases, the dependence of the model on \theta is non-linear. For example, f(x; \theta) = f(x; (A, \omega, \phi)) = A \cos(\omega x + \phi), or f(x; \theta) = f(x; \tau) = \exp(-x \tau). In this case, the formalism described in the preceding section cannot be applied directly. The approach generally used consists in starting from an estimate of the solution, linearizing the \chi^2 at that point, solving the linearized problem, and then iterating. This approach is equivalent to the Gauss-Newton minimization algorithm. Other minimization techniques exist. Some, such as the Levenberg-Marquardt algorithm, are refinements of the Gauss-Newton algorithm. Others are applicable when the derivatives of the \chi^2 are difficult or expensive to compute.
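The iteration just described can be sketched for the one-parameter exponential model. This is a bare Gauss-Newton loop under simplifying assumptions: the model is taken as \exp(-x \tau), the data are made up and noise-free, the starting value is reasonably close to the solution, and there is no step control of the kind Levenberg-Marquardt adds.

```python
import math


def gauss_newton_decay(xs, ys, sigmas, tau0, max_iter=50, tol=1e-12):
    """Fit f(x; tau) = exp(-x * tau) by Gauss-Newton iterations starting from tau0."""
    tau = tau0
    for _ in range(max_iter):
        num = den = 0.0
        for x, y, sigma in zip(xs, ys, sigmas):
            w = 1.0 / sigma ** 2
            f = math.exp(-x * tau)
            df = -x * f                 # derivative of the model with respect to tau
            num += w * df * (y - f)     # J^T W r for the linearized problem
            den += w * df * df          # J^T W J for the linearized problem
        if den == 0.0:
            raise ValueError("model is insensitive to tau at these points")
        step = num / den                # solution of the linearized problem
        tau += step
        if abs(step) < tol:
            break
    return tau


# Made-up, noise-free data generated with tau = 1.3
true_tau = 1.3
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(-x * true_tau) for x in xs]
sigmas = [0.05] * len(xs)

tau_hat = gauss_newton_decay(xs, ys, sigmas, tau0=1.0)
print(tau_hat)
```

Each pass solves a linear least-squares problem for the correction to \tau, which is why the linear formalism of the preceding section reappears inside the loop.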

One of the difficulties of non-linear least-squares problems is the frequent existence of several local minima. A systematic exploration of the parameter space may then prove necessary.

Fitting under constraints

Fitting implicit models

Statistical interpretation

The \chi^2 criterion

Optimality of the method of least squares

Robustness

Sensitivity to outliers

Robustification techniques

Related articles
