Linear discriminant analysis belongs to the family of predictive discriminant analysis techniques. Its purpose is to explain and predict which predefined class (group) an individual belongs to, based on its characteristics as measured by predictive variables.

In the example used in the article Discriminant analysis, the Flea Beetles data set, the objective is to determine which species a flea beetle belongs to from the width and the angle of its aedeagus.

The variable to be predicted is necessarily categorical (discrete); it has 3 categories in our example. The predictive variables are all, a priori, continuous. It is nevertheless possible to handle discrete predictive variables with suitable data preparation.

Linear discriminant analysis can be compared with the supervised methods developed in machine learning and with logistic regression developed in statistics.

Assumptions and Formulas

We have a sample of $n$ observations divided into $K$ groups of sizes $n_k$.

Let $Y$ denote the variable to be predicted; it takes its values in $\{y_1, \ldots, y_K\}$. We have $J$ predictive variables $X = (X_1, \ldots, X_J)$.

We denote by $\mu_k$ the centroids of the conditional point clouds and by $W_k$ their variance-covariance matrices.

The Bayesian rule

The objective is to produce an assignment rule $F: X \Rightarrow \{y_1, \ldots, y_K\}$ that predicts, for a given observation $\omega$, the value of $Y$ associated with it from the values taken by $X$.

The Bayesian rule consists in producing an estimate of the posterior probability of assignment

$$P(Y=y_k \mid X) = \frac{P(Y=y_k) \times P(X \mid Y=y_k)}{\sum_{i=1}^{K} P(Y=y_i) \times P(X \mid Y=y_i)}$$

$P(Y=y_k)$ is the prior probability of belonging to a class. $P(X \mid Y=y_k)$ is the density function of $X$ conditional on the class $y_k$.

The assignment rule for an individual $\omega$ to be classified then becomes

$$Y(\omega) = y_k^* \text{ if and only if } y_k^* = \arg\max_{k} P(Y=y_k \mid X(\omega))$$

The whole problem of discriminant analysis then comes down to proposing an estimate of the quantity $P(X \mid Y=y_k)$.
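As a minimal sketch of the Bayesian rule, the following Python snippet turns prior probabilities and class-conditional densities into posterior probabilities and picks the most probable class. The function names and the numeric values are hypothetical and serve only as an illustration; in practice the densities come from whichever estimate of $P(X \mid Y=y_k)$ is chosen.

```python
def posterior(priors, densities):
    """Return P(Y=y_k | X) for every class k.

    priors[k]    : prior probability P(Y=y_k)
    densities[k] : class-conditional density P(X | Y=y_k) evaluated at X
    """
    joint = [p * f for p, f in zip(priors, densities)]
    total = sum(joint)  # denominator of the Bayes formula
    return [j / total for j in joint]


def assign(priors, densities):
    """Index k* of the class with the largest posterior probability."""
    post = posterior(priors, densities)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical values for K = 3 classes (illustration only).
priors = [0.5, 0.3, 0.2]
densities = [0.02, 0.10, 0.05]
post = posterior(priors, densities)
k_star = assign(priors, densities)
```

With these made-up inputs the second class wins even though its prior is not the largest, because its density at the observation is much higher.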

Parametric discriminant analysis - the assumption of multinormality

There are two main approaches to estimating the distribution $P(X \mid Y=y_k)$ correctly:

  • the non-parametric approach makes no assumption about this distribution but proposes a procedure for estimating the probabilities locally, in the neighbourhood of the observation $\omega$ to be classified. The best-known procedures are Parzen kernels and the nearest-neighbour method. The main difficulty is to define the neighbourhood appropriately.

  • the second approach makes an assumption about the distribution of the conditional point clouds; this is called parametric discriminant analysis. The most commonly used assumption is without doubt multinormality (see Normal distribution).

Under the multinormality assumption, the distribution of the conditional point clouds is written

$$f_k(X) = \frac{1}{(2\pi)^{J/2} \times |W_k|^{1/2}} \times e^{-\frac{1}{2} (X-\mu_k)' W_k^{-1} (X-\mu_k)}$$

where $|W_k|$ is the determinant of the variance-covariance matrix conditional on $y_k$.

Since the objective is to find the maximum of the posterior probability of assignment, we can neglect everything that does not depend on $k$. Taking the logarithm, we obtain the discriminant score, which equals the logarithm of $P(Y=y_k \mid X)$ up to a factor of 2 and an additive constant that does not depend on $k$

$$D(y_k, X) = 2 \ln P(Y=y_k) - \ln |W_k| - (X-\mu_k)' W_k^{-1} (X-\mu_k)$$

The assignment rule thus becomes

$$Y(\omega) = y_k^* \text{ if and only if } y_k^* = \arg\max_k D(y_k, X(\omega))$$
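The score $D(y_k, X) = 2 \ln P(Y=y_k) - \ln |W_k| - (X-\mu_k)' W_k^{-1} (X-\mu_k)$ and the associated assignment rule can be sketched as follows. This is a dependency-free illustration restricted to $J = 2$ predictive variables, so that the inverse of the $2 \times 2$ matrix can be written explicitly; the class names and parameter values are hypothetical.

```python
import math


def quadratic_score(x, mu, w, prior):
    """D(y_k, X) = 2 ln P(Y=y_k) - ln|W_k| - (X - mu_k)' W_k^{-1} (X - mu_k).

    Restricted to J = 2 predictive variables so that the 2x2 inverse
    can be written out explicitly.
    """
    (a, b), (c, d) = w
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (X - mu)' W^{-1} (X - mu), using the closed-form inverse of a 2x2 matrix
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return 2.0 * math.log(prior) - math.log(det) - maha


# Hypothetical parameters for two classes (illustration only).
classes = {
    "A": {"mu": (0.0, 0.0), "w": ((1.0, 0.0), (0.0, 1.0)), "prior": 0.5},
    "B": {"mu": (3.0, 3.0), "w": ((2.0, 0.5), (0.5, 1.0)), "prior": 0.5},
}
x_new = (0.5, 0.2)
scores = {name: quadratic_score(x_new, **params) for name, params in classes.items()}
predicted = max(scores, key=scores.get)  # class with the largest score
```

For more than two variables a linear algebra library would normally be used to compute the determinant and to solve the linear system instead of forming the inverse by hand.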

If the discriminant score is fully expanded, we see that it is expressed in terms of the squares and cross products of the predictive variables. This is called quadratic discriminant analysis. Widely used in research because it performs very well compared with other methods, it is less widespread among practitioners. Indeed, since the expression of the discriminant score is rather complex, it is difficult to discern clearly the direction of causality between the predictive variables and the class of membership. In particular, it is hard to identify the variables that really matter in the classification, and interpreting the results is rather hazardous.

Linear discriminant analysis - the assumption of homoscedasticity

A second assumption simplifies the calculations further: homoscedasticity, meaning that the variance-covariance matrices are identical from one group to another. Geometrically, this means that the point clouds have the same shape (and volume) in the representation space.

In this case the estimated variance-covariance matrix is the within-class variance-covariance matrix, computed with the following expression

$$W = \frac{1}{n-K} \sum_k n_k \times W_k$$
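The pooling formula can be sketched in a few lines of Python. The snippet assumes that each $W_k$ was computed with divisor $n_k$, so that $n_k \times W_k$ is the matrix of sums of squares and cross products of group $k$; the function name and the sample values are hypothetical.

```python
def pooled_covariance(sizes, covariances):
    """W = 1/(n - K) * sum_k n_k * W_k for square matrices given as nested lists."""
    n, k = sum(sizes), len(sizes)
    dim = len(covariances[0])
    pooled = [[0.0] * dim for _ in range(dim)]
    for n_k, w_k in zip(sizes, covariances):
        for i in range(dim):
            for j in range(dim):
                pooled[i][j] += n_k * w_k[i][j] / (n - k)
    return pooled


# Hypothetical group sizes and per-group covariance matrices (K = 2, J = 2).
sizes = [10, 20]
covs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[4.0, 1.0], [1.0, 2.5]],
]
W = pooled_covariance(sizes, covs)
```

Larger groups weigh more heavily in the result, which is why the pooled matrix here is closer to the second group's matrix than to the first.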

Once again, we can remove from the discriminant score everything that no longer depends on $k$ (here the term $\ln |W|$, now common to all classes); it becomes

$$D(y_k, X) = 2 \ln P(Y=y_k) - (X-\mu_k)' W^{-1} (X-\mu_k)$$

Linear classification functions

Expanding the discriminant score after introducing the homoscedasticity assumption, we find that the only term that is quadratic in the predictive variables, $X' W^{-1} X$, is the same for every class and can be dropped. The remaining score is linear in the predictive variables.

We thus have as many classification functions as there are categories of the variable to be predicted; they are linear combinations of the following form:

$$D(y_1, X) = a_0 + a_1 \times X_1 + \ldots + a_J \times X_J$$

$$D(y_2, X) = b_0 + b_1 \times X_1 + \ldots + b_J \times X_J$$

$$\ldots$$
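To see where such coefficients come from, the homoscedastic score can be expanded term by term. The following is a sketch of that expansion in the notation of this article; the symbols $a_{k,0}$ and $a_k$ are introduced here for illustration only, and $W$ is assumed symmetric and invertible.

```latex
\begin{aligned}
D(y_k, X) &= 2 \ln P(Y=y_k) - (X-\mu_k)' W^{-1} (X-\mu_k) \\
          &= \underbrace{-X' W^{-1} X}_{\text{same for every } k}
             + 2\,\mu_k' W^{-1} X
             + 2 \ln P(Y=y_k) - \mu_k' W^{-1} \mu_k
\end{aligned}

% Dropping the common quadratic term leaves a function that is linear in X:
\begin{aligned}
a_{k,0} &= 2 \ln P(Y=y_k) - \mu_k' W^{-1} \mu_k && \text{(constant term)} \\
a_k     &= 2\, W^{-1} \mu_k                     && \text{(vector of the $J$ coefficients)}
\end{aligned}
```

Many presentations divide all of these coefficients by 2, which does not change the class that achieves the maximum.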

This presentation is attractive in more than one way. By studying the value and the sign of the coefficients, it is possible to determine the direction of the causal relationships in the classification. Likewise, as we will see below, it becomes possible to assess how significant each variable is in the prediction.

Robustness

The assumptions of multinormality and homoscedasticity may seem too restrictive, limiting the scope of linear discriminant analysis in practice.

The key concept to keep in mind in statistics is robustness. Even if the initial assumptions are not fully met, a method may still be applicable. This is the case for linear discriminant analysis. The most important thing is to regard it as a linear separator. In that case, if the point clouds are linearly separable in the representation space, it can work correctly.

Compared with other linear techniques such as logistic regression, discriminant analysis shows comparable performance. It can nevertheless suffer when the homoscedasticity assumption is very strongly violated.

Evaluation

Error rate

Conventionally in supervised learning, to evaluate the performance of a classification function, we compare its predictions with the true values of the variable to be predicted on a data set. The resulting cross-tabulation is called a confusion matrix, with the true classes in rows and the predicted classes in columns. The error rate, or misclassification rate, is simply the number of misclassifications (cases where the prediction does not match the true value) divided by the size of the data set.
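A minimal sketch of the confusion matrix and the error rate as defined above, using plain Python lists. The labels and predictions are made up for the illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def error_rate(matrix):
    """Off-diagonal count divided by the total number of observations."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


# Hypothetical true and predicted classes for six observations.
labels = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
cm = confusion_matrix(y_true, y_pred, labels)
rate = error_rate(cm)
```

Here two of the six observations fall off the diagonal, so the error rate is 2/6.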

The error rate is appealing because it is easy to interpret: it is an estimator of the probability of making a mistake when the classification function is applied to the population.

Note, however, that the error rate measured on the data used to build the classification function, called the resubstitution error rate, is biased, simply because the data then act as both judge and party. The proper procedure is to build the classification function on one fraction of the data, called the training set, and then to evaluate it on another fraction, called the test set. The test error rate measured in this way is a trustworthy indicator.

In practice, the data are usually split 2/3 for training and 1/3 for testing. In fact, there is no real rule. What matters most is to reconcile two conflicting requirements: keeping enough data in the test set to obtain a stable estimate of the error, while reserving enough for training so as not to penalise the learning method.
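A possible way to draw such a split is sketched below with the standard library only. The function name, the default fraction and the fixed seed are choices made for this illustration, not part of any particular software.

```python
import random


def train_test_split(n, test_fraction=1 / 3, seed=0):
    """Shuffle the indices 0..n-1 and split them into training and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


# Hypothetical data set of 90 observations: 60 for training, 30 for testing.
train_idx, test_idx = train_test_split(90)
```

Shuffling before splitting matters when the file is sorted by class; otherwise a whole class could end up only in the test set.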

When sample sizes are small and a training-test split of the data is not possible, resampling methods such as cross-validation or the bootstrap can be used to evaluate the classification error.
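The idea of cross-validation can be sketched as follows: every observation is used once for testing and $k-1$ times for training. The helper names and the toy classifier are hypothetical; `fit_predict` stands for any learning method, for instance a linear discriminant analysis.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds of 0..n-1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validated_error(x, y, fit_predict, k=5):
    """Pooled misclassification rate over the k held-out folds.

    fit_predict(x_train, y_train, x_test) is a caller-supplied function
    (hypothetical here) that trains a classifier and returns predictions.
    """
    errors = 0
    for train, test in k_fold_indices(len(y), k):
        preds = fit_predict([x[i] for i in train], [y[i] for i in train],
                            [x[i] for i in test])
        errors += sum(p != y[i] for p, i in zip(preds, test))
    return errors / len(y)


def majority_class(x_train, y_train, x_test):
    """Toy classifier: always predict the most frequent training label."""
    top = max(set(y_train), key=y_train.count)
    return [top] * len(x_test)


# Hypothetical data: 7 observations of class "a" followed by 3 of class "b".
x = list(range(10))
y = ["a"] * 7 + ["b"] * 3
cv_error = cross_validated_error(x, y, majority_class, k=5)
```

The folds here are contiguous for simplicity; with real data the observations would normally be shuffled first.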

Separability - Overall evaluation

The error rate makes it possible to evaluate and compare methods, whatever their underlying assumptions. In the case of linear discriminant analysis, we can also exploit the probabilistic model to carry out hypothesis tests.

A first test answers the following question: is it possible to distinguish the point clouds in the representation space? Stated in the multinormal framework, this amounts to checking whether the conditional centroids coincide (null hypothesis) or whether at least one of these centroids deviates significantly from the others (alternative hypothesis).

The test statistic is Wilks' $\lambda$; its expression is as follows

$$\lambda = \frac{|W|}{|V|}$$

where $|W|$ is the determinant of the within-class variance-covariance matrix and $|V|$ the determinant of the total variance-covariance matrix.
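As an illustration, Wilks' lambda can be computed by hand for $J = 2$ variables. This sketch assumes that the within-class and total matrices are both expressed as sums of squares and cross products, that is, with the same normalisation, so that the ratio of determinants is not distorted by different divisors. The function names and the data points are hypothetical.

```python
def mean(points):
    """Centroid of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, center):
    """Sums of squares and cross products of 2-D points around center."""
    sxx = sum((p[0] - center[0]) ** 2 for p in points)
    syy = sum((p[1] - center[1]) ** 2 for p in points)
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points)
    return sxx, sxy, syy


def wilks_lambda(groups):
    """lambda = |W| / |V| for groups of 2-D points (J = 2)."""
    everything = [p for g in groups for p in g]
    vxx, vxy, vyy = scatter(everything, mean(everything))  # total scatter
    wxx = wxy = wyy = 0.0
    for g in groups:  # within-class scatter, summed over the groups
        sxx, sxy, syy = scatter(g, mean(g))
        wxx, wxy, wyy = wxx + sxx, wxy + sxy, wyy + syy
    return (wxx * wyy - wxy ** 2) / (vxx * vyy - vxy ** 2)


# Two hypothetical, well-separated groups of four points each.
g1 = [(0, 0), (1, 0), (0, 1), (1, 1)]
g2 = [(10, 10), (11, 10), (10, 11), (11, 11)]
lam = wilks_lambda([g1, g2])
```

A value close to 0 indicates well-separated centroids, while a value of 1 is obtained when the group centroids coincide.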

Since the table of critical values of Wilks' distribution is rarely available in software, the Bartlett and Rao transformations are commonly used; they follow a chi-squared distribution and a Fisher distribution, respectively.

Viewed from a different angle, this test can be expressed as a multivariate generalisation of one-way analysis of variance (ANOVA); it is then called MANOVA (Multivariate Analysis of Variance).

Individual evaluation of the predictive variables

As in all linear methods, it is possible to evaluate each predictive variable individually, and possibly to eliminate those that are not significant for discrimination.

The test statistic is based on the change in Wilks' lambda when the $(J+1)$-th variable is added to the prediction model. Its formula is as follows

$$F = \frac{n-K-J}{K-1} \times \left( \frac{\lambda_J}{\lambda_{J+1}} - 1 \right)$$

It follows a Fisher distribution with $(K-1, n-K-J)$ degrees of freedom.
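The statistic itself is a one-line computation, sketched below. The function name and the numeric values are hypothetical; obtaining the p-value additionally requires the distribution function of the Fisher distribution, which is not reproduced here.

```python
def partial_f(lambda_j, lambda_j1, n, k, j):
    """F = (n - K - J) / (K - 1) * (lambda_J / lambda_{J+1} - 1).

    lambda_j  : Wilks' lambda of the model with J variables
    lambda_j1 : Wilks' lambda after adding the (J+1)-th variable
    """
    return (n - k - j) / (k - 1) * (lambda_j / lambda_j1 - 1.0)


# Hypothetical values: n = 100 observations, K = 3 classes, J = 1 variable so far.
f_stat = partial_f(lambda_j=0.40, lambda_j1=0.25, n=100, k=3, j=1)
```

The larger the drop in lambda caused by the new variable, the larger the statistic, and the stronger the evidence that the variable contributes to the discrimination.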

An example

Reading the results

A linear discriminant analysis was run on the Flea Beetles data described in the article Discriminant analysis. The results are as follows.

  • the confusion matrix shows that only one error was made: a "Concinna" was classified as "Heikertingeri". The associated error rate is $1.35\%$. This result should be put in perspective, since it was computed on the data used for training.
  • the centroids of the three point clouds differ significantly. This is what the Wilks statistic in the MANOVA section indicates. The associated p-values, from the Bartlett and Rao transformations, are close to 0. This numerical result confirms the visual impression given by the projection of the point clouds in the representation space (see Discriminant analysis).

  • since the variable to be predicted has 3 categories, we obtain 3 linear classification functions. The individual evaluation of the variables shows that both are highly significant (p-values close to 0).

Deployment

To classify a new observation with coordinates (Width = 150, Angle = 15), we apply the functions as follows.

  • Con: $6.778171 \times 150 + 17.636347 \times 15 - 621.005831 = 660.265024$
  • Hei: $5.83441 \times 150 + 17.307979 \times 15 - 488.153893 = 646.627292$
  • Hep: $6.332343 \times 150 + 13.442467 \times 15 - 506.831534 = 644.656921$

On the basis of these calculations, we assign the class "Concinna" to this observation.
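The deployment step amounts to evaluating each linear function and keeping the largest score. The sketch below reuses the coefficients listed above; the dictionary layout and the function name are choices made for this illustration, and the key "Con" stands for the class "Concinna".

```python
# Coefficients (width, angle, constant) copied from the three functions above.
functions = {
    "Con": (6.778171, 17.636347, -621.005831),
    "Hei": (5.83441, 17.307979, -488.153893),
    "Hep": (6.332343, 13.442467, -506.831534),
}


def classify(width, angle):
    """Return the class with the highest score, together with all the scores."""
    scores = {
        name: a * width + b * angle + c
        for name, (a, b, c) in functions.items()
    }
    return max(scores, key=scores.get), scores


label, scores = classify(150, 15)
```

Running it on (Width = 150, Angle = 15) reproduces the three scores of the worked example and selects "Con".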

Software

  • SPAD
  • SPSS and SPSS Clementine
  • SAS Stat
  • XlStat
  • TANAGRA, free software for teaching and research
