Linear discriminant analysis belongs to the family of predictive discriminant analysis techniques. Its aim is to explain and predict the membership of an individual in a predefined class (group) from its characteristics, measured by predictive variables.
In the example from the article Discriminant analysis, the Flea Beetles file, the objective is to determine which species a flea beetle belongs to from the width and the angle of its aedeagus.
The variable to be predicted is necessarily categorical (discrete); in our example it has 3 categories. The predictive variables are a priori all continuous. Discrete predictive variables can nevertheless be handled with suitable data preparation.
Linear discriminant analysis can be compared with the supervised methods developed in machine learning and with logistic regression, developed in statistics.
We have a sample of $n$ observations distributed in $K$ groups of sizes $n_k$.
Let $Y$ denote the variable to be predicted; it takes its values in $\{y_1, \dots, y_K\}$. We have $J$ predictive variables $X = (X_1, \dots, X_J)$.
We denote by $\mu_k$ the centres of gravity of the conditional point clouds and by $W_k$ their variance-covariance matrices.
The objective is to produce an assignment rule that predicts, for a given observation $\omega$, its associated value of $Y$ from the values taken by $X$.
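The Bayesian rule consists in producing an estimate of the posterior probability of assignment
$$P(Y = y_k \mid X) = \frac{p_k \times P(X \mid Y = y_k)}{\sum_{i=1}^{K} p_i \times P(X \mid Y = y_i)}$$
$p_k = P(Y = y_k)$ is the prior probability of membership of class $y_k$. $P(X \mid Y = y_k)$ is the density function of $X$ conditional on class $y_k$.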
The assignment rule for an individual $\omega$ to be classified then becomes
$Y(\omega) = y_{k^*}$ if and only if $y_{k^*} = \arg\max_k P(Y = y_k \mid X(\omega))$
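The Bayesian rule can be sketched in a few lines. This is a minimal illustration, not a reference implementation. It assumes that the priors $p_k$ and the conditional densities $P(X \mid Y = y_k)$ at the observation are already known, and it passes them in as dictionaries keyed by class label. The helper names `posterior` and `assign` are hypothetical.

```python
def posterior(priors, densities):
    """Posterior P(Y = y_k | X) from priors p_k and densities P(X | Y = y_k).

    Both arguments are dicts keyed by class label; the denominator sums
    p_i * P(X | Y = y_i) over all classes, as in Bayes' rule.
    """
    joint = {k: priors[k] * densities[k] for k in priors}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


def assign(priors, densities):
    """Bayesian assignment rule: return the class with the largest posterior."""
    post = posterior(priors, densities)
    return max(post, key=post.get)
```

With made-up densities 0.2 and 0.6 and equal priors, the posteriors are 0.25 and 0.75. A strong enough prior can still reverse the decision.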
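The whole problem of discriminant analysis then comes down to proposing an estimate of the quantity $P(X \mid Y = y_k)$.
There are two main approaches to estimating the distribution $P(X \mid Y = y_k)$. The non-parametric approach makes no assumption about this distribution; it estimates the probabilities locally, in the neighbourhood of the observation to be classified (for example with Parzen kernels or nearest neighbours). The parametric approach makes an assumption about the distribution of the conditional point clouds, most commonly multivariate normality.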
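Under the multinormality assumption, the distribution of the conditional point clouds is written
$$P(X \mid Y = y_k) = \frac{1}{(2\pi)^{J/2} \, |W_k|^{1/2}} \exp\left(-\frac{1}{2} (X - \mu_k)^\top W_k^{-1} (X - \mu_k)\right)$$
where $|W_k|$ is the determinant of the variance-covariance matrix conditional on $y_k$.
Since the objective is to find the maximum of the posterior probability of assignment, we can drop everything that does not depend on $k$. Taking logarithms, we obtain the discriminant score
$$D(y_k, X) = 2 \ln p_k - \ln |W_k| - (X - \mu_k)^\top W_k^{-1} (X - \mu_k)$$
which equals $2 \ln P(Y = y_k \mid X)$ up to an additive constant common to all classes.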
The assignment rule thus becomes
$Y(\omega) = y_{k^*}$ if $y_{k^*} = \arg\max_k D(y_k, X(\omega))$
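A minimal sketch of this quadratic score and assignment rule follows. For brevity it assumes two predictors (as in the Flea Beetles example), so that the determinant and inverse of each $W_k$ can be written in closed form. The parameters $p_k$, $\mu_k$ and $W_k$ are taken as given; in practice they are estimated from the sample. The function names are hypothetical.

```python
import math


def qda_score(x, prior, mean, cov):
    """Quadratic discriminant score 2 ln p_k - ln|W_k| - (x - mu_k)' W_k^{-1} (x - mu_k).

    Two predictors only: x and mean are (x1, x2) pairs and cov is a 2x2
    matrix ((a, b), (c, d)), assumed symmetric positive definite.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Quadratic form dx' W_k^{-1} dx.
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return 2 * math.log(prior) - math.log(det) - q


def qda_assign(x, params):
    """params maps each class label to (prior, mean, cov); return the argmax class."""
    return max(params, key=lambda k: qda_score(x, *params[k]))
```

Because each class keeps its own $W_k$, a point far from both centres can go to the more dispersed class even when it is equidistant from both means. This is why the decision boundary is quadratic.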
If the discriminant score is fully expanded, it turns out to involve the squares of the predictive variables and their cross products. We then speak of quadratic discriminant analysis. It is widely used by researchers because it performs very well compared with other methods, but it is less common among practitioners. Because the expression of the discriminant score is rather complex, it is hard to see clearly how each predictive variable influences class membership. In particular, it is difficult to identify the variables that really determine the classification, so interpreting the results is rather risky.
A second assumption simplifies the calculations further: homoscedasticity, that is, the variance-covariance matrices are identical from one group to another ($W_k = W$ for all $k$). Geometrically, this means that the point clouds have the same shape (and volume) in the representation space.
In this case, the estimated variance-covariance matrix is the within-class (pooled) variance-covariance matrix, computed with the following expression
$$W = \frac{1}{n - K} \sum_{k=1}^{K} n_k W_k$$
where each $W_k$ is computed with divisor $n_k$.
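The pooled matrix can be computed directly from the grouped observations. The sketch below uses the equivalent form $W = \frac{1}{n-K}\sum_k \sum_{i \in k} (x_i - \mu_k)(x_i - \mu_k)^\top$. It assumes the data arrive as a dictionary that maps each class label to a list of observations (tuples of $J$ values). The function name is hypothetical.

```python
def pooled_within_cov(groups):
    """Pooled within-class covariance W = (1/(n-K)) * sum_k sum_{i in k} (x_i - mu_k)(x_i - mu_k)'.

    groups maps each class label to a list of observations (tuples of J values).
    Equivalent to (1/(n-K)) * sum_k n_k W_k when W_k uses divisor n_k.
    """
    J = len(next(iter(groups.values()))[0])
    n = sum(len(rows) for rows in groups.values())
    K = len(groups)
    W = [[0.0] * J for _ in range(J)]
    for rows in groups.values():
        # Centre of gravity mu_k of the current group.
        mu = [sum(r[j] for r in rows) / len(rows) for j in range(J)]
        for r in rows:
            d = [r[j] - mu[j] for j in range(J)]
            # Accumulate the outer product of the deviation vector.
            for i in range(J):
                for j in range(J):
                    W[i][j] += d[i] * d[j]
    return [[w / (n - K) for w in row] for row in W]
```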
Again, we can remove from the discriminant score everything that no longer depends on $k$; it becomes
$$D(y_k, X) = 2 \ln p_k - (X - \mu_k)^\top W^{-1} (X - \mu_k)$$
Expanding this score and dropping the term $X^\top W^{-1} X$, which is common to all classes, we find that it is linear in the predictive variables.
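The expansion can be written out explicitly. Here $D(y_k, X) = 2\ln p_k - (X-\mu_k)^\top W^{-1}(X-\mu_k)$ is the score under homoscedasticity. The second line uses the symmetry of $W^{-1}$ and halves the score, which does not change the arg max.

```latex
\begin{aligned}
-(X-\mu_k)^\top W^{-1}(X-\mu_k) &= -X^\top W^{-1}X + 2\,\mu_k^\top W^{-1}X - \mu_k^\top W^{-1}\mu_k \\
\tfrac{1}{2}\,D(y_k, X) + \tfrac{1}{2}\,X^\top W^{-1}X &= \ln p_k - \tfrac{1}{2}\,\mu_k^\top W^{-1}\mu_k + (W^{-1}\mu_k)^\top X
\end{aligned}
```

Since $\tfrac{1}{2}X^\top W^{-1}X$ is the same for every class, only the right-hand side is needed to compare classes. This gives the intercept $a_{0,k} = \ln p_k - \tfrac{1}{2}\mu_k^\top W^{-1}\mu_k$ and the coefficient vector $(a_{1,k}, \dots, a_{J,k})^\top = W^{-1}\mu_k$.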
We therefore have as many classification functions as the variable to be predicted has categories; they are linear combinations of the following form:
$$d(y_k, X) = a_{0,k} + a_{1,k} X_1 + a_{2,k} X_2 + \dots + a_{J,k} X_J$$
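A minimal Python sketch of these classification functions follows, not a definitive implementation. It assumes two predictors (as in the Flea Beetles example), so that $W^{-1}$ has a closed form. It uses the standard result $(a_{1,k}, a_{2,k})^\top = W^{-1}\mu_k$ and $a_{0,k} = \ln p_k - \tfrac{1}{2}\mu_k^\top W^{-1}\mu_k$. The names `lda_functions` and `lda_assign` are hypothetical.

```python
import math


def lda_functions(priors, means, W):
    """Coefficients (a_0k, a_1k, a_2k) of the linear classification functions.

    Two predictors only: (a_1k, a_2k) = W^{-1} mu_k and
    a_0k = ln p_k - 0.5 * mu_k' W^{-1} mu_k, with W the pooled
    within-class covariance ((a, b), (c, d)), assumed invertible.
    """
    (a, b), (c, d) = W
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    funcs = {}
    for k, mu in means.items():
        # Slope coefficients: W^{-1} mu_k.
        coef = tuple(inv[i][0] * mu[0] + inv[i][1] * mu[1] for i in range(2))
        # mu_k' W^{-1} mu_k equals coef . mu_k because coef = W^{-1} mu_k.
        a0 = math.log(priors[k]) - 0.5 * (coef[0] * mu[0] + coef[1] * mu[1])
        funcs[k] = (a0, coef[0], coef[1])
    return funcs


def lda_assign(funcs, x):
    """Evaluate d(y_k, x) = a_0k + a_1k x_1 + a_2k x_2 and return the best class."""
    scores = {k: f[0] + f[1] * x[0] + f[2] * x[1] for k, f in funcs.items()}
    return max(scores, key=scores.get)
```

With equal priors, $W = I$ and means $(0, 0)$ and $(2, 0)$, the boundary falls at $X_1 = 1$, halfway between the two centres, as expected for a linear separator.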
This presentation is attractive in more than one way. By studying the value and sign of the coefficients, we can determine the direction of the relationships in the classification. Likewise, as we will see later, it becomes possible to assess the significance of each variable in the prediction.
The multinormality and homoscedasticity assumptions may seem too restrictive, limiting the scope of linear discriminant analysis in practice.
The key notion to remember in statistics is robustness. Even if the initial assumptions are not fully respected, a method can still be applied. This is the case for linear discriminant analysis. The most important thing is to view it as a linear separator: if the point clouds are linearly separable in the representation space, it can work correctly.
Compared with other linear techniques such as logistic regression, discriminant analysis shows comparable performance. Its performance can nevertheless suffer when the homoscedasticity assumption is very strongly violated.
Traditionally in supervised learning, the performance of a classification function is evaluated by comparing its predictions with the true values of the variable to be predicted on a data file. The resulting cross table is called a confusion matrix, with the true classes in rows and the predicted classes in columns. The error rate, or misclassification rate, is simply the number of misclassifications (predictions that do not match the true value) divided by the size of the data file.
The appeal of the error rate is that it is easy to interpret: it estimates the probability of making a mistake when the classification function is applied to the population.
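The confusion matrix and error rate described above can be sketched as follows. True classes are in rows and predicted classes in columns, and class labels are sorted only to fix a display order. The function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Cross table with true classes in rows and predicted classes in columns."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    return labels, [[counts[(t, p)] for p in labels] for t in labels]


def error_rate(y_true, y_pred):
    """Number of misclassified observations divided by the sample size."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)
```

The error rate is also the sum of the off-diagonal cells divided by the total count of the table.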
Be careful, however. The error rate measured on the data used to build the classification function, known as the resubstitution error rate, is biased (optimistic), simply because the same data serve as both judge and party in this scheme. The correct procedure is to build the classification function on one fraction of the data, called the training set, and then to evaluate it on another fraction, called the test set. The test error rate measured this way is a trustworthy indicator.
In practice, the data are usually split 2/3 for training and 1/3 for testing, but there is no hard rule. The most important thing is to reconcile two conflicting requirements: keeping enough data in the test set to obtain a stable estimate of the error, while keeping enough in the training set so as not to handicap the learning method.
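A random training/test split can be sketched with the standard library alone. The 1/3 test fraction is the customary default mentioned above. The fixed seed is an assumption added so that the split is reproducible, and the helper name `split_train_test` is hypothetical.

```python
import random


def split_train_test(n, test_fraction=1 / 3, seed=0):
    """Randomly split indices 0..n-1 into (training indices, test indices).

    test_fraction defaults to the customary 1/3; seed makes the split reproducible.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])
```

The classification function is then fitted on the training indices only, and its error rate is measured on the test indices.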
When the sample is small and a training/test split is not possible, resampling methods such as cross-validation or the bootstrap can be used to estimate the classification error.
The error rate makes it possible to evaluate and compare methods, whatever their underlying assumptions. In the case of linear discriminant analysis, we can also exploit the probabilistic model to carry out hypothesis tests.
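Cross-validation can be sketched as follows. The data are shuffled, split into `n_folds` parts, and each part is predicted by a model trained on the others. The `fit`/`predict` interface is a hypothetical convention for this sketch: any classifier, including the discriminant rules above, can be wrapped to match it.

```python
import random


def cross_validation_error(X, y, fit, predict, n_folds=10, seed=0):
    """Estimate the error rate by n_folds-fold cross-validation.

    fit(X_train, y_train) must return a model and predict(model, x) a label;
    both are supplied by the caller (hypothetical interface).
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(model, X[i]) != y[i] for i in fold)
    # Every observation is predicted exactly once, by a model that never saw it.
    return wrong / len(y)
```

Setting `n_folds` equal to the sample size gives leave-one-out cross-validation.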
A first test answers the following question: is it possible to distinguish the point clouds in the representation space? In the multinormal framework, this amounts to checking whether the conditional centres of gravity coincide (null hypothesis) or whether at least one of them differs significantly from the others (alternative hypothesis).
The test statistic is Wilks' lambda, whose expression is
$$\Lambda = \frac{|W|}{|V|}$$
where $|W|$ is the determinant of the within-class variance-covariance matrix and $|V|$ the determinant of the total variance-covariance matrix, both computed with the same divisor (for example as sums of squares and cross-products).
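Wilks' lambda can be computed from sums of squares and cross-products, which sidesteps the divisor question. The sketch assumes two predictors so that the determinants have a closed form. The data format (a dictionary of groups) and all function names are assumptions of this example.

```python
def _mean(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def _scatter(rows, centre):
    """Sum of squares and cross-products of the deviations from centre (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - centre[0], r[1] - centre[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def _det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(V) for two predictors.

    W: within-class scatter (deviations from each group's centre);
    V: total scatter (deviations from the overall centre).
    """
    all_rows = [r for rows in groups.values() for r in rows]
    W = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups.values():
        s = _scatter(rows, _mean(rows))
        W = [[W[i][j] + s[i][j] for j in range(2)] for i in range(2)]
    V = _scatter(all_rows, _mean(all_rows))
    return _det(W) / _det(V)
```

$\Lambda$ lies between 0 and 1. It equals 1 when all centres of gravity coincide, and values close to 0 indicate well-separated groups.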
Since tables of critical values of Wilks' distribution are rarely available in software, the Bartlett and Rao transformations are usually used instead; they follow a chi-squared and a Fisher distribution respectively.
From a different angle, this test can be seen as a multidimensional generalization of one-way analysis of variance (ANOVA); in this case we speak of MANOVA (Multivariate Analysis of Variance).
As in all linear methods, each predictive variable can be evaluated individually, and those that are not significant for discrimination can be removed.
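The sketch below computes the two transformations from $\Lambda$, using the textbook forms of Bartlett's chi-squared approximation and Rao's F approximation. The text above does not spell out these formulas, so they are stated here as an assumption. Only the statistics and degrees of freedom are returned. P-values would need the chi-squared and Fisher survival functions (for example `scipy.stats.chi2.sf` and `scipy.stats.f.sf`), which are deliberately left out to keep the example dependency-free. The function name is hypothetical.

```python
import math


def bartlett_rao(lam, n, J, K):
    """Bartlett chi-squared and Rao F approximations of Wilks' lambda.

    lam: Wilks' lambda; n: sample size; J: number of predictors; K: number of groups.
    Returns ((chi2, df), (F, df1, df2)).
    """
    q = K - 1                       # hypothesis degrees of freedom
    w = n - 1 - (J + K) / 2         # Bartlett's multiplier
    chi2 = -w * math.log(lam)       # approx. chi-squared with J * (K - 1) df
    denom = J**2 + q**2 - 5
    t = math.sqrt((J**2 * q**2 - 4) / denom) if denom > 0 else 1.0
    df1 = J * q
    df2 = w * t - df1 / 2 + 1
    root = lam ** (1 / t)
    F = (1 - root) / root * df2 / df1
    return (chi2, df1), (F, df1, df2)
```

When $K = 2$, or when $J = 1$ or $J = 2$, Rao's F coincides with the exact distribution of $\Lambda$. This gives a convenient sanity check.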
The test statistic is based on the change in Wilks' lambda when a $(J+1)$-th variable is added to the prediction model. Its formula is
$$F = \frac{n - K - J}{K - 1} \left( \frac{\Lambda_J}{\Lambda_{J+1}} - 1 \right)$$
It follows a Fisher distribution with $(K - 1, n - K - J)$ degrees of freedom.
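The partial F statistic is direct arithmetic on the two lambdas. In this minimal sketch, `lam_j` and `lam_j1` are Wilks' lambda without and with the candidate variable (hypothetical parameter names).

```python
def partial_f(lam_j, lam_j1, n, J, K):
    """F statistic for adding a (J+1)-th variable to a model with J predictors.

    lam_j / lam_j1: Wilks' lambda without / with the candidate variable.
    Returns (F, df1, df2) with df1 = K - 1 and df2 = n - K - J.
    """
    df1, df2 = K - 1, n - K - J
    F = df2 / df1 * (lam_j / lam_j1 - 1)
    return F, df1, df2
```

With $J = 0$ (so $\Lambda_0 = 1$), the formula reduces to the one-way ANOVA F test of the single candidate variable.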
A linear discriminant analysis was run on the Flea Beetles data described in the article Discriminant analysis. The results are as follows.
The centres of gravity of the three point clouds differ significantly, as shown by Wilks' statistic in the MANOVA section. The associated p-values (Bartlett and Rao transformations) are close to 0. This numerical result confirms the visual impression given by the projection of the point clouds in the representation space (see Discriminant analysis).
The variable to be predicted has 3 categories, so we obtain 3 linear classification functions. The individual evaluation of the variables indicates that both are highly significant for discrimination (p-values close to 0).
To classify a new observation with coordinates (Width = 150, Angle = 15), we apply the classification functions as follows.
Con (Concinna):
On the basis of these calculations, we assign this observation to the class "Concinna".