Automatic classification is the algorithmic categorization of objects. It consists of assigning each object or individual to a class or category on the basis of statistical data.
Principles
Our limited means of understanding oblige us, if we are to grasp anything of reality, to sort the objects we deal with into categories. Philosophy has regarded these categories in two successive ways:
- at first, as existing prior to observation. This is the so-called Platonic view, which holds that the categories exist before human understanding, which merely discovers them more or less imperfectly. This view persisted, broadly speaking, until the end of the Middle Ages, when it was, curiously, known as Realism.
- later, as ad hoc groupings made purely for convenience of use: there would be no "poisonous mushroom" or "edible mushroom" as such, but the observed effects of mushrooms would have led us to classify them functionally as edible or poisonous. This view, opposed to medieval Realism, was called Nominalism. Bertrand Russell remarks in his works that if the two positions were named today, the names would be swapped.
Automatic classification aims to create these categories through procedures that use only the data and not the subjectivity of the experimenter. It would be more accurate to say that the experimenter's subjectivity enters only through the choice of representation: if one classifies objects by their largest dimension, one will generally not obtain the same classification as by classifying them by weight.
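The dependence on the chosen representation can be illustrated with a small sketch. The objects, their measurements, and the rule used here (cutting a sorted list at its largest gap) are all assumptions made for the example, not a method described in this article.

```python
# Hypothetical objects: (name, largest dimension in cm, weight in kg).
objects = [
    ("balloon", 30.0, 0.01),
    ("pillow", 60.0, 0.8),
    ("brick", 20.0, 2.5),
    ("anvil", 25.0, 40.0),
]


def split_at_largest_gap(items, key):
    """Split items into two classes at the largest gap between sorted key values."""
    ordered = sorted(items, key=key)
    gaps = [key(b) - key(a) for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    low = {item[0] for item in ordered[:cut]}
    high = {item[0] for item in ordered[cut:]}
    return low, high


# Same objects, same rule, two different representations.
by_size = split_at_largest_gap(objects, key=lambda o: o[1])
by_weight = split_at_largest_gap(objects, key=lambda o: o[2])
```

With these made-up values, classifying by size isolates the pillow, whereas classifying by weight isolates the anvil: the procedure is the same, and only the representation chosen by the experimenter differs.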
Although the foundations of the algorithmic approach to automatic classification are relatively old, it is only with the development of computing that it became possible to apply them to large data samples. The result of a classification can be either a partition or a hierarchy, both in the mathematical sense.
Methods
Among the various methods, two broad types of approach can be distinguished.
Nonparametric
Nonparametric approaches (hierarchical classification, the moving-centers method, commonly known as k-means) rely on a single assumption: the closer two individuals are, the more likely they are to belong to the same class.
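The proximity assumption can be made concrete with a minimal sketch of the moving-centers iteration. The function name `moving_centers`, the Euclidean distance, the sample points, and the choice of initial centers are assumptions made for illustration; the result depends in general on that initial choice.

```python
import math


def moving_centers(points, centers, max_iter=100):
    """Alternate assignment and center update until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the class of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its class.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)  # an empty class keeps its center
        if new_centers == centers:
            break  # centers no longer move: the partition is stable
        centers = new_centers
    return labels, centers


# Hypothetical 2-D individuals forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = moving_centers(points, centers=[points[0], points[3]])
```

The output is a partition (one label per individual). A hierarchical method would instead merge the closest individuals or groups step by step and return the whole sequence of nested groupings.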
Probabilistic
The second major family of automatic classification methods, known as probabilistic, relies on an assumption about the distribution of the individuals to be classified. For example, one can assume that the individuals of each class follow a normal distribution. The difficulty is then to determine both the parameters of the distributions (mean, variance) and the class to which each individual most probably belongs. This is the role of the expectation-maximization algorithm.
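As a rough illustration of this alternation between estimating class memberships and estimating parameters, here is a minimal sketch for one-dimensional data with normally distributed classes. The function names, the sample data, the starting parameters, the fixed number of iterations, and the small variance floor are all assumptions made for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gaussian_mixture(data, means, variances, weights, n_iter=50):
    """Fit a 1-D mixture of normal distributions by alternating E and M steps."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E step: probability that each individual belongs to each class,
        # given the current parameters.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M step: re-estimate proportion, mean and variance of each class,
        # weighting every individual by its membership probability.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate zero variance
    return means, variances, weights, resp


# Hypothetical measurements drawn from two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.7, 5.1]
means, variances, weights, resp = em_gaussian_mixture(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
# Each individual is assigned to its most probable class.
labels = [max(range(len(means)), key=lambda j: r[j]) for r in resp]
```

Unlike the distance-based approach, each individual here receives a probability of belonging to every class; a hard classification is obtained only at the end, by keeping the most probable class.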