Artificial neural network

An artificial neural network is a computational model whose design is very loosely inspired by the functioning of real neurons (human or otherwise). Neural networks are generally optimized by statistical learning methods. They therefore belong, on the one hand, to the family of statistical applications, which they enrich with a set of paradigms for generating vast, flexible and partially structured function spaces. On the other hand, they belong to the family of artificial intelligence methods, which they enrich by making it possible to take decisions based more on perception than on formal logical reasoning.

History

Neural networks are built on a biological paradigm, that of the formal neuron (just as genetic algorithms are built on natural selection). Biological metaphors of this kind became common with the ideas of cybernetics.

The neurophysiologist Warren McCulloch and the logician Walter Pitts carried out the first work on neural networks, beginning with their founding 1943 article, A Logical Calculus of the Ideas Immanent in Nervous Activity. They built a simplified model of the biological neuron, commonly called the formal neuron, and showed theoretically that networks of simple formal neurons can carry out complex logical, arithmetic and symbolic functions.

Like their living model, networks of formal neurons are meant to solve problems. Unlike conventional computing approaches, one does not build a program step by step from an understanding of the problem. The most important parameters of this model are the synaptic coefficients (weights): they are what build the solution model from the information given to the network. A mechanism is therefore needed to compute them from the quantities that can be measured on the problem. This is the basic principle of learning: in a network of formal neurons, learning means first of all computing the values of the synaptic coefficients from the available examples.

The work of McCulloch and Pitts gave no indication of a method for adapting the synaptic coefficients. This question, central to thinking about learning, received a first answer from the Canadian psychologist Donald Hebb, who described his work on learning in his 1949 book The Organization of Behavior. Hebb proposed a simple rule for modifying the value of the synaptic coefficients according to the activity of the units they connect. This rule, now known as Hebb's rule, is present almost everywhere in current models, even the most sophisticated ones.
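Hebb's rule is stated above only in words. A common textbook formalization, assumed here because the text gives no formula, increases each weight in proportion to the product of the activities of the two units it connects, $\Delta w_{ij} = \eta\, y_i x_j$. The following is a minimal Python sketch; `hebb_update` and its `rate` parameter are hypothetical names.

```python
def hebb_update(weights, pre, post, rate=0.1):
    """Hebbian update (one common formulation): w[i][j] += rate * post[i] * pre[j].

    weights: list of rows, one row per post-synaptic unit i
    pre:     activities x_j of the pre-synaptic units
    post:    activities y_i of the post-synaptic units
    """
    return [
        [w + rate * y * x for w, x in zip(row, pre)]
        for row, y in zip(weights, post)
    ]

# Only connections between units that are active together get stronger.
new_weights = hebb_update([[0.0, 0.0], [0.0, 0.0]], pre=[1.0, 0.0], post=[1.0, 1.0], rate=0.5)
```

Here only the weights fed by the active input (index 0) grow; the connections from the silent input stay at zero.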

This idea spread gradually over time and took shape in the work of Frank Rosenblatt in 1957 with the perceptron model. It was the first artificial system able to learn from experience, even when its instructor makes some errors (which clearly distinguishes it from a formal logical learning system). Other work also marked the field, such as Donald Hebb's in 1949.

In 1982, John Joseph Hopfield, a recognized physicist, breathed new life into neural network research with an article introducing a new, fully recurrent neural network model. The article was successful for several reasons, the main one being that it brought the rigor of physicists to the analysis of neural networks. Neural networks became an acceptable research subject again, although the Hopfield model suffered from the main limitations of the 1960s models, in particular the inability to handle non-linear problems.

At the same time, algorithmic approaches to artificial intelligence were becoming a source of disillusionment, as their applications did not meet expectations. This disillusionment motivated a reorientation of artificial intelligence research toward neural networks (although these networks concern artificial perception more than artificial intelligence strictly speaking). Research was revived, and industry regained some interest in neural networks (notably for applications such as cruise missile guidance). Around 1984, backpropagation of the error gradient became the most discussed topic in the field.

A revolution then took place in the field of artificial neural networks: a new generation of networks able to handle non-linear phenomena successfully. The multilayer perceptron does not have the shortcomings highlighted by Marvin Minsky. First proposed by Werbos, the multilayer perceptron appeared in 1986, introduced by Rumelhart and, simultaneously and under a similar name, by Yann Le Cun. These systems rely on backpropagation of the error gradient through several layers, each of the Adaline type of Bernard Widrow, close to Rumelhart's perceptron.

Neural networks subsequently made considerable progress and were among the first systems to benefit from the theory of statistical regularization introduced by Vladimir Vapnik in the Soviet Union and popularized in the West after the fall of the Berlin Wall. This theory, one of the most important in statistics, makes it possible to anticipate, study and control the phenomena related to overfitting. A learning system can thus be controlled so that it strikes the best balance between an overly poor model (for example, the mean) and an overly rich one, which would be optimized illusorily on too few examples and would fail on examples not yet learned, even ones close to the learned examples. Overfitting is a difficulty faced by every system that learns from examples, whether it uses direct optimization methods (e.g. linear regression), iterative ones (e.g. gradient descent) or semi-direct iterative ones (conjugate gradient, expectation-maximization…), and whether it is applied to classical statistical models, hidden Markov models or networks of formal neurons.

Utility

As systems capable of learning, neural networks implement the principle of induction, i.e. learning from experience. By confronting specific situations, they infer an integrated decision system whose generality depends on the number of training cases encountered and on their complexity relative to the complexity of the problem to be solved. By contrast, symbolic systems capable of learning, even when they also implement induction, do so on the basis of algorithmic logic, by making a set of deductive rules progressively more complex (e.g. Prolog).

Thanks to their ability to classify and generalize, neural networks are generally used for problems of a statistical nature, such as the automatic classification of postal codes or deciding whether to buy a stock based on price movements. As another example, a bank can build a data file on customers who have taken out a loan, containing their income, age, number of dependent children, and so on, together with whether they turned out to be good customers. If this data file is large enough, it can be used to train a neural network. The bank can then enter the characteristics of a potential new customer, and the network will answer whether that person will be a good customer or not, generalizing from the cases it knows.

If the neural network works with real numbers, the answer expresses a degree of certainty (for example: 1 for “sure he will be a good customer”, -1 for “sure he will be a bad customer”, 0 for “no idea”, 0.9 for “almost sure he will be a good customer”).

Note that a neural network does not always provide a rule that a human can use. The network often remains a black box that gives an answer when data is presented to it, but provides no easily interpretable justification.

Neural networks are used in practice, for example:

  • for classification; for example, classifying animals by species given a DNA analysis.
  • pattern recognition; for example optical character recognition (OCR), used in particular by banks to check the amounts on cheques and by La Poste to sort mail by postal code; or the automated movement of autonomous mobile robots.
  • approximation of an unknown function.
  • fast modeling of a known function that is very complex to compute exactly; for example, certain inversion functions used to decode remote-sensing signals emitted by satellites and turn them into sea-surface data.
  • stock market estimates:
    • learning the value of a company from the available indicators: profits, long- and short-term debt, turnover, order book, technical indicators of the economic situation. This type of application generally poses no problem.
    • attempts to predict periodicity in stock prices. This type of prediction is highly disputed for two reasons. First, it is not obvious that a share price has a convincingly periodic character (the market largely anticipates foreseeable rises and falls alike, which shifts the period of any possible periodicity and makes it unreliable). Second, a company's foreseeable future determines its share price at least as strongly as its past, if not more; the cases of Side Am, Manufrance or IBM are convincing illustrations.
  • modeling of learning and improvement of teaching techniques.

Limits

  • Artificial neural networks need real cases to serve as examples for their training (this is called the training set). These cases must be all the more numerous as the problem is complex and its topology unstructured. For example, a neural character-reading system can be optimized by manually segmenting a large number of words handwritten by many people. Each character can then be presented as a raw image, with a two-dimensional spatial topology, or as a sequence of almost entirely connected segments. The chosen topology, the complexity of the modeled phenomenon and the number of examples must be in proportion. In practice this is not always easy, because examples may be available only in strictly limited quantity or may be too expensive to collect in sufficient numbers.

  • Some problems are handled well by neural networks, in particular classification into convex regions (i.e. regions such that if points A and B belong to the region, the whole segment AB also belongs to it).
  • Problems such as "is the number of inputs equal to 1 (or 0) even or odd?" are, on the other hand, solved very badly: to decide such things over $2^N$ points with a naive but homogeneous approach, one needs exactly $N-1$ intermediate layers of neurons, which harms the generality of the process.

Model

Structure of the network

A neural network is generally composed of a succession of layers, each of which takes its inputs from the outputs of the previous one. Each layer $i$ consists of $N_i$ neurons, which take their inputs from the $N_{i-1}$ neurons of the previous layer. Each synapse has an associated synaptic weight: the $N_{i-1}$ inputs are multiplied by these weights and then summed by the neurons of layer $i$, which is equivalent to multiplying the input vector by a transformation matrix. Stacking the layers of a neural network would thus amount to cascading several transformation matrices, which could be reduced to a single matrix (their product), were it not for the output function of each layer, which introduces a non-linearity at each stage. This shows the importance of choosing a good output function: a neural network whose outputs were linear would be of no interest.
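The collapse argument in the paragraph above can be checked numerically. The following Python sketch uses plain lists and arbitrary, hypothetical weights `W1` and `W2` (not taken from the text); `tanh` stands in for the output function, which the text leaves unspecified.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

# Hypothetical weights: layer 1 maps 2 inputs to 3 neurons, layer 2 maps 3 to 1.
W1 = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]]
W2 = [[1.0, 1.0, -1.0]]
x = [0.3, -0.7]

# Without an output function, two layers equal the single matrix W2 @ W1.
two_linear_layers = matvec(W2, matvec(W1, x))
single_matrix = matvec(matmul(W2, W1), x)

# With a non-linear output function between the layers, the result
# no longer matches the product matrix.
nonlinear = matvec(W2, [math.tanh(h) for h in matvec(W1, x)])
```

Both linear computations give 2.5, while the version with `tanh` gives about 1.5.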

Beyond this simple structure, a neural network can also contain loops, which radically change its possibilities but also its complexity. Just as loops turn combinational logic into sequential logic, loops in a neural network turn a simple input-recognition device into a complex machine capable of all kinds of behavior.

Comparison with the human brain

A human brain contains billions of neurons (several sources quote a figure of 100 billion). Although neurons work in pulse mode (they produce more or less dense trains of fixed-energy pulses), they can be roughly compared to summators, each neuron receiving inputs from tens or even hundreds of thousands of other neurons. The whole human brain is generally estimated to contain about a million billion synapses ($10^{15}$; some estimates reach $2 \times 10^{15}$), which gives an average of about 10,000 synapses per neuron. Each neuron must "recharge its batteries" after emitting an action potential, which makes it inactive for about 10 ms and sets a maximum operating rate of 100 Hz. This gives a maximum rate of about $10^{17}$ operations per second. However, since above the 10 ms lower limit the intervals between pulses are analog values, and since information is carried in trains of several pulses, it is almost impossible to compare the basic operations of the brain with those of a computer. Moreover, since the quoted figures probably vary greatly from one person to another, this value carries a large margin of uncertainty. The power of a human brain would thus be, very roughly, $2 \times 10^{15}$ to $2 \times 10^{19}$ logical operations per second.

A 2004-era processor such as the Pentium 4, AMD64 or PowerPC 970 runs at a clock frequency of about 3 GHz on 32-bit (Pentium) or 64-bit (AMD64, PowerPC) words, which, to give an order of magnitude, corresponds to a capacity of about $2 \times 10^{11}$ logical operations per second in the PowerPC case.
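The orders of magnitude in the two paragraphs above can be reproduced with a few lines of arithmetic. This sketch only multiplies the figures quoted in the text (rough estimates, not measurements) and, following the text's own loose convention, counts one synaptic event, or one bit of a 64-bit word per clock cycle, as one "operation".

```python
# Figures quoted in the text (orders of magnitude only).
neurons = 1e11            # ~100 billion neurons
synapses = 1e15           # ~a million billion synapses
refractory_s = 10e-3      # ~10 ms of inactivity after an action potential

synapses_per_neuron = synapses / neurons      # ~10,000
max_rate_hz = 1 / refractory_s                # ~100 Hz
brain_ops_per_s = synapses * max_rate_hz      # ~1e17 operations per second

# 2004-era processor: ~3 GHz clock on 64-bit words (PowerPC 970 case).
cpu_ops_per_s = 3e9 * 64                      # ~2e11 bit operations per second

# Rough ratio between the two estimates (several hundred thousand).
ratio = brain_ops_per_s / cpu_ops_per_s
```

With these inputs the ratio is about 5 × 10^5, which is the "differential of power" the next paragraph refers to.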

Despite this difference in power, it is tempting to simulate the operation of neurons to solve some simple problems. One reason to be content with such simple problems is that properly educating a brain of $10^{11}$ neurons takes no less than about 25 years, which is hard to afford in a laboratory.

A neural network (sometimes also called a neuromimetic network) consists of a very large number of small, identical processing units called artificial neurons. In the first implementations (Rosenblatt's perceptrons) they were electronic; today they are generally simulated on computers for reasons of cost and convenience.

Neurobiologists know that each natural neuron is sometimes connected to several thousand others and that it transmits information to them by sending waves of depolarization (roughly speaking, electrical spikes). More precisely, a neuron receives input signals from other neurons through synapses and emits output information through its axon. In a roughly similar way, artificial neurons are linked by weighted, one-way connections; a neural network can thus be represented as a directed graph whose nodes are the artificial neurons.

The size and speed of these networks let them handle questions of perception or automatic (approximate) classification very well: for example, applications containing networks of formal neurons are often what lead your bank to grant you a loan in less than ten minutes.

Much more cannot be expected from the current generation of machines, but the applications are already very useful in filtering and pattern recognition. In short, this is more a matter of assisted perception than of artificial intelligence strictly speaking. Note that in our own organisms, too, the processing of the visual signal by the retina and its exploitation by the brain are carried out by separate organs and processes.

Combination function

Consider an arbitrary neuron. It receives a number of values from upstream neurons through its synaptic connections and produces a value using a combination function. This function can be formalized as a vector-to-scalar function, in particular:

  • MLP-type networks (Multi-Layer Perceptron) compute a linear combination of the inputs, i.e. the combination function returns the scalar product of the input vector and the synaptic weight vector.
  • RBF-type networks (Radial Basis Function) compute a distance from the inputs, i.e. the combination function returns the Euclidean norm of the difference between the input vector and the neuron's weight (center) vector.
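A minimal sketch of the two combination functions described above, in plain Python. It assumes, as is usual for RBF networks, that the neuron's weight vector plays the role of the center from which the distance is measured; the function names are illustrative.

```python
import math

def mlp_combination(inputs, weights):
    """MLP-style combination: scalar product of the inputs and the synaptic weights."""
    return sum(x * w for x, w in zip(inputs, weights))

def rbf_combination(inputs, center):
    """RBF-style combination: Euclidean distance between the inputs and the neuron's center."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(inputs, center)))
```

For example, `mlp_combination([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])` returns 4.5 and `rbf_combination([3.0, 4.0], [0.0, 0.0])` returns 5.0. Either result is then passed to the neuron's activation function.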

Activation function

The activation function (also called thresholding function or transfer function) introduces a non-linearity into the neuron's operation.

Thresholding functions generally have three intervals:

  1. below the threshold, the neuron is inactive (its output is then often 0 or -1);
  2. around the threshold, there is a transition phase;
  3. above the threshold, the neuron is active (its output is then often 1).

Classic examples of activation functions are:

  1. the sigmoid function;
  2. the hyperbolic tangent function;
  3. the Heaviside function.
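The three classic activation functions listed above can be written as follows. This is a sketch: the step convention at exactly the threshold (`z >= threshold` gives 1) and the function names are choices made here, not fixed by the text.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, values in (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh_activation(z):
    """Hyperbolic tangent, values in (-1, 1)."""
    return math.tanh(z)

def heaviside(z, threshold=0.0):
    """Step function: 1 at or above the threshold, 0 below it."""
    return 1.0 if z >= threshold else 0.0
```

Well below a threshold of 0 the sigmoid is close to 0 (inactive) and well above it the sigmoid is close to 1 (active). Between the two it passes smoothly through 0.5, which is the transition phase described above. The Heaviside function has no transition phase at all.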

Bayesian logic, whose learning questions are formalized by the Cox-Jaynes theorem, also makes recurring use of an S-shaped function: $\mathrm{ev}(p) = 10 \log_{10}\left(\frac{p}{1-p}\right)$
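The evidence function above and its inverse can be sketched as follows, assuming Jaynes's decibel convention (base-10 logarithm). Solving for $p$ gives $p = 1 / (1 + 10^{-\mathrm{ev}/10})$, which is a logistic, S-shaped curve in $\mathrm{ev}$. The function names are hypothetical.

```python
import math

def evidence_db(p):
    """Evidence in decibels: 10 * log10(p / (1 - p)), defined for 0 < p < 1."""
    return 10.0 * math.log10(p / (1.0 - p))

def probability_from_evidence(ev):
    """Inverse mapping: a logistic (S-shaped) curve in ev."""
    return 1.0 / (1.0 + 10.0 ** (-ev / 10.0))
```

A probability of 0.5 corresponds to 0 dB of evidence, and each additional 10 dB multiplies the odds $p/(1-p)$ by ten.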

Propagation of information

Once this computation is done, the neuron propagates its new internal state along its axon. In a simple model, the neuron function is just a thresholding function: it equals 1 if the weighted sum exceeds a certain threshold, and 0 otherwise. In a richer model, the neuron works with real numbers (often in the interval [0, 1] or [-1, 1]). The neural network is said to pass from one state to another when all its neurons recompute their internal state in parallel from their inputs.
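The synchronous update described above, in the simple binary-threshold model, can be sketched in Python. The weights and thresholds in the example are arbitrary values chosen for illustration. The key point is that every neuron reads the previous state, so all neurons change state together.

```python
def step(state, weights, thresholds):
    """One synchronous update: each neuron outputs 1 if the weighted sum of the
    previous state exceeds its threshold, and 0 otherwise."""
    return [
        1 if sum(w * s for w, s in zip(row, state)) > theta else 0
        for row, theta in zip(weights, thresholds)
    ]

# Hypothetical 3-neuron network in which every neuron excites the two others.
W = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
T = [0.5, 0.5, 0.5]

s1 = step([1, 0, 0], W, T)
s2 = step(s1, W, T)
s3 = step(s2, W, T)
```

Starting from `[1, 0, 0]`, the network moves to `[0, 1, 1]` and then to `[1, 1, 1]`. That last state is stable, because applying `step` again leaves it unchanged.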

Training

Theoretical basis

The concept of learning, although known since Sumerian times, cannot be modeled within the framework of deductive logic: deductive logic starts from already established knowledge and draws derived knowledge from it. Learning is the opposite process: drawing plausible generalizations from limited observations.

The concept of learning covers two realities that are often treated one after the other:

  • memorization: assimilating possibly numerous examples in a compact form;
  • generalization: being able, thanks to the learned examples, to handle distinct examples that have not yet been encountered but are similar.
These two goals are partly in opposition: if one is favored, the resulting system will not necessarily handle the other very effectively.

For statistical learning systems, used to optimize classical statistical models, neural networks and Markov models, generalization receives all the attention.

This concept of generalization is addressed, more or less completely, by several theoretical approaches.

  • Generalization is treated in a global and generic way by the theory of statistical regularization introduced by Vladimir Vapnik. Originally developed in the Soviet Union, this theory spread to the West after the fall of the Berlin Wall. It became very widely known among researchers studying neural networks because of the generic shape of the residual training and generalization error curves produced by iterative training procedures, such as the gradient descents used to optimize multilayer perceptrons. These generic shapes match those predicted by statistical regularization theory. This is because gradient-descent training, starting from an initial configuration of the synaptic weights, gradually explores the space of possible synaptic weights; one then meets the problem of the progressive increase in learning capacity, a fundamental concept at the heart of statistical regularization theory.
  • Generalization is also at the heart of the Bayesian inference approach, which has been taught for longer. The Cox-Jaynes theorem provides an important foundation for such learning by showing that any learning method is either isomorphic to probability theory equipped with Bayes' rule, or incoherent. This is an extremely strong result, which is why Bayesian methods are widely used in the field.

Classes of solvable problems

Depending on the structure of the network, neural networks can represent various types of functions:

Functions representable by a perceptron:

A perceptron (a single-unit network) can represent the following Boolean functions: AND, OR, NAND and NOR, but not XOR. Since every Boolean function can be expressed using AND, OR, NAND and NOR, a network of perceptrons can represent every Boolean function.
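The claim above can be checked with hand-chosen weights. In this sketch each unit outputs 1 when $w \cdot x + b > 0$; the particular weights are one choice among many. XOR, which no single unit can represent, is obtained with two layers as AND(OR, NAND).

```python
def threshold_unit(weights, bias):
    """Return a perceptron computing 1 if w.x + bias > 0, else 0."""
    def unit(*x):
        return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
    return unit

# Hand-chosen weights (illustrative; many other choices work).
AND = threshold_unit([1, 1], -1.5)
OR = threshold_unit([1, 1], -0.5)
NAND = threshold_unit([-1, -1], 1.5)
NOR = threshold_unit([-1, -1], 0.5)

def XOR(a, b):
    """XOR is not linearly separable, so it needs a second layer: AND(OR, NAND)."""
    return AND(OR(a, b), NAND(a, b))
```

Each single unit draws one straight line in the input plane. XOR needs two lines, which is why it requires the hidden layer formed by OR and NAND.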

Functions representable by acyclic multilayer neural networks:

  • Boolean functions: every Boolean function can be represented by a two-layer network. In the worst case, the number of neurons in the hidden layer grows exponentially with the number of inputs.

  • Continuous functions: every bounded continuous function can be approximated to arbitrary precision by a two-layer network (Cybenko, 1989). The theorem applies to networks that use sigmoid neurons in the hidden layer and linear neurons (without threshold) in the output layer. The number of neurons in the hidden layer depends on the function to be approximated.

  • Arbitrary functions: any function can be approximated to arbitrary precision by a three-layer network (Cybenko, 1988).
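Cybenko's result for continuous functions is an existence theorem; it does not say how to find the weights. For a one-dimensional input, a network of the kind it describes (sigmoid hidden layer, linear output) can still be built by hand. Each hidden neuron acts as a smoothed step placed between two grid points, and the output neuron adds up the jumps. The sketch below is such a construction, not a training method. The target $\sin(2\pi x)$ on $[0, 1]$, the number of hidden neurons and the steepness are arbitrary choices.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_staircase_net(f, n_hidden, steepness):
    """Hand-build (no training) a 1-input network on [0, 1] with one hidden
    layer of sigmoid neurons and one linear output neuron. Hidden neuron i
    switches on at the midpoint between grid points c_{i-1} and c_i and adds
    the jump f(c_i) - f(c_{i-1}) to the output."""
    grid = [i / n_hidden for i in range(n_hidden + 1)]
    hidden = []  # (input weight, bias, output weight) for each hidden neuron
    for i in range(1, n_hidden + 1):
        midpoint = (grid[i - 1] + grid[i]) / 2
        hidden.append((steepness, -steepness * midpoint, f(grid[i]) - f(grid[i - 1])))
    output_bias = f(grid[0])

    def net(x):
        return output_bias + sum(v * sigmoid(w * x + b) for w, b, v in hidden)
    return net

def target(x):
    return math.sin(2 * math.pi * x)

net = build_staircase_net(target, n_hidden=100, steepness=2000.0)
max_error = max(abs(net(k / 1000) - target(k / 1000)) for k in range(1001))
```

Increasing `n_hidden` (with the steepness scaled accordingly) makes the staircase finer and the maximum error smaller. That is the intuition behind the theorem.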

Algorithm

The vast majority of neural networks have a training algorithm that modifies the synaptic weights according to a data set presented at the network's input. The goal of this training is to let the neural network learn from the examples. If training is carried out correctly, the network can produce outputs very close to the original values of the training data set.

But the whole point of neural networks lies in their ability to generalize beyond the training set.

A neural network can also be used as a memory; this is called neural memory.

From a topological point of view, training amounts to determining a hypersurface in $\mathbb{R}^n$, where $\mathbb{R}$ is the set of real numbers and $n$ is the number of network inputs.

Training

Supervised or unsupervised mode

Training is called supervised when the network is forced to converge toward a precise final state whenever a pattern is presented to it.

Conversely, in unsupervised training, the network is left free to converge toward any final state when a pattern is presented to it.

Overfitting

The examples in the training set often contain approximate or noisy values. If the network is forced to respond almost perfectly to these examples, the result may be a network biased by erroneous values. For example, imagine presenting the network with pairs $(x_i, f(x_i))$ lying on a line of equation $y = ax + b$, but perturbed so that the points are not exactly on the line. With good training, the network answers $ax + b$ for any value of $x$ presented. With overfitting, the network answers slightly more or slightly less than $ax + b$, because each pair $(x_i, f(x_i))$ lying off the line influences the decision. There is a simple method for avoiding overfitting: split the set of examples into two subsets. The first is used for training and the second for evaluating the training. As long as the error obtained on the second subset decreases, training can continue; otherwise it stops.
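The validation-split method described above (usually called early stopping) can be sketched on the text's own example, noisy points around a line $y = ax + b$. The values $a = 2$ and $b = 1$, the noise level, the 30/10 split, the learning rate and the small tolerance `tol` are assumptions made for this illustration. The model is a plain linear one trained by gradient descent rather than a full neural network.

```python
import random

def mse(w, c, points):
    """Mean squared error of the line y = w*x + c on a list of (x, y) points."""
    return sum((w * x + c - y) ** 2 for x, y in points) / len(points)

def train_with_early_stopping(train, valid, lr=0.3, max_epochs=10000, tol=1e-12):
    """Fit y = w*x + c by full-batch gradient descent on `train`, and keep going
    only while the error on `valid` keeps decreasing (by more than `tol`)."""
    w = c = 0.0
    best = (w, c, mse(w, c, valid))
    n = len(train)
    for epoch in range(1, max_epochs + 1):
        grad_w = sum(2 * (w * x + c - y) * x for x, y in train) / n
        grad_c = sum(2 * (w * x + c - y) for x, y in train) / n
        w, c = w - lr * grad_w, c - lr * grad_c
        val_err = mse(w, c, valid)
        if val_err >= best[2] - tol:
            return best[0], best[1], epoch  # evaluation error stopped decreasing
        best = (w, c, val_err)
    return best[0], best[1], max_epochs

# Noisy points around y = 2x + 1 (values chosen for illustration).
rng = random.Random(0)
points = [(i / 39, 2 * (i / 39) + 1 + rng.gauss(0, 0.05)) for i in range(40)]
train = [p for i, p in enumerate(points) if i % 4 != 0]  # 30 points for training
valid = [p for i, p in enumerate(points) if i % 4 == 0]  # 10 points for evaluation
w, c, stopped_at = train_with_early_stopping(train, valid)
```

The function returns the parameters from the last epoch at which the error on the evaluation subset still decreased.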

Backpropagation

Backpropagation consists in propagating the error made by a neuron back to its synapses and to the neurons connected to them. Neural networks usually use backpropagation of the error gradient, which corrects errors according to how much each element actually contributed to them: synaptic weights that contribute to a large error are modified more significantly than weights that caused only a marginal error.
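The gradient computation described above can be sketched for a tiny network with two inputs, two sigmoid hidden neurons and one sigmoid output, using the squared error $\frac{1}{2}(y - t)^2$. These architecture and loss choices, and the weight values, are assumptions for illustration. The final lines compare one backpropagated gradient with a finite-difference estimate, a common way to check such code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Network with 2 inputs, 2 sigmoid hidden neurons and 1 sigmoid output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def loss(params, x, target):
    """Squared error 0.5 * (y - target)**2 for one example."""
    return 0.5 * (forward(params, x)[1] - target) ** 2

def backprop(params, x, target):
    """Gradients of the loss, propagated from the output back to every weight."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - target) * y * (1 - y)  # error signal at the output neuron
    grad_W2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Each hidden neuron is blamed in proportion to the weight linking it to the output.
    delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = delta_hidden
    return grad_W1, grad_b1, grad_W2, grad_b2

# Arbitrary weights and one training example.
params = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [1.2, -0.7], 0.05)
x, target = [1.0, 0.5], 1.0
grad_W1, grad_b1, grad_W2, grad_b2 = backprop(params, x, target)

# Finite-difference check on the first-layer weight W1[0][1].
eps = 1e-6
plus = ([[0.5, -0.4 + eps], [0.3, 0.8]], [0.1, -0.2], [1.2, -0.7], 0.05)
minus = ([[0.5, -0.4 - eps], [0.3, 0.8]], [0.1, -0.2], [1.2, -0.7], 0.05)
numeric = (loss(plus, x, target) - loss(minus, x, target)) / (2 * eps)
```

A training step would then subtract a small multiple of each gradient from the corresponding weight.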

Pruning

Pruning is a method for avoiding overfitting by limiting the complexity of the model. It consists in removing connections (synapses), inputs or neurons from the network once training is finished. In practice, the elements with the smallest influence on the network's output error are removed. The two most widely used pruning algorithms are:

  • Optimal Brain Damage (OBD) by Y. LeCun et al.;
  • Optimal Brain Surgeon (OBS) by B. Hassibi and D. G. Stork.
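As an illustration of the saliency idea behind Optimal Brain Damage, the sketch below ranks the weights of a single linear neuron by $s_k = \frac{1}{2} h_{kk} w_k^2$, where $h_{kk}$ is the diagonal of the Hessian of the error. For a linear neuron with error $E = \frac{1}{2}\sum (y - w \cdot x)^2$ this diagonal is exactly $\sum x_k^2$; in a real multilayer network OBD has to approximate it. The weights and inputs are made-up values, and OBD assumes the weights come from a completed training run.

```python
def obd_saliencies(weights, inputs):
    """OBD-style saliency s_k = 0.5 * h_kk * w_k**2 for a linear neuron, where
    h_kk = sum over examples of x_k**2 (exact Hessian diagonal in this case)."""
    h_diag = [sum(x[k] ** 2 for x in inputs) for k in range(len(weights))]
    return [0.5 * h * w ** 2 for h, w in zip(h_diag, weights)]

def prune_smallest(weights, inputs, n_remove):
    """Set to zero the n_remove weights with the lowest saliency."""
    sal = obd_saliencies(weights, inputs)
    order = sorted(range(len(weights)), key=lambda k: sal[k])
    pruned = list(weights)
    for k in order[:n_remove]:
        pruned[k] = 0.0
    return pruned

weights = [2.0, 0.01, -1.5]
inputs = [[1.0, 1.0, 1.0], [0.5, 2.0, 1.0], [1.0, 0.5, 0.2]]
pruned = prune_smallest(weights, inputs, n_remove=1)
```

The second weight is removed even though its input varies the most, because its saliency (about 0.0003) is far below the others (4.5 and about 2.3).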

Various types of neural networks

The set of synaptic connection weights determines how the neural network operates. Patterns are presented to a subset of the network: the input layer. When a pattern is applied to a network, the network seeks to reach a stable state. Once it is reached, the activation values of the output neurons constitute the result. Neurons that belong neither to the input layer nor to the output layer are called hidden neurons.

Types of neural networks differ in several parameters:

  • the topology of the connections between neurons;
  • the aggregation function used (weighted sum, pseudo-Euclidean distance…);
  • the thresholding function used (sigmoid, step, linear function, Gaussian function…);
  • the training algorithm (gradient backpropagation, cascade correlation);
  • other parameters specific to certain types of neural networks, such as the relaxation method for networks (e.g. Hopfield networks) that are not simple feed-forward networks (e.g. the multilayer perceptron).

Many other parameters can be used when training these neural networks, for example:

  • weight decay, which helps avoid side effects and counteract overfitting.
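Weight decay is usually implemented by adding a term proportional to each weight to its gradient, which amounts to minimizing the error plus $\frac{\lambda}{2}\|w\|^2$. The text gives no formula, so this common formulation is an assumption here, as are the names `gd_step` and `minimize` and the toy error $E(w) = (w - 3)^2$.

```python
def gd_step(weights, grad, lr, decay):
    """One gradient step with weight decay: w <- w - lr * (dE/dw + decay * w)."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grad)]

def minimize(decay, steps=1000, lr=0.1):
    """Minimize the toy error E(w) = (w - 3)**2, optionally with weight decay."""
    w = [0.0]
    for _ in range(steps):
        w = gd_step(w, [2 * (w[0] - 3.0)], lr, decay)
    return w[0]

w_plain = minimize(decay=0.0)   # converges to 3.0
w_decay = minimize(decay=1.0)   # converges to 6 / (2 + 1) = 2.0
```

With decay, the minimum moves from $w = 3$ to $w = 6 / (2 + \lambda) = 2$ for $\lambda = 1$: the penalty pulls the weight toward zero, which limits the effective complexity of the model.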

Networks with supervised training

Without backpropagation

Perceptron
Full-course Perceptron

Multi-layer Perceptron

Adaline (ADAptive LInear NEuron)
With Adaline, training uses the outputs of the neurons before they pass through the activation function. In other words, only the weighted sum of the inputs is used.
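Training on the weighted sum, as described above, corresponds to the Widrow-Hoff (least mean squares) rule. The following is a minimal Python sketch. It assumes bipolar targets (-1/+1), a learning rate of 0.05 and the logical AND function as a toy data set. The threshold is applied only when classifying with the trained neuron.

```python
def adaline_train(samples, n_inputs, lr=0.05, epochs=100):
    """Widrow-Hoff (LMS) rule: the error is measured on the linear output
    (weighted sum plus bias), before any activation function."""
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum only
            err = target - s
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def adaline_classify(w, b, x):
    """The threshold is applied only when the trained neuron is used."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy data: logical AND with bipolar inputs and targets.
samples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w, b = adaline_train(samples, n_inputs=2)
```

Because the error is taken before thresholding, the weights keep moving even on correctly classified examples, toward the least-squares solution (here close to $w = (0.5, 0.5)$, $b = -0.5$).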

Cauchy machine

Not detailed
  1. Adaptive Heuristic Critic (AHC)
  2. Time Delay Neural Network (TDNN)
  3. Associative Reward Penalty (ARP)
  4. Avalanche Matched Filter (AMF)
  5. Backpercolation (Perc)
  6. Artmap
  7. Adaptive Logic Network (ALN)
  8. Cascade Correlation (CasCor)
  9. Extended Kalman Filter (EKF)
  10. Learning Vector Quantization (LVQ)
  11. Probabilistic Neural Network (PNN)
  12. General Regression Neural Network (GRNN)

With backpropagation

Not detailed
  1. Brain-State-in-a-Box (BSB)
  2. Fuzzy Cognitive Map (FCM)
  3. Boltzmann Machine (BM)
  4. Mean Field Annealing (MFT)
  5. Recurrent Cascade Correlation (RCC)
  6. Backpropagation Through Time (BPTT)
  7. Real-Time Recurrent Learning (RTRL)
  8. Recurrent Extended Kalman Filter (EKF)

Networks with unsupervised training

With backpropagation

Self-organizing map

Not detailed
  1. Additive Grossberg (AG)
  2. Shunting Grossberg (SG)
  3. Binary Adaptive Resonance Theory (ART1)
  4. Analog Adaptive Resonance Theory (ART2, ART2a)
  5. Discrete Hopfield (DH)
  6. Continuous Hopfield (CH)
  7. Discrete Bidirectional Associative Memory (BAM)
  8. Temporal Associative Memory (TAM)
  9. Adaptive Bidirectional Associative Memory (ABAM)
  10. Competitive learning
In this type of unsupervised training, the neurons compete to be active. Their outputs are binary, and a neuron is said to be active when its output equals 1. Whereas other rules allow several output neurons to be active simultaneously, in competitive training only one neuron is active at any given moment. Each output neuron specializes in "detecting" a set of similar patterns and thus becomes a feature detector. The input function in this case is $H = b - \mathrm{dist}(W, X)$, where $b$, $W$ and $X$ are respectively the threshold, the synaptic weight vector and the input vector. The winning neuron is the one for which $H$ is maximal; thus, if the thresholds are identical, it is the one whose weights are closest to the inputs. The neuron with the maximal output is the winner and its output is set to 1, while the losers have their output set to 0. A neuron learns by moving its weights toward the values of the inputs that activate it, which increases its chances of winning. If a neuron does not respond to an input, no weight adjustment takes place. If a neuron wins, a portion of the weights of all the inputs is redistributed toward the weights of the active inputs. Applying the rule gives the following results (Grossberg): $\Delta w_{ij} = L_r (x_j - w_{ij})$ if neuron $i$ wins, and $\Delta w_{ij} = 0$ if neuron $i$ loses, where $L_r$ is a learning rate. This rule brings the synaptic weight vector $w_{ij}$ closer to the input pattern $x_j$.

Example: consider two groups of points in the plane that we want to separate into two classes. $x_1$ and $x_2$ are the two inputs; $w_{11}$ and $w_{12}$ are the weights of neuron 1, which can be regarded as the coordinates of a "weight point" for neuron 1; and $w_{21}$ and $w_{22}$ are the weights of neuron 2. If the thresholds are zero, $h_i$ depends only on the distance between the point to be classified and the weight point. The preceding rule tends to reduce this distance to the sample point when the neuron wins, so it should let each weight point settle in the middle of a cloud. If the weights are initially set at random, one neuron may end up close to both clouds while the other ends up so far away that it never wins. Its weights can then never change, while those of the other neuron will place it in the middle of the two clouds. The problem of these so-called dead neurons can be solved by using the thresholds: it is enough to raise the threshold of these neurons so that they start winning.
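The competitive rule and the two-cloud example above can be sketched as follows. The input function is $H_i = b_i - \mathrm{dist}(W_i, X)$, and only the winner is updated, with $\Delta w_{ij} = L_r (x_j - w_{ij})$. The cloud coordinates, the initial weights and $L_r = 0.1$ are illustrative assumptions. Raising the entry of `thresholds` for a neuron that never wins is the remedy the text gives for dead neurons.

```python
import math

def competitive_learning(points, weights, thresholds, lr=0.1, epochs=20):
    """Winner-take-all learning: for each input X, H_i = b_i - dist(W_i, X);
    the neuron with the largest H_i wins and moves its weights toward X,
    while the losers keep their weights unchanged."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for x in points:
            scores = [b - math.dist(w, x) for w, b in zip(weights, thresholds)]
            winner = scores.index(max(scores))
            weights[winner] = [wj + lr * (xj - wj) for wj, xj in zip(weights[winner], x)]
    return weights

# Two clouds of points in the plane, around (0, 0) and (5, 5).
offsets = [(-0.3, 0.1), (0.2, -0.2), (0.1, 0.3), (-0.1, -0.2), (0.2, 0.1)]
cloud_a = [(dx, dy) for dx, dy in offsets]
cloud_b = [(5 + dx, 5 + dy) for dx, dy in offsets]

# Neuron 1 starts near cloud A, neuron 2 near cloud B; thresholds are zero.
final = competitive_learning(cloud_a + cloud_b, [(1.0, 0.0), (4.0, 5.0)], [0.0, 0.0])
```

After training, each weight point sits near the center of one cloud, so the index of the winning neuron serves as the class label.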

Applications: this type of network and the corresponding training method can be used in data analysis to highlight similarities between data.
