A formal neuron is a mathematical and computational representation of a biological neuron. It generally has several inputs and one output, which correspond respectively to the dendrites and to the axon hillock of the biological neuron (the starting point of the axon). The excitatory and inhibitory actions of the synapses are represented, most of the time, by numerical coefficients (synaptic weights) associated with the inputs. The numerical values of these coefficients are adjusted during a training phase. In its simplest version, a formal neuron computes the weighted sum of the inputs it receives, then applies an activation function, generally nonlinear, to this value. The resulting value is the output of the neuron.

The formal neuron is the basic unit of artificial neural networks, in which it is combined with similar units to compute arbitrarily complex functions, used for various applications in artificial intelligence.

Mathematically, the formal neuron is a real-valued function of several variables.

The formal neuron of McCulloch and Pitts

History

The first mathematical and computational model of the biological neuron was proposed by Warren McCulloch and Walter Pitts in 1943. Building on the properties of biological neurons known at the time, drawn from neurophysiological and anatomical observations, McCulloch and Pitts proposed a simple model of a formal neuron. It is a binary neuron, i.e. one whose output is 0 or 1. To compute this output, the neuron performs a weighted sum of its inputs (which, being outputs of other formal neurons, are also 0 or 1) and then applies a threshold activation function: if the weighted sum exceeds a certain value, the output of the neuron is 1; otherwise it is 0 (see the following sections).

McCulloch and Pitts were in fact studying the analogy between the human brain and universal computing machines. They showed in particular that a (recurrent) network made up of the formal neurons they invented has the same computing power as a Turing machine.

Despite the simplicity of this model, or perhaps thanks to it, the so-called McCulloch-Pitts formal neuron remains today a basic element of artificial neural networks. Many variants have been proposed, more or less biologically plausible, but generally based on the concepts introduced by the two authors. It is nevertheless known today that this model is only an approximation of the functions performed by a real neuron and that it cannot in any way serve for a deep understanding of the nervous system.

Mathematical formulation

Consider the general case of a formal neuron with m inputs, to which one must therefore present m numerical quantities (or signals, or stimuli) denoted x_1 to x_m. A formal neuron model is a computation rule that associates an output with the m inputs: it is thus a real-valued function of m variables.

In the McCulloch-Pitts model, each input is associated with a synaptic weight, i.e. a numerical value denoted w_1 for input 1 through w_m for input m. The first operation carried out by the formal neuron is a sum of the quantities received as inputs, weighted by the synaptic coefficients, i.e. the sum

w_1x_1+\ldots+w_mx_m=\sum_{j=1}^m w_j x_j.

A threshold w_0 is added to this quantity. The result is then transformed by a nonlinear activation function (sometimes called the output function), \varphi. The output associated with the inputs x_1 to x_m is thus given by

\varphi\left(w_0 + \sum_{j=1}^m w_j x_j\right),

which can be simplified to
\varphi\left(\sum_{j=0}^m w_j x_j\right),
by adding to the neuron a fictitious input x_0 fixed at the value 1.

In the original formulation of McCulloch and Pitts, the activation function is the Heaviside function (step function), whose value is 0 or 1. In this case, the output is sometimes defined instead by the formula

\varphi\left(\sum_{j=1}^m w_j x_j - w_0\right),

which justifies the name threshold given to the value w_0. Indeed, if the sum \sum_{j=1}^m w_j x_j exceeds w_0, the output of the neuron is 1, whereas it is 0 otherwise: w_0 is thus the activation threshold of the neuron, if one considers that the output 0 corresponds to a neuron that is "switched off".
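The computation described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the names `heaviside` and `neuron_output` are hypothetical, the weights are passed as a list starting with w_0, and the step function is assumed to return 1 at 0 (the convention used later in the article for logical computation).

```python
def heaviside(s):
    """Step function, assuming the convention H(0) = 1."""
    return 1 if s >= 0 else 0


def neuron_output(weights, inputs, activation=heaviside):
    """Output of a formal neuron.

    weights = [w_0, w_1, ..., w_m], inputs = [x_1, ..., x_m].
    A fictitious input x_0 = 1 is prepended so that w_0 enters the sum.
    """
    if len(weights) != len(inputs) + 1:
        raise ValueError("expected one more weight than inputs")
    extended = [1] + list(inputs)
    s = sum(w * x for w, x in zip(weights, extended))
    return activation(s)


if __name__ == "__main__":
    # Example weights (chosen for illustration): w_0 = -1.5, w_1 = w_2 = 1.
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, neuron_output([-1.5, 1, 1], x))
```

With these example weights the neuron outputs 1 only when both inputs are 1, since the weighted sum must reach 1.5 to make the argument of the step function non-negative.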

Variants of the McCulloch-Pitts neuron

Most formal neurons in current use are variants of the McCulloch-Pitts neuron in which the Heaviside function is replaced by another activation function. The most commonly used functions are:
  • the sigmoid function;
  • the hyperbolic tangent function;
  • the identity function;
  • to a lesser extent, certain piecewise linear functions.
These choices are motivated by theoretical and practical considerations that arise when formal neurons are combined into a network.

Important properties of the activation function

The properties of the activation function directly influence those of the formal neuron, so it is important to choose it well in order to obtain a model that is useful in practice.

When neurons are combined into a network of formal neurons, it is important, for example, that the activation function of some of them not be a polynomial, otherwise the computing power of the resulting network is limited. An extreme case of limited power corresponds to the use of a linear activation function, such as the identity function: in that situation the overall computation carried out by the network is itself linear, and it is therefore perfectly useless to use several neurons, since a single one gives strictly equivalent results.
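The collapse of a linear network into a single neuron can be checked numerically. The sketch below is an illustration under simple assumptions: one hidden layer of identity-activation neurons feeding one identity-activation output neuron, with hypothetical function names and arbitrary example weights.

```python
def linear_neuron(weights, inputs):
    """Neuron with identity activation: w_0 + sum_j w_j x_j."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))


def two_layer_linear(hidden, out, inputs):
    """Hidden linear neurons feeding a single linear output neuron."""
    h = [linear_neuron(w, inputs) for w in hidden]
    return linear_neuron(out, h)


def collapse(hidden, out):
    """Weights of the single linear neuron equivalent to the network."""
    m = len(hidden[0]) - 1
    # Threshold: output threshold plus weighted hidden thresholds.
    w0 = out[0] + sum(v * w[0] for v, w in zip(out[1:], hidden))
    # Weight of input j: sum over hidden neurons of (output weight * hidden weight).
    ws = [sum(v * w[j] for v, w in zip(out[1:], hidden)) for j in range(1, m + 1)]
    return [w0] + ws


if __name__ == "__main__":
    hidden = [[1, 2, 3], [-1, 5, 4]]  # two hidden neurons, two inputs each
    out = [2, 3, -2]                  # output neuron reading both hidden neurons
    x = (7, -3)
    print(two_layer_linear(hidden, out, x))
    print(linear_neuron(collapse(hidden, out), x))
```

Both printed values coincide for any input, because a composition of affine maps is again an affine map.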

However, sigmoid-type functions are generally bounded. In some applications, it is important that the outputs of the neural network not be bounded a priori: some neurons of the network must then use an unbounded activation function. The identity function is generally chosen.

It is also useful in practice for the activation function to have some form of regularity. To compute the gradient of the error made by a neural network during its training, the activation function must be differentiable. To compute the Hessian matrix of the error, which is useful for certain error analyses, the activation function must be twice differentiable. Since they generally have singular points, piecewise linear functions are used relatively little in practice.

The sigmoid function

The sigmoid function (also called the logistic function), defined by

f_{sig}(x) = \frac{1}{1 + e^{-x}},

has the important properties mentioned above (it is not polynomial and is infinitely differentiable). Moreover, a simple property makes it possible to speed up the computation of its derivative, which reduces the computing time needed to train a neural network. Indeed,

\frac{d}{dx} f_{sig}(x) = f_{sig}(x) \left(1 - f_{sig}(x)\right).

The derivative of this function at a point can thus be computed very efficiently from its value at that point.
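As an illustration, the following Python sketch computes the derivative from an already computed sigmoid value and compares it with a finite-difference estimate. The function names and the test point are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative_from_value(y):
    """Derivative of the sigmoid at a point, given y = sigmoid(x) there."""
    return y * (1.0 - y)


if __name__ == "__main__":
    x = 0.3
    y = sigmoid(x)  # computed once, reused for the derivative
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(sigmoid_derivative_from_value(y), numeric)
```

The point of the identity is that no additional exponential has to be evaluated once the output of the neuron is known.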

Moreover, the sigmoid function takes its values in the interval (0, 1), which makes it possible to interpret the output of the neuron as a probability. It is also related to the logistic regression model and appears naturally when one considers the problem of optimally separating two classes with Gaussian distributions that share the same covariance matrix.

However, the numbers that computers work with make this function tricky to program.

Indeed, e^{-60} \simeq 10^{-26}, and with the precision of floating-point numbers, 1 - 10^{-26} = 1, and thus f_{sig}(60) = 1.

If a neural network is coded without guarding against this, then after training the network can end up giving the same result whatever values are presented at its inputs.

Fortunately, there are many ways to use f_{sig}(x) = \frac{1}{1 + e^{-x}} in a program all the same, and some of them depend on the language. Most amount to using a floating-point representation more precise than IEEE 754. Sometimes normalizing the inputs between 0 and 1 is enough to solve the problem.
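The saturation described above is easy to observe. The sketch below assumes that Python floats are IEEE 754 double-precision numbers, which is the case on common platforms; the vanishing derivative shown at the end is an added illustration of why a saturated neuron stops responding, not a statement from the original text.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x}), naive implementation."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    y = sigmoid(60.0)
    # e^{-60} is far below the spacing of doubles near 1, so the sum rounds to 1.
    print(y == 1.0)
    # The derivative y * (1 - y) then evaluates to exactly zero.
    print(y * (1.0 - y))
    # For a smaller argument the result is still distinguishable from 1.
    print(sigmoid(30.0) < 1.0)
```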

The hyperbolic tangent function

The hyperbolic tangent function, defined by

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}},

is also widely used in practice, because it shares certain practical characteristics with the sigmoid function:

  • it is not polynomial;
  • it is infinitely differentiable;
  • its derivative can be computed quickly by the formula
\frac{d}{dx} \tanh(x) = 1 - \left(\tanh(x)\right)^2

It cannot, however, be given such a clear probabilistic interpretation.
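As with the sigmoid, the derivative formula can be checked numerically. This is a small sketch with a hypothetical helper name, using `math.tanh` from the Python standard library.

```python
import math


def tanh_derivative_from_value(t):
    """Derivative of tanh at a point, given t = tanh(x) there."""
    return 1.0 - t * t


if __name__ == "__main__":
    x = 0.8
    t = math.tanh(x)  # the neuron's output, reused for the derivative
    h = 1e-6
    numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
    print(tanh_derivative_from_value(t), numeric)
```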

Other formal neurons

The development of artificial neural networks led to the introduction of models different from that of McCulloch and Pitts. The motivation was not to represent real neurons better, but rather to exploit the properties of certain mathematical constructions in order to obtain more efficient networks (for example with simpler training, or using fewer neurons).

Like the McCulloch-Pitts neuron, the neurons presented in this section have m numerical inputs.

Radial basis neuron

A radial basis neuron is built from a function of the same name. Instead of computing a weighted sum of its inputs, such a neuron compares each input with a reference value and produces an output that is larger (closer to 1) the closer the inputs are to the reference values.

Each input x_i is thus associated with a value c_i. The comparison between the two sets of values is generally made in the sense of the Euclidean norm. More precisely, the neuron starts by computing the following quantity

\left\|\mathbf{x} - \mathbf{c}\right\| = \sqrt{\sum_{j=1}^{m} \left(x_j - c_j\right)^2}
It is perfectly possible to use a norm other than the Euclidean norm, and more generally any distance, to compare the vectors \mathbf{x} = \left(x_1, \ldots, x_m\right) and \mathbf{c} = \left(c_1, \ldots, c_m\right).

The neuron then transforms the value obtained by means of an activation function. Its output is thus finally given by

\varphi\left(\left\|\mathbf{x} - \mathbf{c}\right\|\right).
The function computed by the neuron is called a radial basis function because it has radial symmetry around the reference point \mathbf{c}: if an arbitrary rotation is performed around this point, the output of the neuron remains unchanged.

In practice, it is very common to use a Gaussian activation function defined by

\varphi(u) = \exp\left(-\beta u^2\right).
The parameter \beta can be interpreted as the inverse of the sensitivity of the neuron: the larger it is, the closer the inputs must be to the reference values for the output of the neuron to be close to 1.
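A radial basis neuron with Gaussian activation can be sketched as follows. The function name `rbf_neuron` and the example values are assumptions made for illustration; the Euclidean norm is used, as in the formula above.

```python
import math


def rbf_neuron(inputs, centers, beta):
    """Gaussian radial basis neuron: exp(-beta * ||x - c||^2)."""
    if len(inputs) != len(centers):
        raise ValueError("inputs and centers must have the same length")
    dist = math.sqrt(sum((x - c) ** 2 for x, c in zip(inputs, centers)))
    return math.exp(-beta * dist ** 2)


if __name__ == "__main__":
    c = [1.0, 2.0]
    print(rbf_neuron([1.0, 2.0], c, beta=3.0))  # input equal to the reference point
    # Two inputs at the same distance from c give the same output.
    print(rbf_neuron([2.0, 2.0], c, beta=3.0), rbf_neuron([1.0, 3.0], c, beta=3.0))
    # A larger beta makes the output fall off faster with distance.
    print(rbf_neuron([2.0, 2.0], c, beta=1.0), rbf_neuron([2.0, 2.0], c, beta=5.0))
```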

Sigma-pi neuron

A Sigma-pi neuron is obtained by replacing the weighted sum of the inputs in the McCulloch-Pitts model with a weighted sum of products of certain inputs.

Consider for example the case of two inputs. The output of a McCulloch-Pitts neuron is written in the form \varphi\left(w_0 + w_1 x_1 + w_2 x_2\right), whereas that of a Sigma-pi neuron is given by

\varphi\left(w_0 + w_1 x_1 + w_2 x_2 + w_{12} x_1 x_2\right),
the difference thus lying in the presence of the product term x_1 x_2.

In the general case with m inputs, one obtains an output of the following form

\varphi\left(w_0 + \sum_{j=1}^{p} w_j \prod_{k \in I_j} x_k\right).
In this formula, p is an arbitrary integer that gives the number of product terms used. The sets I_j are subsets of \{1, \ldots, m\} that specify which inputs are multiplied together to obtain term number j. With this notation, the example given above for 2 inputs corresponds to p=3 (3 terms in the sum), I_1 = \{1\} (input 1 used alone), I_2 = \{2\} (input 2 used alone) and I_3 = \{1,2\} (product of inputs 1 and 2).

This formulation shows that there are many ways to build a Sigma-pi neuron for a given number of inputs. This is due to the exponential growth, with the number of inputs, of the number of input subsets that can be used to build a Sigma-pi neuron: there are indeed 2^m possible combinations (counting the empty combination as the one corresponding to the threshold w_0). In practice, when m becomes large (for example from 20 upward), it becomes nearly impossible to use all the possible terms, and the products to favor must therefore be chosen. A classic solution is to restrict oneself to subsets of k inputs for a small value of k.

The general formulation of these neurons is also the origin of the name Sigma-pi, which refers to the capital Greek letters sigma (Σ) and pi (Π), used in mathematics to represent the sum and the product respectively.
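The general Sigma-pi formula, and the restriction to small subsets, can be sketched as follows. The function names are hypothetical, index sets are 1-based to match the notation of the text, and the identity activation in the demo is chosen only to make the arithmetic visible.

```python
from itertools import combinations


def sigma_pi_neuron(w0, weights, index_sets, inputs, activation):
    """Compute activation(w0 + sum_j w_j * prod_{k in I_j} x_k)."""
    total = w0
    for w, idx in zip(weights, index_sets):
        term = 1
        for k in idx:
            term *= inputs[k - 1]  # index sets use 1-based input numbers
        total += w * term
    return activation(total)


def index_sets_up_to(m, k):
    """All non-empty subsets of {1, ..., m} with at most k elements."""
    sets = []
    for size in range(1, k + 1):
        sets.extend(combinations(range(1, m + 1), size))
    return sets


if __name__ == "__main__":
    # Two-input example: I_1 = {1}, I_2 = {2}, I_3 = {1, 2}.
    sets = [(1,), (2,), (1, 2)]
    print(sigma_pi_neuron(0.5, [1, 2, 3], sets, (2, 3), lambda s: s))
    # With m = 20 inputs, keeping only products of at most 2 inputs:
    print(len(index_sets_up_to(20, 2)), "terms instead of", 2 ** 20 - 1)
```

Restricting to products of at most two inputs keeps the number of terms quadratic in m rather than exponential.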

Logical computation with neurons

The McCulloch-Pitts neuron (in its original version) can perform elementary logical computations.

Computable functions

Consider for example a formal neuron with two inputs whose synaptic weights are set to 1. Depending on the value of the threshold, the neuron computes a logical OR or a logical AND. Indeed, the output of the neuron is given by
H\left(x_1 + x_2 + w_0\right),
where H denotes the Heaviside function. The argument of H is w_0 when both inputs are 0, 1 + w_0 when exactly one input is 1, and 2 + w_0 when both inputs are 1. Four cases can be distinguished, according to the value of w_0.

If w_0 is positive or zero, the output of the neuron is always 1 (in accordance with the definition of the Heaviside function).

If -1 \leq w_0 < 0, the output is 0 when both inputs are 0 and 1 otherwise: the neuron thus computes a logical OR.

If -2 \leq w_0 < -1, the output is 1 when both inputs are 1 and 0 otherwise: the neuron thus computes a logical AND.

Finally, if w_0 < -2, the neuron always gives a zero result.

By the same kind of reasoning, one sees that a neuron with a single input can either have no effect (identity neuron) or perform a logical NOT.
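The four cases, and the one-input neurons, can be verified by enumeration. In this sketch the function names are hypothetical, the step function takes the value 1 at 0 as stated above, and the one-input weights are example choices rather than values given in the text.

```python
def heaviside(s):
    """Step function with H(0) = 1."""
    return 1 if s >= 0 else 0


def truth_table(w0):
    """Outputs of H(x1 + x2 + w0) for inputs (0,0), (0,1), (1,0), (1,1)."""
    return [heaviside(x1 + x2 + w0) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]


def one_input_neuron(x, w1, w0):
    """Single-input neuron H(w1 * x + w0)."""
    return heaviside(w1 * x + w0)


if __name__ == "__main__":
    for w0 in (0.5, -0.5, -1.5, -2.5):  # one sample threshold per case
        print(w0, truth_table(w0))
    # Example weights: identity with w1 = 1, w0 = -0.5; NOT with w1 = -1, w0 = 0.5.
    print([one_input_neuron(x, 1, -0.5) for x in (0, 1)])
    print([one_input_neuron(x, -1, 0.5) for x in (0, 1)])
```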

By linearity of the weighted sum, if the synaptic weights and the threshold of a neuron are all multiplied by the same arbitrary positive number, the behavior of the neuron is unchanged, and the modification is completely undetectable. On the other hand, if everything is multiplied by a negative number, the behavior of the neuron is reversed, since the activation function is increasing.

By combining McCulloch-Pitts neurons, i.e. by using the outputs of certain neurons as inputs to other neurons, one can thus realize any Boolean function. When connections forming loops in the network are also allowed, one obtains a system with the same power as a Turing machine.

Limitations

A single neuron cannot, however, represent every Boolean function. The simplest example of a function that it cannot compute is the exclusive OR, because of its nonlinear character: its two output classes cannot be separated by a single threshold on a weighted sum.
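Both statements can be illustrated in Python. The grid search below is only a finite check over example weight values, not a proof, and the three-neuron network (an OR neuron and a NAND neuron feeding an AND neuron) is one possible construction with weights chosen for the example.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def heaviside(s):
    """Step function with H(0) = 1."""
    return 1 if s >= 0 else 0


def neuron(w0, w1, w2, x1, x2):
    """Two-input McCulloch-Pitts neuron H(w0 + w1*x1 + w2*x2)."""
    return heaviside(w0 + w1 * x1 + w2 * x2)


def single_neuron_computes_xor(grid):
    """Search the grid for weights making one neuron equal to XOR."""
    for w0, w1, w2 in product(grid, repeat=3):
        if all(neuron(w0, w1, w2, a, b) == t for (a, b), t in XOR.items()):
            return True
    return False


def xor_network(x1, x2):
    """XOR as AND(OR(x1, x2), NAND(x1, x2)), using three neurons."""
    h_or = neuron(-1, 1, 1, x1, x2)
    h_nand = neuron(1, -1, -1, x1, x2)
    return neuron(-2, 1, 1, h_or, h_nand)


if __name__ == "__main__":
    grid = [i / 2 for i in range(-8, 9)]  # weights from -4 to 4 in steps of 0.5
    print(single_neuron_computes_xor(grid))
    print([xor_network(a, b) for a, b in XOR])
```

The search finds no suitable weights, while the small network reproduces the exclusive OR on all four input pairs.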
