I. Introduction to the Elman neural network

1. Characteristics

The Elman neural network is a typical dynamic recurrent neural network. It builds on the basic structure of the BP network by adding a context layer that feeds the hidden-layer output back through a one-step delay operator, which gives the network memory. This lets the system adapt to time-varying characteristics and improves the global stability of the network. It has more computing power than a feedforward neural network and can also be used to solve fast optimization problems.

2. Structure

The Elman neural network is a widely used, typical feedback neural network model. It is generally divided into four layers: the input layer, the hidden layer, the context layer (also translated as the "undertaking layer") and the output layer. The connections among the input, hidden and output layers are similar to those of a feedforward network. The input-layer units only transmit signals, and the output-layer units perform a weighted sum. The hidden-layer units may use a linear or nonlinear activation function, usually the nonlinear sigmoid function. The context layer remembers the previous output of the hidden-layer units and can be regarded as a one-step delay operator. Through the delay and storage of the context layer, the hidden-layer output is fed back to the hidden-layer input, which makes the network sensitive to historical data. This internal feedback increases the network's ability to handle dynamic information and so enables dynamic modeling. Its structure is shown in Figure 1 below.



The mathematical expression of the network is as follows:

$$y(k) = g\left(w^{3} x(k)\right)$$

$$x(k) = f\left(w^{1} x_c(k) + w^{2} u(k-1)\right)$$

$$x_c(k) = x(k-1)$$

where $y$ is the m-dimensional output node vector; $x$ is the n-dimensional node vector of the middle (hidden) layer; $u$ is the r-dimensional input vector; $x_c$ is the n-dimensional feedback state vector; $w^{3}$ is the connection weight from the middle layer to the output layer; $w^{2}$ is the connection weight from the input layer to the middle layer; $w^{1}$ is the connection weight from the context (undertaking) layer to the middle layer; $g(\cdot)$ is the transfer function of the output neurons, which forms a linear combination of the middle-layer outputs; and $f(\cdot)$ is the transfer function of the middle-layer neurons, for which the S-shaped (sigmoid) function is often used.
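To make the recurrence concrete, here is a minimal MATLAB sketch of a forward pass through a small Elman network. The names W1, W2 and W3, the layer sizes, the random weights and the toy input are all assumptions for illustration, not values from the original post. Thresholds are omitted, and the input column used at step k simply stands in for whatever input is fed to the hidden layer at that step.

```matlab
% Minimal sketch of an Elman forward pass (assumed names and sizes, no thresholds).
r = 2; n = 3; m = 1;              % input, hidden and output dimensions (illustrative)
T = 5;                            % number of time steps
W1 = 0.1 * randn(n, n);           % context layer -> hidden layer
W2 = 0.1 * randn(n, r);           % input layer   -> hidden layer
W3 = 0.1 * randn(m, n);           % hidden layer  -> output layer
f  = @(z) 1 ./ (1 + exp(-z));     % sigmoid transfer function of the hidden layer

U  = rand(r, T);                  % toy input sequence, one column per time step
xc = zeros(n, 1);                 % context layer starts at zero
Y  = zeros(m, T);                 % outputs, one column per time step
for k = 1:T
    x = f(W1 * xc + W2 * U(:, k));    % hidden state from context and current input
    Y(:, k) = W3 * x;                 % linear output layer
    xc = x;                           % context stores the hidden output (one-step delay)
end
disp(Y);
```

The only difference from a plain feedforward pass is the line `xc = x`, which carries the hidden state into the next step and is what makes the output depend on the input history.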

3. Differences from the BP network

The Elman network is a dynamic feedback network that can internally feed back, store and use output information from past time steps. It can model static systems and can also map dynamic systems, directly reflecting their dynamic characteristics. It outperforms the BP neural network in computing power and network stability.

4. Drawbacks

Like the BP neural network, its training algorithm is based on gradient descent, so it trains slowly and easily falls into local minima, which makes it difficult for training to reach the global optimum.

II. Particle swarm optimization

Particle swarm optimization (PSO) was proposed in 1995 by Dr. Eberhart and Dr. Kennedy, based on a study of the foraging behavior of bird flocks. Its core idea is to use information sharing among the individuals in a group so that the movement of the whole group evolves from disorder to order in the solution space, and an optimal solution of the problem is obtained. Consider this scene: a flock of birds is foraging, and there is a cornfield in the distance. None of the birds knows exactly where the field is, but each knows how far it currently is from the field. The simplest and most effective strategy for finding the cornfield is then to search the area around the bird that is currently closest to it.

In PSO, each candidate solution of the optimization problem is a bird in the search space, called a "particle", and the optimal solution corresponds to the "cornfield" the flock is looking for. Every particle has a position vector (its position in the solution space) and a velocity vector (which determines the direction and speed of its next flight). The fitness value of its current position can be calculated from the objective function and can be understood as its distance from the "cornfield". In each iteration, the particles in the population learn not only from their own experience (their historical positions) but also from the "experience" of the best particle in the population, and use both to decide how to adjust the direction and speed of flight in the next iteration. In this way, the whole population gradually approaches the optimal solution.

The explanation above may seem abstract, so here is a simple example.

Two people are in boats on a lake. They can communicate with each other, and each can measure the depth of the lake at their own position. In the initial position shown in the figure, the right side is deeper, so the person on the left moves their boat to the right.

Now the left side is deeper, so the person on the right moves their boat a little to the left.

The process is repeated until the two boats meet.

At that point a locally optimal solution has been obtained.

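Each individual is represented as a particle. The position of an individual at time t is x(t), and its direction (velocity) is v(t).

p(t) is the best solution found by individual x up to time t, g(t) is the best solution found by all individuals up to time t, v(t) is the direction of the individual at time t, and x(t) is the position of the individual at time t.

The next position is determined by x, p and g, as the figure above illustrates. In the original PSO formulation the update is

$$v(t+1) = v(t) + c_1 r_1 \left(p(t) - x(t)\right) + c_2 r_2 \left(g(t) - x(t)\right)$$

$$x(t+1) = x(t) + v(t+1)$$

where $c_1$ and $c_2$ are learning factors and $r_1$ and $r_2$ are random numbers in [0, 1].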

The particles in the population can find the optimal solution of the problem by continuously learning from the historical information of themselves and the population.

However, subsequent studies showed a problem with the original formula above: the update of v is too random, so the PSO algorithm has strong global optimization ability but poor local search ability. In practice, PSO should have strong global optimization ability in the early iterations, while in the later iterations the population should have stronger local search ability. To address this, Shi and Eberhart modified the formula by introducing an inertia weight, giving the inertia weight model of PSO:

$$v(t+1) = w\, v(t) + c_1 r_1 \left(p(t) - x(t)\right) + c_2 r_2 \left(g(t) - x(t)\right)$$

$$x(t+1) = x(t) + v(t+1)$$

The components of each vector are written as follows, where d indexes the dimension:

$$v_d(t+1) = w\, v_d(t) + c_1 r_1 \left(p_d(t) - x_d(t)\right) + c_2 r_2 \left(g_d(t) - x_d(t)\right)$$

$$x_d(t+1) = x_d(t) + v_d(t+1)$$

Here w is the PSO inertia weight, with a value in the interval [0, 1]. Applications generally use an adaptive scheme: w starts at 0.9, which gives PSO stronger global optimization ability, and is decreased as the iterations proceed so that PSO gains stronger local optimization ability, ending at w = 0.1. The parameters c1 and c2 are called learning factors and are generally set to 1.4961. r1 and r2 are random values in [0, 1].
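As a rough illustration of the inertia weight model, the MATLAB sketch below applies the velocity and position updates while w falls from 0.9 to 0.1. The linear decay schedule, the sphere objective, the swarm size and the search range are assumptions chosen for the example; the text only says that w decreases over the iterations.

```matlab
% Sketch: inertia-weight PSO update with w decreasing from 0.9 to 0.1.
% The linear schedule, sphere objective, swarm size and range are assumptions.
N = 20; d = 2; maxIter = 50;          % swarm size, dimension, iterations (assumed)
c1 = 1.4961; c2 = 1.4961;             % learning factors from the text
wStart = 0.9; wEnd = 0.1;             % inertia weight at the first and last iteration
fobj = @(X) sum(X.^2, 2);             % toy objective: one fitness value per row

x = 4 * rand(N, d) - 2;               % positions in [-2, 2]
v = zeros(N, d);                      % velocities
p = x;  fp = fobj(x);                 % personal best positions and fitness
[fg, idx] = min(fp);  g = p(idx, :);  % global best fitness and position
best = zeros(maxIter, 1);             % global best fitness per iteration

for t = 1:maxIter
    w  = wStart - (wStart - wEnd) * (t - 1) / (maxIter - 1);   % linear decay
    r1 = rand(N, d);  r2 = rand(N, d);                          % random factors in [0, 1]
    v  = w * v + c1 * r1 .* (p - x) + c2 * r2 .* (repmat(g, N, 1) - x);
    x  = x + v;
    fx = fobj(x);
    improved = fx < fp;               % particles that beat their personal best
    p(improved, :) = x(improved, :);
    fp(improved)   = fx(improved);
    [fmin, idx] = min(fp);
    if fmin < fg                      % update the global best if it improved
        fg = fmin;  g = p(idx, :);
    end
    best(t) = fg;
end
fprintf('best fitness after %d iterations: %g\n', maxIter, fg);
```

Drawing r1 and r2 per particle and per dimension is one common choice; using a single scalar random number per iteration, as the demo code later in this post does, is a simpler variant.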

The overall framework of the particle swarm optimization algorithm is as follows:

Step 1 Population initialization: initialize the population randomly, or with a specific method designed for the problem being optimized. Then calculate each individual's fitness value and select each individual's local best position vector and the population's global best position vector.

Step 2 Iteration setup: set the maximum number of iterations and set the current iteration number to 1.

Step 3 Velocity update: update the velocity vector of each individual.

Step 4 Position update: update the position vector of each individual.

Step 5 Local and global best update: update the local best solution of each individual and the global best solution of the population.

Step 6 Termination check: check whether the maximum number of iterations has been reached. If so, output the global best solution; otherwise, continue iterating and jump to Step 3.

Applying particle swarm optimization is mainly a matter of designing the iterative update operators for the velocity and position vectors. The effectiveness of these operators determines the performance of the whole PSO algorithm, so designing them is both the focus and the main difficulty in applying PSO.

III. Algorithm flow

Step 1: Input the influencing-factor data and the target output data, divide them into a training set and a test set, and normalize the data.

Step 2: Construct the Elman neural network and initialize the network structure.

Step 3: Initialize the particle swarm optimization parameters: the maximum number of iterations, the population size N, and the parameters c1, c2 and w.

Step 4: Initialize the PSO population positions. The number of variables to be optimized is calculated from the network structure in Step 2.

Step 5: Run particle swarm optimization. The fitness function is the mean square error of the Elman prediction. The PSO loop body, that is, the velocity update and the position update, is executed until the maximum number of iterations is reached, at which point the algorithm terminates.

Step 6: Assign the weights and thresholds optimized by PSO to the Elman neural network. (Alternatively, the network structure itself can be treated as an optimization variable inside the loop, in which case the optimal network structure is output.)

Step 7: Train the PSO-optimized Elman neural network and use it for prediction, then analyze the prediction error and compare it with the Elman neural network before optimization.
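Steps 4 to 6 depend on mapping a flat particle vector onto the network's weights and thresholds. The MATLAB sketch below shows one possible way to count the variables, decode a single particle and score it by mean square error. The layer sizes, the order in which parameters are packed into the vector, the toy data and the hand-written forward pass are all assumptions for illustration; the original post does not show its fitness function.

```matlab
% Sketch: decode one PSO particle into Elman weights and thresholds, then score it by MSE.
% Layer sizes, parameter ordering and the toy data are assumptions for illustration.
r = 3; n = 4; m = 1;                       % input, hidden, output sizes (assumed)
dim = n*n + n*r + m*n + n + m;             % W1 + W2 + W3 + hidden and output thresholds

T = 30;                                    % number of samples in the toy sequence
U = rand(r, T);                            % toy normalized inputs, one column per sample
Ytrue = mean(U, 1);                        % toy target series (1 x T)

particle = 2 * rand(1, dim) - 1;           % one candidate position vector in [-1, 1]

% Decode: slice the flat vector in a fixed order and reshape each piece.
k  = 0;
W1 = reshape(particle(k+1 : k+n*n), n, n);  k = k + n*n;   % context -> hidden
W2 = reshape(particle(k+1 : k+n*r), n, r);  k = k + n*r;   % input   -> hidden
W3 = reshape(particle(k+1 : k+m*n), m, n);  k = k + m*n;   % hidden  -> output
b1 = reshape(particle(k+1 : k+n), n, 1);    k = k + n;     % hidden thresholds
b2 = reshape(particle(k+1 : k+m), m, 1);    k = k + m;     % output thresholds

% Fitness: run the network over the sequence and take the mean squared error.
f  = @(z) 1 ./ (1 + exp(-z));              % sigmoid hidden activation
xc = zeros(n, 1);                          % context layer starts at zero
Ypred = zeros(m, T);
for t = 1:T
    x = f(W1 * xc + W2 * U(:, t) + b1);    % hidden state from context and input
    Ypred(:, t) = W3 * x + b2;             % linear output
    xc = x;                                % one-step delay
end
fitness = mean((Ypred(:) - Ytrue(:)).^2);
fprintf('dimension = %d, fitness (MSE) = %g\n', dim, fitness);
```

Whatever packing order is chosen, the same order has to be used when the best particle is written back into the network in Step 6.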

IV. Demo code

clc; clear; close all;

%% Initialize the population
N = 500;                          % initial population size
d = 24;                           % space dimension
ger = 300;                        % maximum number of iterations
% Position limits, one row [lower, upper] per dimension.
% ASSUMPTION: the definition of limit is missing from the original listing;
% the values below are placeholders and must be set for the actual problem.
limit = repmat([-1, 1], d, 1);
% Velocity limits, one row [lower, upper] per dimension.
% ASSUMPTION: the original listing shows 48 entries of 0.5 with no row breaks
% or signs; reconstructed here as [-0.5, 0.5] for each dimension.
vlimit = repmat([-0.5, 0.5], d, 1);
c_1 = 0.8;                        % inertia weight
c_2 = 0.5;                        % self-learning factor
c_3 = 0.5;                        % group learning factor
x = zeros(N, d);
for i = 1:d
    x(:, i) = limit(i, 1) + (limit(i, 2) - limit(i, 1)) * rand(N, 1);  % initial positions
end
v = 0.5 * rand(N, d);             % initial population velocity
xm = x;                           % historical best position of each individual
ym = zeros(1, d);                 % historical best position of the population
fxm = 100000 * ones(N, 1);        % historical best fitness of each individual
fym = 10000;                      % historical best fitness of the population

%% PSO operation
iter = 1;
times = 1;
record = zeros(ger, 1);           % best fitness per iteration
fx = zeros(N, 1);                 % current fitness of each individual
while iter <= ger
    for i = 1:N
        fx(i) = calfit(x(i, :));  % calfit is the fitness function; it is not shown in the original listing
    end
    for i = 1:N
        if fxm(i) > fx(i)
            fxm(i) = fx(i);       % update individual historical best fitness
            xm(i, :) = x(i, :);   % update individual historical best position
        end
    end
    if fym > min(fxm)
        [fym, nmax] = min(fxm);   % update population historical best fitness
        ym = xm(nmax, :);         % update population historical best position
    end
    v = v * c_1 + c_2 * rand * (xm - x) + c_3 * rand * (repmat(ym, N, 1) - x);  % velocity update
    % Clamp velocities to the speed limits
    for i = 1:d
        for j = 1:N
            if v(j, i) > vlimit(i, 2)
                v(j, i) = vlimit(i, 2);
            end
            if v(j, i) < vlimit(i, 1)
                v(j, i) = vlimit(i, 1);
            end
        end
    end
    x = x + v;                    % position update
    % Clamp positions to the position limits
    for i = 1:d
        for j = 1:N
            if x(j, i) > limit(i, 2)
                x(j, i) = limit(i, 2);
            end
            if x(j, i) < limit(i, 1)
                x(j, i) = limit(i, 1);
            end
        end
    end
    record(iter) = fym;           % record the best (minimum) fitness so far
    iter = iter + 1;
    times = times + 1;
end
disp(['min: ', num2str(fym)]);
disp(['variable value: ', num2str(ym)]);
figure
plot(record)
xlabel('number of iterations');
ylabel('fitness value')

V. Simulation results

Evolution curve of particle swarm optimization



Comparison of error between PSO-Elman prediction and ELMAN prediction before optimization



VI. References

Ning Zhijie. Elman model and analysis of dump settlement prediction: a case study of the Liwu dump in Dabaoshan Mine.