Particle swarm optimization

Particle swarm optimization (PSO) was proposed in 1995 by Eberhart and Kennedy, based on a study of the foraging behaviour of bird flocks. Its core idea is to use information sharing among the individuals of a group so that the motion of the whole group evolves from disorder to order in the solution space, eventually yielding the optimal solution of the problem. Consider this scene: a flock of birds is foraging, and there is a cornfield in the distance. None of the birds knows exactly where the field is, but each knows how far away it is. The simplest and most effective strategy for finding the cornfield is therefore to search the area around the bird that is currently nearest to it.

In PSO, each candidate solution of the optimization problem is a bird in the search space, called a “particle”, and the optimal solution corresponds to the “cornfield” the flock seeks. Every particle has a position vector (its location in the solution space) and a velocity vector (which determines the direction and speed of its next move), and the fitness value of its current position can be computed from the objective function; this fitness can be understood as the distance to the “cornfield”. In each iteration, the particles in the population learn not only from their own experience (their historical best position) but also from the “experience” of the best particle in the population, and use both to decide how to adjust their direction and speed in the next iteration. In this way, the whole population of particles gradually approaches the optimal solution.

The above explanation may seem abstract, so a simple example will illustrate it.

Imagine two people in a lake who can communicate with each other and can each detect the depth at their own position. Since the water is deeper on the right, the person on the left will move their boat to the right.

Now the water is deeper on the left, so the person on the right will move their boat a little to the left.


The process is repeated until the two boats meet, at which point a locally optimal solution has been found.


Each individual is represented as a particle. The position of an individual at time t is x(t), and its velocity is v(t).

In the original PSO, the velocity and position are updated as

v(t+1) = v(t) + c1·r1·(p(t) − x(t)) + c2·r2·(g(t) − x(t))
x(t+1) = x(t) + v(t+1)

where c1 and c2 are learning factors and r1, r2 are random numbers in [0, 1].

Here p(t) is the historical best position of the individual at time t, g(t) is the best position found by the whole population at time t, v(t) is the velocity of the individual at time t, and x(t) is the position of the individual at time t.


The next position of a particle is thus determined jointly by its current position x, its own best position p, and the global best position g.


The particles in the population find the optimal solution of the problem by continuously learning from their own historical information and from that of the population.

However, subsequent studies revealed a problem with the original formula above: the velocity update is too random, so the PSO algorithm has strong global search ability but poor local search ability. In practice, we want PSO to have strong global search ability in the early iterations, while in the later iterations the population should have stronger local search ability. To address this shortcoming, Shi and Eberhart modified the formula by introducing an inertia weight, giving the inertia-weight model of PSO:

v(t+1) = w·v(t) + c1·r1·(p(t) − x(t)) + c2·r2·(g(t) − x(t))
x(t+1) = x(t) + v(t+1)

The components of each vector are represented as follows

v_id(t+1) = w·v_id(t) + c1·r1·(p_id(t) − x_id(t)) + c2·r2·(g_d(t) − x_id(t))
x_id(t+1) = x_id(t) + v_id(t+1)

where i indexes the particle and d the dimension.

w is the inertia weight of PSO, taking values in [0, 1]. Applications generally adapt it over the run: starting with w = 0.9 gives PSO strong global search ability; as the iterations deepen, w is gradually decreased, giving PSO strong local search ability, until w = 0.1 at the end of the run. The parameters c1 and c2 are called learning factors and are commonly set to 1.4961. r1 and r2 are random values uniformly drawn from [0, 1].
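As a small illustration, the adaptive inertia weight described above can be implemented with a simple linear decay (the linear schedule is an assumption; the model only requires w to decrease from 0.9 to 0.1 over the run):

```python
def inertia_weight(t, max_iter, w_start=0.9, w_end=0.1):
    """Inertia weight at iteration t (0-based), decaying linearly
    from w_start at the first iteration to w_end at the last."""
    return w_start - (w_start - w_end) * t / (max_iter - 1)
```

With max_iter = 100 this gives w = 0.9 at iteration 0 and w = 0.1 at iteration 99, so global search dominates early and local search dominates late.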

The overall framework of the particle swarm optimization algorithm is as follows:

Step 1 Population initialization: initialize randomly, or design a specific initialization method for the problem being optimized; then compute each individual's fitness value and select the individual historical best positions and the population's global best position.

Step 2 Iteration setup: set the maximum number of iterations and set the current iteration counter to 1.

Step 3 Velocity update: update the velocity vector of each individual.

Step 4 Position update: update the position vector of each individual.

Step 5 Best-position update: update the historical best position of each individual and the global best position of the population.

Step 6 Termination check: if the maximum number of iterations has been reached, output the global optimal solution; otherwise, jump to Step 3 and continue iterating.
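The six steps above can be sketched in pure Python as follows. This is a minimal illustration, not any reference implementation: the objective (the sphere function), the bounds, the linearly decreasing inertia weight, and all parameter values are assumptions chosen for the demonstration.

```python
import random

def pso(objective, dim, n_particles=30, max_iter=200, w_start=0.9, w_end=0.1,
        c1=1.4961, c2=1.4961, bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    """Minimize `objective` with inertia-weight PSO (minimal sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: random initialization of positions, velocities, and bests
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in x]                     # individual best positions
    pbest_f = [objective(p) for p in x]           # individual best fitness
    g_i = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g_i][:], pbest_f[g_i]  # global best
    # Step 2: iteration loop
    for t in range(max_iter):
        w = w_start - (w_start - w_end) * t / max_iter  # decaying inertia weight
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 3: velocity update
                v[i][d] = (w * v[i][d] + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # Step 4: position update
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            # Step 5: update individual and global bests
            f = objective(x[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = x[i][:], f
    # Step 6: maximum iterations reached; return the global best
    return gbest, gbest_f

best_x, best_f = pso(lambda p: sum(xi * xi for xi in p), dim=2)
```

On this toy problem the swarm homes in on the minimum at the origin; for a real application the objective, bounds, and velocity limits must be designed for the problem at hand.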

Applying particle swarm optimization is mainly a matter of designing the velocity- and position-update operators. Their effectiveness determines the performance of the whole PSO algorithm, so designing these operators is the focus, and the difficulty, of applying PSO.

LSTM network

Long Short-Term Memory (LSTM) networks are specially designed to solve the long-term dependency problem. All RNNs have the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.

Figure 3. RNN cell

The LSTM has the same chain-like structure, but the repeating module is different. Instead of a single neural network layer, there are four, interacting in a very particular way.

Figure 4. LSTM cell

Core ideas of LSTM

The key to LSTM is the cell state, the horizontal line running across the top of the cell. The cell state is like a conveyor belt: it runs straight down the entire chain with only a few minor linear interactions, so it is easy for information to flow along it unchanged.

Figure 5. LSTM cell internal structure diagram

With the top horizontal line alone there is no way to add or remove information; the LSTM does this through structures called gates.

Gates selectively let information through; each consists mainly of a sigmoid neural layer and a pointwise multiplication operation.

Figure 6. Information node

Each element of the sigmoid layer's output (a vector) is a real number between 0 and 1, representing the weight (or proportion) of the corresponding information to let through: 0 means “let nothing through” and 1 means “let everything through”.

LSTM protects and controls information through three such gate structures: the forget gate, the input gate, and the output gate.

In-depth understanding of LSTM

Forget gate

The first step in our LSTM is to decide what information to discard from the cell state. This decision is made by a sigmoid layer called the forget gate. It reads h_{t-1} and x_t and outputs, for each number in the cell state C_{t-1}, a value between 0 and 1, where 1 means “keep completely” and 0 means “discard completely”:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

Here h_{t-1} is the output of the previous cell, x_t is the input of the current cell, and σ is the sigmoid function.

Input gate

The next step is to decide how much new information to add to the cell state. This involves two parts: first, a sigmoid layer called the input gate layer decides which values to update; then a tanh layer creates a vector of new candidate values, C~t. In the next step, these two parts are combined to produce an update to the cell state:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C~t = tanh(W_C · [h_{t-1}, x_t] + b_C)

It is now time to update the old cell state: C_{t-1} is updated to C_t. The previous steps have already decided what to do; now we actually do it. We multiply the old state by f_t, discarding the information we decided to discard, and then add i_t * C~t, the new candidate values scaled by how much we decided to update each state component:

C_t = f_t * C_{t-1} + i_t * C~t

Output gate

Finally, we need to decide what to output. The output is based on the cell state, but is a filtered version of it. First, a sigmoid layer decides which parts of the cell state to output. Then the cell state is passed through tanh (squashing its values to between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the chosen parts are output:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
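Putting the three gates together, one step of an LSTM cell can be traced with scalar (one-dimensional) states, which keeps the gate equations visible. The weights below are illustrative placeholders, not trained values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps weight names to numbers."""
    f = sigmoid(p['wfx'] * x_t + p['wfh'] * h_prev + p['bf'])          # forget gate f_t
    i = sigmoid(p['wix'] * x_t + p['wih'] * h_prev + p['bi'])          # input gate i_t
    c_tilde = math.tanh(p['wcx'] * x_t + p['wch'] * h_prev + p['bc'])  # candidate C~t
    c = f * c_prev + i * c_tilde             # C_t = f_t * C_{t-1} + i_t * C~t
    o = sigmoid(p['wox'] * x_t + p['woh'] * h_prev + p['bo'])          # output gate o_t
    h = o * math.tanh(c)                     # h_t = o_t * tanh(C_t)
    return h, c

# Toy weights (all zero), purely to trace the data flow through the gates
params = {k: 0.0 for k in ['wfx', 'wfh', 'bf', 'wix', 'wih', 'bi',
                           'wcx', 'wch', 'bc', 'wox', 'woh', 'bo']}
h_t, c_t = lstm_cell_step(1.0, 0.0, 2.0, params)
```

With all weights zero, every gate outputs sigmoid(0) = 0.5 and the candidate is tanh(0) = 0, so the new cell state is simply half of the old one, which makes the “conveyor belt” behaviour of the cell state easy to see.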

Implementation

%% PSO-optimized LSTM prediction for a single sequence
clc; clear all; close all;

% Load the data and reshape it into row vectors
data = xlsread('Typhoon Day data 2', 'Sheet1', 'B2:E481');  % assign your own data to the data variable

numTimeStepsTrain = round(0.8 * size(data, 1));
dataTrain = data(1:numTimeStepsTrain, :)';
dataTest = data(numTimeStepsTrain+1:end-1, :)';
numTimeStepsTest = size(data, 1) - numTimeStepsTrain - 1;

% Normalize the data to [0, 1]
[dataTrainStandardized, ps_input] = mapminmax(dataTrain, 0, 1);
[dataTestStandardized, ps_output] = mapminmax(dataTest, 0, 1);
XTrain = dataTrainStandardized(2:end, :);
YTrain = dataTrainStandardized(1, :);
[dataTrainStandardized_zong, ps_input_zong] = mapminmax(data', 0, 1);
XTest_zong = dataTrainStandardized_zong(2:end, end);  % test-set input
YTest_zong = dataTest(1, 1)';                         % test-set output
XTest = dataTestStandardized(2:end, :);               % test-set input
YTest = dataTest(1, :)';                              % test-set output

% Create the LSTM regression network: sequence prediction with
% three-dimensional input and one-dimensional output
numFeatures = 3;        % input dimension
numResponses = 1;       % output dimension
numHiddenUnits = 20*3;  % number of hidden units in the LSTM layer
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits)
    fullyConnectedLayer(numResponses)
    regressionLayer];

%% Initialize the population
N = 5;      % population size
d = 1;      % search-space dimension
ger = 100;  % maximum number of iterations
limit = [0.001, 0.01];     % position (learning-rate) bounds; the matrix can be multi-dimensional
vlimit = [-0.005, 0.005];  % velocity bounds
c_1 = 0.8;  % inertia weight
c_2 = 0.5;  % self-learning factor
c_3 = 0.5;  % group-learning factor
for i = 1:d
    x(:, i) = limit(i, 1) + (limit(i, 2) - limit(i, 1)) * rand(N, 1);
end
v = 0.005 * rand(N, d);   % initial population velocities
xm = x;                   % individual historical best positions
ym = zeros(1, d);         % population historical best position
fxm = 1000 * ones(N, 1);  % individual historical best fitness
fym = 1000;               % population historical best fitness

%% PSO iterations
iter = 1;
record = zeros(ger, 1);  % fitness logger
while iter <= ger
    iter  % display the current iteration
    for i = 1:N
        % Train the LSTM with learning rate x(i, :): solver 'adam',
        % 250 training epochs, gradient threshold 1; use the resulting
        % error as the fitness, then update xm/fxm and ym/fym.
        % (The training and update code is elided in the source.)
    end
    iter = iter + 1;
end
disp(['Minimum value: ', num2str(fym)]);
disp(['Variable value: ', num2str(ym)]);
figure;
plot(record);
title(['Error cost curve after particle swarm optimization; best learning rate = ', num2str(ym)]);
figure;
plot(YTest, 'gs-', 'LineWidth', 2);
hold on;
plot(YPred_best, 'ro-', 'LineWidth', 2);
hold off;
legend('Observed value', 'Predicted value');
xlabel('Time');
ylabel('Data value');
title('PSO-optimized prediction');