1. A brief introduction to the BP neural network prediction algorithm

Note: Section 1.1 summarizes the principle of the BP neural network algorithm driven by influencing factors, i.e., the conventional training principle of the BP model (readers already familiar with it may skip ahead). Section 1.2 introduces the BP neural network prediction model based on historical values.

When a BP neural network is used for prediction, there are mainly two types of models to consider from the perspective of the input indicators:

1.1 Principle of the BP neural network algorithm driven by related influencing factors

As shown in Fig. 1, a BP network trained with MATLAB's newff function is in most cases a three-layer neural network (input layer, hidden layer and output layer).

1) Input layer: the input layer is analogous to the human five senses. Just as the five senses collect external information, the input layer is the port through which the neural network model receives input data.

2) Hidden layer: this corresponds to the human brain, which analyzes and thinks about the data passed on by the five senses. The hidden layer maps the data x transmitted by the input layer, which can be simply written as hiddenLayer_output = F(w*x + b), where w and b are the weight and threshold parameters, F() is the mapping rule, also called the activation function, and hiddenLayer_output is the hidden layer's output for the transmitted data. In other words, the hidden layer maps the input influencing-factor data x to produce a mapped value.

3) Output layer: this can be likened to the human limbs. After thinking about the information from the five senses (the hidden layer mapping), the brain directs the limbs to act (to respond externally). Similarly, the output layer of the BP neural network maps hiddenLayer_output once more: outputLayer_output = w*hiddenLayer_output + b, where w and b are again weight and threshold parameters, and outputLayer_output is the output value of the network's output layer (also called the simulation value or predicted value), which can be understood as an action executed externally, such as a baby tapping the table.

4) Gradient descent: by computing the deviation between outputLayer_output and the target value y passed into the model, the algorithm adjusts the weights, thresholds and other parameters accordingly. You can think of this process as the baby slapping at the table and missing, then adjusting its body according to how far it missed, so that the swinging arm gets closer and closer to the table until it finally hits it.
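To make the two mapping formulas above concrete, here is a minimal MATLAB sketch of a single forward pass through such a three-layer network. The sizes, the random parameters W1, b1, W2, b2, and the use of tanh as F() are illustrative assumptions, not values from the text; in practice these parameters are learned by training:

% Minimal forward-pass sketch: 3 inputs, 5 hidden neurons, 1 output.
% W1, b1, W2, b2 are illustrative random parameters; real values come from training.
x  = [0.2; 0.7; 0.1];            % influencing-factor input vector
W1 = rand(5,3); b1 = rand(5,1);  % hidden-layer weights and thresholds
W2 = rand(1,5); b2 = rand(1,1);  % output-layer weights and thresholds

hiddenLayer_output = tanh(W1*x + b1);            % F() here is tanh (assumption)
outputLayer_output = W2*hiddenLayer_output + b2; % simulation/predicted value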

Here’s another example to deepen your understanding:

The BP neural network shown in Fig. 1 has an input layer, a hidden layer and an output layer. How does BP, through this three-layer structure, make the output value outputLayer_output approach the given y value ever more closely, so that training yields an accurate model?

From the ports strung together in the figure, one can think of a familiar process: taking the subway. Imagine Fig. 1 as a subway line. One day Wang goes home by subway: he boards at the starting station (input), passes through many stations (hiddenLayer), and then finds he has ended up far from home (outputLayer corresponds to the current position). Based on the distance (Error) between his current position and home (Target), Wang returns to the hidden-layer stations and rides again (error back-propagation, using the gradient descent algorithm to update w and b). If Wang overshoots again, the adjustment process is repeated.

From the examples of the baby tapping the table and Wang taking the subway, consider what a complete BP training pass involves: data is first fed to the input layer, then mapped by the hidden layer, and the output layer produces the BP simulation value; according to the error between the simulation value and the target value, the parameters are adjusted so that the simulation value approaches the target value ever more closely. For example: (1) the baby reacts to external influencing factors (X) and thus "predicts"; its brain continuously adjusts the arm position and steers the limbs toward the target (Y and Target). (2) Wang boards the train (X), passes through the stations (predict), and repeatedly returns to intermediate stations to adjust his position until he arrives home (Y and Target).

These steps involve the influencing-factor data X and the target-value data Y (Target). Given X and Y, the BP algorithm finds the rule between them, mapping X to approximate Y; this is the role of the BP neural network algorithm. One more word: all of the processes above constitute BP model training. Even though the model finally obtained fits the training data well, is the rule the BP network has found accurate and reliable? To check, we reserve some samples X1 whose actual values Y1 are known, feed X1 into the trained BP network, and obtain the corresponding BP output (predicted) values predicT1. By plotting the results and calculating indicators such as MSE, MAPE and R-squared, we can measure how close predicT1 is to Y1 and thus judge whether the model predicts accurately. This is the testing process of the BP model: it realizes prediction on new data and verifies the accuracy of that prediction by comparison with actual values.
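As a sketch of this testing step, the three indicators mentioned above can be computed in plain MATLAB as follows (predicT1 and Y1 are assumed to be column vectors of equal length, following the names in the text):

% Accuracy indicators comparing predictions predicT1 with actual values Y1.
err  = predicT1 - Y1;
MSE  = mean(err.^2);                              % mean squared error
MAPE = mean(abs(err ./ Y1)) * 100;                % mean absolute percentage error, in %
R2   = 1 - sum(err.^2) / sum((Y1 - mean(Y1)).^2); % R-squared (closer to 1 is better)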



Fig. 1 Structure of a three-layer BP neural network

1.2 BP neural network prediction based on historical values

Take the power load forecasting problem as an example to distinguish the two types of models. When predicting the power load over a certain period of time:

One way is to predict the load value at time t by considering the climatic factors at time t, such as air humidity X1, temperature X2, whether it is a holiday X3, and so on. This is the model described in Section 1.1 above.

Another approach holds that the change of the power load value is related to time: for example, the load values at times t-1, t-2 and t-3 are related to the load value at time t, satisfying the formula y(t) = F(y(t-1), y(t-2), y(t-3)). When a BP neural network is used to train this model, the influencing-factor values fed into the network are the historical load values y(t-1), y(t-2), y(t-3); the number 3 here is called the autoregressive order, or delay. The value given to the network as the target output is y(t).
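A minimal sketch of how such lagged training samples might be assembled in MATLAB (the load series y is illustrative random data; the order 3 follows the formula above):

% Build input/target pairs y(t-1), y(t-2), y(t-3) -> y(t) from a load series y.
y = rand(100,1);            % illustrative historical load series (assumption)
order = 3;                  % autoregressive order (delay)
n = length(y);
X = zeros(n-order, order);  % one training input per row
T = zeros(n-order, 1);      % matching target values
for t = order+1:n
    X(t-order, :) = y(t-1:-1:t-order)';  % the three most recent load values
    T(t-order)    = y(t);                % target: the current load value
end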

2. The grey wolf algorithm

2.1 Introduction

The Grey Wolf Optimizer (GWO) is a swarm intelligence optimization algorithm proposed in 2014 by Mirjalili et al. of Griffith University, Australia. The algorithm is an optimization search method inspired by the predatory behaviour of grey wolves, and it offers strong convergence performance, few parameters and easy implementation. In recent years it has attracted wide attention from scholars and has been successfully applied to shop scheduling, parameter optimization, image classification and other fields.

2.2 Algorithm principle

Grey wolves are gregarious canids at the top of the food chain, and they adhere to a rigid social dominance hierarchy, as shown in the figure:

The first layer of the social hierarchy: the leader of the pack, marked as α. The α wolf is mainly responsible for making decisions about activities such as feeding, roosting and sleeping time. Because the other wolves must obey the α wolf's orders, the α wolf is also known as the dominant wolf. Notably, the α wolf is not necessarily the strongest wolf in the pack, but in terms of management ability the α wolf must be the best.

The second tier of the social hierarchy: the β wolf. It obeys the α wolf and assists the α wolf in making decisions. When the α wolf dies or grows old, the β wolf becomes the best candidate for the α position. Although the β wolf must obey the α wolf, it has control over the wolves in the other social tiers.

The third tier of the social hierarchy: the δ wolf. It obeys the α and β wolves, while dominating the rest of the hierarchy. δ wolves are generally composed of juvenile wolves, sentinel wolves, hunting wolves, old wolves and nursing wolves.

The fourth tier of the social hierarchy: the ω wolves, which usually must obey all other wolves in the social hierarchy. Although the ω wolves may seem to play little role in the pack, without them the pack would face internal problems such as cannibalism.

The GWO optimization process imitates the social stratification, tracking, encircling and attacking behaviour of grey wolves, as described below.

1) Social hierarchy: when designing GWO, the first step is to build the grey wolf social hierarchy model. The fitness of each individual in the population is calculated, and the three grey wolves with the best fitness are marked as α, β and δ, while the remaining wolves are labelled ω. In other words, the social hierarchy of the grey wolf pack, from high to low, is α, β, δ and ω. The optimization process of GWO is mainly guided by the best three solutions (i.e. α, β and δ) in each generation of the population.
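A minimal MATLAB sketch of this stratification step, assuming fitness is a vector holding each wolf's objective value (smaller is better, i.e. minimization) and Positions stores one wolf per row:

% Rank the pack by fitness and label the three best wolves.
[sortedFit, idx] = sort(fitness);  % ascending: best first
Alpha_pos = Positions(idx(1), :);  Alpha_score = sortedFit(1);
Beta_pos  = Positions(idx(2), :);  Beta_score  = sortedFit(2);
Delta_pos = Positions(idx(3), :);  Delta_score = sortedFit(3);
% All remaining wolves, idx(4:end), are the omega wolves.

The full code in Section 3 achieves the same effect incrementally, updating α, β and δ while it evaluates each wolf.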

2) Encircling prey: when grey wolves search for prey, they gradually approach and encircle it. The mathematical model of this behaviour is:

D = |C ⊙ Xp(t) − X(t)|
X(t+1) = Xp(t) − A ⊙ D
A = 2a ⊙ r1 − a
C = 2 ⊙ r2

where t is the current iteration number, ⊙ represents the Hadamard (elementwise) product, A and C are synergy coefficient vectors, Xp is the position vector of the prey, and X(t) is the position vector of the current grey wolf. Over the whole iteration process, a decreases linearly from 2 to 0, and r1 and r2 are random vectors in [0,1].
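In code, one encircling update for a single wolf could look like the vector-form sketch below (consistent with the per-dimension form used in Section 3; a, dim, Xp and X are assumed to be already defined as in the formulas above):

% One encircling step (vector-form sketch).
r1 = rand(1,dim);  r2 = rand(1,dim);  % random vectors in [0,1]
A = 2*a.*r1 - a;                      % synergy coefficient vector A
C = 2*r2;                             % synergy coefficient vector C
D = abs(C.*Xp - X);                   % distance to the prey (Hadamard products)
X = Xp - A.*D;                        % the wolf's next position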

3) Hunting

Grey wolves can identify the location of potential prey (the optimal solution), and the search process relies on the guidance of the α, β and δ wolves. However, the solution space of many problems is unknown, so grey wolves cannot determine the exact location of the prey (the optimal solution). To simulate the search behaviour of grey wolves (candidate solutions), it is assumed that α, β and δ have a strong ability to identify the location of potential prey. Therefore, during each iteration, the three best wolves (α, β, δ) in the current population are retained, and the other search agents (including the ω wolves) are updated according to their position information. The mathematical model of this behaviour can be expressed as follows:

Dα = |C1 ⊙ Xα − X|, Dβ = |C2 ⊙ Xβ − X|, Dδ = |C3 ⊙ Xδ − X|
X1 = Xα − A1 ⊙ Dα, X2 = Xβ − A2 ⊙ Dβ, X3 = Xδ − A3 ⊙ Dδ
X(t+1) = (X1 + X2 + X3) / 3

where Xα, Xβ and Xδ represent the position vectors of α, β and δ in the current population; X represents the position vector of the grey wolf; and Dα, Dβ and Dδ represent the distances between the current candidate wolf and the three best wolves. When |A| > 1, grey wolves disperse across different areas to search for prey; when |A| < 1, they concentrate their search on one or a few areas of prey.

As can be seen from the figure, the final position of the candidate solution falls within a random circle defined by the positions of α, β and δ. In general, α, β and δ first predict the approximate position of the prey (which lurks at the optimal solution), and the other candidate wolves then randomly update their positions near the prey under the guidance of the current three best wolves.

4) Attacking prey: in the process of constructing the attacking model, according to the formulas in 2), the decrease of a also shrinks the range within which A fluctuates. In other words, A is a random vector on the interval [-a, a] (note: in the original author's first paper this is stated as [-2a, 2a]), where a decreases linearly during the iterations. When A lies in the range [-1, 1], the next position of the search agent can be anywhere between the current grey wolf and the prey.

5) Searching for prey: grey wolves rely mainly on the information of α, β and δ to find prey. They first disperse to search for information about the prey's location, then converge to attack it. In the dispersal model, a search agent is driven away from the prey when |A| > 1; this search mode enables GWO to perform a global search. The other search coefficient in GWO is C. From the formulas in 2), the vector C consists of random values in the interval [0, 2]; this coefficient provides a random weight that increases (|C| > 1) or decreases (|C| < 1) the influence of the prey. This helps GWO exhibit random search behaviour during optimization and avoids the algorithm getting stuck in a local optimum. It is worth noting that C does not decrease linearly: C remains random throughout the iterations, which helps the algorithm jump out of local regions, and this becomes particularly important in the later iterations.
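As a small illustration of this difference between the two coefficients, the sketch below (Max_iter = 500 is an illustrative choice) samples A and C once per iteration: the range of A shrinks together with a, while C stays uniform on [0, 2] to the very end:

% How the two search coefficients behave across iterations (sketch).
Max_iter = 500;
a_hist = 2 - (0:Max_iter-1)*(2/Max_iter);     % a: linear decay from 2 to 0
A_hist = 2*a_hist.*rand(1,Max_iter) - a_hist; % A: random in a shrinking range [-a, a]
C_hist = 2*rand(1,Max_iter);                  % C: non-decaying, uniform on [0, 2]
plot(1:Max_iter, abs(A_hist), 1:Max_iter, C_hist);
legend('|A| (decaying range)', 'C (non-decaying)'); xlabel('Iteration');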

3. Part of the code

% Grey Wolf Optimizer
function [Alpha_score,Alpha_pos,Convergence_curve]=GWO(SearchAgents_no,Max_iter,lb,ub,dim,fobj)

% Initialize alpha, beta, and delta positions and scores
Alpha_pos=zeros(1,dim);
Alpha_score=inf; % change this to -inf for maximization problems

Beta_pos=zeros(1,dim);
Beta_score=inf; % change this to -inf for maximization problems

Delta_pos=zeros(1,dim);
Delta_score=inf; % change this to -inf for maximization problems

% Initialize the positions of search agents
Positions=initialization(SearchAgents_no,dim,ub,lb);

Convergence_curve=zeros(1,Max_iter);

l=0; % Loop counter

% Main loop
while l<Max_iter
    for i=1:size(Positions,1)

        % Return back the search agents that go beyond the boundaries of the search space
        Flag4ub=Positions(i,:)>ub;
        Flag4lb=Positions(i,:)<lb;
        Positions(i,:)=(Positions(i,:).*(~(Flag4ub+Flag4lb)))+ub.*Flag4ub+lb.*Flag4lb;

        % Calculate objective function for each search agent
        fitness=fobj(Positions(i,:));

        % Update Alpha, Beta, and Delta
        if fitness<Alpha_score
            Alpha_score=fitness; % Update alpha
            Alpha_pos=Positions(i,:);
        end
        if fitness>Alpha_score && fitness<Beta_score
            Beta_score=fitness; % Update beta
            Beta_pos=Positions(i,:);
        end
        if fitness>Alpha_score && fitness>Beta_score && fitness<Delta_score
            Delta_score=fitness; % Update delta
            Delta_pos=Positions(i,:);
        end
    end

    a=2-l*((2)/Max_iter); % a decreases linearly from 2 to 0

    % Update the position of search agents, including the omegas
    for i=1:size(Positions,1)
        for j=1:size(Positions,2)

            r1=rand(); % r1 is a random number in [0,1]
            r2=rand(); % r2 is a random number in [0,1]

            A1=2*a*r1-a; % Equation (3.3)
            C1=2*r2;     % Equation (3.4)

            D_alpha=abs(C1*Alpha_pos(j)-Positions(i,j)); % Equation (3.5)-part 1
            X1=Alpha_pos(j)-A1*D_alpha;                  % Equation (3.6)-part 1

            r1=rand();
            r2=rand();

            A2=2*a*r1-a; % Equation (3.3)
            C2=2*r2;     % Equation (3.4)

            D_beta=abs(C2*Beta_pos(j)-Positions(i,j)); % Equation (3.5)-part 2
            X2=Beta_pos(j)-A2*D_beta;                  % Equation (3.6)-part 2

            r1=rand();
            r2=rand();

            A3=2*a*r1-a; % Equation (3.3)
            C3=2*r2;     % Equation (3.4)

            D_delta=abs(C3*Delta_pos(j)-Positions(i,j)); % Equation (3.5)-part 3
            X3=Delta_pos(j)-A3*D_delta;                  % Equation (3.6)-part 3

            Positions(i,j)=(X1+X2+X3)/3; % Equation (3.7)
        end
    end

    l=l+1;
    Convergence_curve(l)=Alpha_score;
end
end
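A hypothetical usage sketch for the function above. The sphere function here is just a test objective of my choosing, and the initialization helper (from the original GWO toolbox) is assumed to return a SearchAgents_no-by-dim matrix of uniform random positions in [lb, ub]:

% Illustrative call: minimize the sphere function over [-10, 10]^5.
fobj = @(x) sum(x.^2);  % test objective (assumption, not from the text)
[best_score, best_pos, curve] = GWO(30, 500, -10, 10, 5, fobj);
semilogy(curve); xlabel('Iteration'); ylabel('Best score so far');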

4. Simulation results

Fig. 2 Convergence curve of the grey wolf algorithm

The following table shows the test statistics:

Model                Test set accuracy rate    Correct rate of training set
BP neural network    100%                      95%
GWO-BP               100%                      99.8%

5. References (for the complete code, please private message the blogger)

Forecasting of Water Resources Demand in Ningxia Based on BP Neural Network