1. Introduction to the Firefly Optimization Algorithm (FA)

1 Introduction

There are many species of firefly, distributed mainly in tropical regions. Most fireflies produce short, rhythmic flashes. The flash comes from a chemical reaction called bioluminescence, and the flashing pattern varies from species to species. The Firefly Algorithm (FA) is based on this flashing behavior; it is an intelligent stochastic algorithm for global optimization problems proposed by Yang Xin-She (2009) [1]. Fireflies glow through bioluminescence generated in their lower abdomen. This bioluminescence is an important part of the firefly courtship ritual and the main medium of communication between males and females. The glow can also be used to lure a mate or prey, helps a firefly defend its territory, and warns predators to stay away from its habitat. In FA, all fireflies are assumed to be unisex, so any firefly can be attracted to any other regardless of sex. The algorithm is built on two key concepts: the intensity of the emitted light and the attractiveness between two fireflies.

Natural firefly behavior

Fireflies display remarkable flashing behavior when finding prey, attracting mates, and protecting their territory. They live mostly in tropical environments and generally produce a cold light, such as green, yellow, or pale red. A firefly's attractiveness depends on its light intensity: for any pair, the brighter firefly attracts the other, so the less bright individual moves toward the brighter one, and the perceived brightness decreases with increasing distance. Flash patterns vary from species to species, and some fireflies flash in synchrony in large groups to attract prey. Females observe the flashes of males from a stationary position; when they find an interesting flash, they respond by flashing back, and the courtship ritual begins. Females of some species mimic the flash patterns of other species to trap their males and eat them.

3 Firefly Algorithm

The firefly algorithm simulates this natural behavior. Real fireflies flash in a discrete, intermittent pattern, whereas the algorithm assumes they are always glowing. To model the flashing behavior, Yang Xin-She proposed three idealized rules (Yang, 2009): (1) all fireflies are unisex, so a firefly may be attracted to any other firefly; (2) attractiveness is determined by brightness, with brighter fireflies attracting darker ones, and a firefly that sees no brighter firefly moves randomly; (3) the brightness of a firefly is directly proportional to the quality of the objective function value at its position.

The light intensity I and the distance r from the light source obey the inverse-square law, and light is additionally absorbed by the air, so the intensity decreases as the distance from the source grows. This limits the visibility of a firefly to a rather small radius. The main implementation steps of the firefly algorithm are as follows.

The first step is to compute the light intensity:

I(r) = I0 · exp(−γ r²)

where I0 is the light intensity at r = 0, i.e. the firefly's own brightness, which is tied to the value of the objective function (the better the objective value, the brighter the firefly); γ is the light-absorption coefficient, introduced because fluorescence weakens with distance and with absorption by the transmission medium, and it can be set to a constant; and r is the distance between the two fireflies. A monotonically decreasing function of the following form is sometimes used instead:

I(r) = I0 / (1 + γ r²)

The second step is population initialization, after which the fireflies are updated iteratively. The attractiveness between two fireflies is

β(r) = β0 · exp(−γ r²)

where β0 is the maximum attractiveness at r = 0. Next, each low-brightness firefly i moves toward a brighter firefly j:

x_i(t+1) = x_i(t) + β0 · exp(−γ r_ij²) · (x_j(t) − x_i(t)) + α ε

where t is the generation number, x_i(t) is the current position of individual i, β0 · exp(−γ r_ij²) · (x_j(t) − x_i(t)) is the attraction term, and α ε is the random term with step factor α. In the final stage, the light intensities are updated and all fireflies are ranked to determine the current best solution. The main steps of the firefly algorithm are as follows.

Begin
    Initialize the basic algorithm parameters: the number of fireflies n, the maximum attractiveness β0, the light-intensity absorption coefficient γ, the step factor α, and the maximum number of iterations MaxGeneration or the search accuracy ε.
    Initialization: randomly initialize the firefly positions and compute each firefly's objective function value as its maximum fluorescence brightness I0; t = 1
    while (t <= MaxGeneration || precision > ε)
        Compute the relative brightness I (light-intensity formula) and the attractiveness β of the fireflies in the group, and determine the movement direction of each firefly according to relative brightness;
        Update the spatial positions of the fireflies, and move the firefly at the best position randomly (movement equation);
        Recompute each firefly's brightness I0 from the updated positions;
        t = t + 1
    end while
    Output the global extremum and the optimal individual values.
End
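To make the update rules above concrete, here is a minimal, self-contained MATLAB sketch of the firefly iteration. The objective function (a simple sphere function) and the parameter values n, d, alpha, beta0 and gamma are illustrative assumptions, not taken from the program listed later in this post.

% Minimal Firefly Algorithm sketch (minimization; illustrative values only)
n = 20; d = 5;                          % number of fireflies, problem dimension
alpha = 0.25; beta0 = 1; gamma = 1;     % step factor, max attractiveness, absorption
MaxGeneration = 100;
fobj = @(x) sum(x.^2);                  % assumed objective (smaller = brighter)
X = rand(n, d) * 10 - 5;                % random initial positions in [-5, 5]
I = zeros(n, 1);
for i = 1:n, I(i) = fobj(X(i, :)); end
for t = 1:MaxGeneration
    for i = 1:n
        for j = 1:n
            if I(j) < I(i)                        % firefly j is brighter (lower cost)
                r = norm(X(i, :) - X(j, :));      % distance r_ij
                beta = beta0 * exp(-gamma * r^2); % attractiveness beta(r)
                % movement: attraction term plus random term alpha*(rand-0.5)
                X(i, :) = X(i, :) + beta * (X(j, :) - X(i, :)) ...
                          + alpha * (rand(1, d) - 0.5);
                I(i) = fobj(X(i, :));             % update light intensity
            end
        end
    end
end
best = min(I);
fprintf('Best objective value: %g\n', best);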

The firefly algorithm shares some similarities with particle swarm optimization (PSO) and the bacterial foraging algorithm (BFA). In the position-update equation, both FA and PSO have two main components: one deterministic and one random. In FA, attractiveness is determined by two factors, the objective function and the distance, while in BFA the attraction between bacteria also has two components, fitness and distance. An implementation of the firefly algorithm requires two inner loops over the whole population (of size n) and one outer loop over the iterations (say I), so the computational complexity of FA is O(n²·I) in the worst case.

2. Introduction to the BP neural network

The BP (Back Propagation) neural network was proposed by Rumelhart and McClelland in 1986; see their paper "Learning Representations by Back-propagating Errors" published in Nature. The BP network is one of the most widely used neural network models. It can learn and store a large number of input-output mappings without the mathematical equations describing those mappings being given in advance. Its learning rule is steepest descent: the weights and thresholds of the network are continually adjusted through back propagation so as to minimize the network's sum of squared errors.

Last time we noted that the multilayer perceptron has a bottleneck: how to obtain the weights of the hidden layers. Since we cannot obtain the hidden-layer weights directly, can we adjust them indirectly, by first obtaining the error between the actual output of the output layer and the expected output? The BP algorithm is designed around exactly this idea. Its basic idea is that the learning process consists of two phases: forward propagation of the signal and back propagation of the error. In forward propagation, the input sample enters at the input layer, is processed layer by layer, and is passed on to the output layer. If the actual output of the output layer does not agree with the desired output (the teacher signal), the algorithm enters the error back-propagation phase. In back propagation, the output error is transmitted back toward the input layer, layer by layer through the hidden layers, and is apportioned to all the units of each layer; this yields an error signal for each unit, which serves as the basis for correcting that unit's weights. The detailed flow of these two processes is described later.

The signal flow diagram of the BP algorithm is shown in the figure below.

3. Analysis of BP network characteristics: the three BP elements

When we analyze an ANN, we usually start from its three elements: (1) network topology; (2) transfer function; (3) learning algorithm. Together, the characteristics of these elements determine the functional characteristics of the ANN, so we also study the BP network from these three elements.

3.1 Topology of the BP network

As we said last time, a BP network is in fact a multilayer perceptron, so its topology is the same as that of the multilayer perceptron. Because a perceptron with a single hidden layer (three layers in total) can already solve simple nonlinear problems, it is the most common in applications. The topology of the three-layer perceptron, a simple three-layer BP network, is shown in the figure below.

3.2 Transfer function of the BP network

The transfer function adopted by the BP network is a nonlinear transformation function, the Sigmoid function (also called the S function). Its characteristic is that the function itself and its derivative are both continuous, which makes it very convenient to work with. Why this function is chosen is explained further when we introduce the learning algorithm of BP networks. The unipolar S-type function curve and the bipolar S-type function curve are shown in the figures below.

3.3 Learning algorithm of the BP network

The learning algorithm of the BP network is the BP algorithm, also known as the δ algorithm (in the study of ANNs we will come across many terms with multiple names). Taking the three-layer perceptron as an example, when the network output differs from the expected output there is an output error E, defined as the sum of squared errors

E = (1/2) Σ_k (d_k − o_k)²

where d_k is the desired output and o_k the actual output of output unit k. Next, we will introduce the specific process of BP network learning and training.
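Before that, for concreteness, here is a small MATLAB sketch of the two S-type transfer functions and of the error E mentioned above; the sample desired output d and actual output o are illustrative assumptions.

% Unipolar (logistic) and bipolar S-type functions
unipolar_s = @(x) 1 ./ (1 + exp(-x));              % range (0, 1)
bipolar_s  = @(x) (1 - exp(-x)) ./ (1 + exp(-x));  % range (-1, 1)

% Their derivatives are continuous and easy to express via the output y,
% which is what makes them convenient for backpropagation:
d_unipolar = @(y) y .* (1 - y);
d_bipolar  = @(y) 0.5 * (1 - y.^2);

% Sum-of-squares output error for a desired output d and actual output o
d = [1; 0];                      % desired output (illustrative)
o = [0.8; 0.2];                  % actual output (illustrative)
E = 0.5 * sum((d - o).^2);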

Training a BP neural network actually means adjusting the network's weights and biases. The training process of a BP neural network consists of two parts:

Forward transmission: the output value is propagated forward layer by layer. Reverse feedback: the weights and biases are adjusted backward, layer by layer. Let's look at forward transmission first. Before training the network, we need to randomly initialize the weights and biases: take a random real number in [-1, 1] for each weight and a random real number in [0, 1] for each bias. Then forward transmission begins.
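A minimal MATLAB sketch of this initialization and of one forward-transmission pass is given below; the layer sizes and the use of the unipolar S function are assumptions made for illustration.

% Illustrative forward propagation for a 3-layer (single hidden layer) BP network
n_in = 5; n_hid = 3; n_out = 1;          % assumed layer sizes
W1 = rand(n_hid, n_in) * 2 - 1;          % input-to-hidden weights in [-1, 1]
W2 = rand(n_out, n_hid) * 2 - 1;         % hidden-to-output weights in [-1, 1]
b1 = rand(n_hid, 1);                     % hidden biases in [0, 1]
b2 = rand(n_out, 1);                     % output biases in [0, 1]
sigmoid = @(z) 1 ./ (1 + exp(-z));

x = rand(n_in, 1);                       % one input record (illustrative)
h = sigmoid(W1 * x + b1);                % hidden-layer output
y = sigmoid(W2 * h + b2);                % network output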

The training of the neural network is completed over multiple iterations. Each iteration uses all the records of the training set, while each individual training step uses only one record. Described abstractly:

while termination conditions not met:
    for record in dataset:
        trainModel(record)


4.1 Backpropagation

4.2 Training termination conditions

Each training round uses all the records of the data set, so when should training stop? There are two stopping conditions: (1) set a maximum number of iterations, for example stop after 100 passes over the data set; (2) compute the prediction accuracy of the network on the training set and stop once it reaches a certain threshold.
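The backpropagation derivation referenced in section 4.1 is not reproduced in this excerpt. The following self-contained MATLAB sketch shows the standard delta-rule form of the backward weight adjustment for a single record, together with the two stopping conditions just listed; the layer sizes, target value, learning rate and error threshold are illustrative assumptions.

% Minimal delta-rule backpropagation sketch for one record (illustrative values)
n_in = 5; n_hid = 3; n_out = 1;
W1 = rand(n_hid, n_in) * 2 - 1;  b1 = rand(n_hid, 1);
W2 = rand(n_out, n_hid) * 2 - 1; b2 = rand(n_out, 1);
sigmoid = @(z) 1 ./ (1 + exp(-z));
x = rand(n_in, 1); t = 1;                 % one training record and its target (assumed)
eta = 0.1;                                % learning rate (assumed)

for epoch = 1:100                         % stopping condition 1: max iterations
    h = sigmoid(W1 * x + b1);             % forward pass
    y = sigmoid(W2 * h + b2);
    e = t - y;                            % output error
    delta2 = e .* y .* (1 - y);           % output-layer error signal
    delta1 = (W2' * delta2) .* h .* (1 - h);  % back-propagated hidden error signal
    W2 = W2 + eta * delta2 * h';          % adjust weights and biases layer by layer
    b2 = b2 + eta * delta2;
    W1 = W1 + eta * delta1 * x';
    b1 = b1 + eta * delta1;
    if 0.5 * sum(e.^2) < 1e-4             % stopping condition 2: accuracy threshold
        break;
    end
end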

5.1 Network structure

There are n neurons in the input layer, p neurons in the hidden layer, and q neurons in the output layer.

5.2 Definition of variables

Step 9: Judge the reasonableness of the model. Check whether the network error meets the requirement. When the error reaches the preset precision, or the number of learning iterations exceeds the designed maximum, the algorithm ends. Otherwise, select the next learning sample and its corresponding expected output, return to step 3, and enter the next round of learning.

In the design of a BP network, the number of network layers, the number of neurons in each layer, the activation function, the initial values, and the learning rate generally need to be considered. Some selection principles follow.

6.1 Number of network layers

Theory has proved that a network with biases and at least one S-type (sigmoid) hidden layer plus one linear output layer can approximate any rational function. Increasing the number of layers can further reduce the error and improve accuracy, but it also complicates the network. A single-layer network with only nonlinear activation functions should generally not be used, because any problem a single-layer network can solve can also be solved by an adaptive linear network, and the adaptive linear network is faster. For problems that can only be solved with nonlinear functions, a single layer does not provide enough precision, and only adding layers achieves the desired result.

6.2 Number of hidden-layer neurons

Network training accuracy can be improved by using a single hidden layer and increasing the number of its neurons, which is structurally much simpler than increasing the number of layers. Generally, we measure the quality of a neural-network design by its accuracy and training time: (1) when there are too few neurons, the network cannot learn well, the number of training iterations is relatively large, and the training accuracy is not high; (2) when there are too many neurons, the network becomes more powerful and more accurate, but the number of training iterations also grows, and overfitting may occur. This gives the principle for choosing the number of hidden-layer neurons: on the premise of being able to solve the problem, add one or two extra neurons to speed up the reduction of the error.

6.3 Selection of initial weights

Generally, the initial weights are random numbers in the range −1 to 1. In addition, after analyzing how a two-layer network is trained on a function, Widrow et al. proposed a strategy of choosing initial weights with magnitude on the order of the r-th root of s, where r is the number of inputs and s is the number of neurons in the first layer.
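As an illustration, the sketch below shows plain random initialization in [-1, 1] next to a scaled variant in the spirit of the Widrow strategy; the interpretation of the scale factor as the r-th root of s, and the layer sizes, are assumptions made for this example.

% Two illustrative weight-initialization strategies for a first layer with
% s neurons and r inputs (values assumed)
r = 5; s = 3;
W_uniform = rand(s, r) * 2 - 1;           % plain random weights in [-1, 1]

% Scaled initialization in the spirit of the Widrow strategy: limit the weight
% magnitude to roughly the r-th root of s (interpretation assumed here)
scale = s^(1 / r);
W_scaled = (rand(s, r) * 2 - 1) * scale;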

6.4 Learning rate

The learning rate typically ranges from 0.01 to 0.8. A large learning rate may make the system unstable, while a small one leads to slow convergence and long training times. For a more complex network, different learning rates may be needed in different regions of the error surface. To reduce the number of trials and the time spent searching for a suitable learning rate, a better approach is a variable, adaptive learning rate, so that the network can use different learning rates at different stages.

6.5 Selection of the expected error

During network design, an appropriate expected-error value should also be determined through comparative training; the suitable value depends on the required number of hidden-layer nodes. In general, two networks with different expected-error values can be trained in parallel, and one of them is finally chosen by weighing all factors.

7 Limitations of the BP network

The BP network has the following problems: (1) long training times, mainly caused by a too-small learning rate, which can be improved with a variable or adaptive learning rate; (2) complete failure to train, mainly manifested as network paralysis, which is usually avoided by choosing smaller initial weights and a smaller learning rate; (3) local minima: the gradient-descent method used here may converge to a local minimum, and better results may be obtained by using more layers or more neurons.

The main goals of improved BP algorithms are to speed up training and to avoid getting stuck in local minima. Common improvements include the momentum-factor method, adaptive learning rates, variable learning rates, and the action-function shrinkage method. The basic idea of the momentum-factor method is to add, on top of back propagation, a term proportional to the previous weight change to each weight change, and then generate the new weight change according to the back-propagation rule. The adaptive learning-rate method targets specific problems; its principle for changing the learning rate is that, over successive iterations, if the sign of the objective function's derivative with respect to a given weight stays the same, that weight's learning rate is increased, and if the sign flips, the learning rate is decreased. The action-function shrinkage method translates the activation function, i.e. adds a constant offset to it.
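A minimal MATLAB sketch of the momentum-factor and sign-based adaptive learning-rate ideas described above is given below; the cost function, the momentum factor and the growth/shrink factors are illustrative assumptions.

% Illustrative momentum and sign-based adaptive learning-rate updates for a
% single weight w with gradient g (all values assumed)
eta = 0.1; mom = 0.9;                     % learning rate and momentum factor
eta_up = 1.05; eta_down = 0.7;            % adaptive-rate growth/shrink factors
w = 0.5; dw_prev = 0; g_prev = 0;

for step = 1:50
    g = 2 * (w - 1);                      % gradient of an assumed cost (w - 1)^2
    dw = -eta * g + mom * dw_prev;        % momentum: add a term proportional to
                                          % the previous weight change
    w = w + dw;
    if g * g_prev > 0                     % same gradient sign in successive steps:
        eta = eta * eta_up;               % increase this weight's learning rate
    elseif g * g_prev < 0                 % sign flipped: decrease the rate
        eta = eta * eta_down;
    end
    dw_prev = dw; g_prev = g;
end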

3. Some source code

%% Clear environment
clc
clear
% Load data
z=data';
n=length(z);
for i=1:6
    sample(i,:)=z(i:i+n-6);
end
% Training and test data
input_train=sample(1:5,1:1400);
output_train=sample(6,1:1400);
input_test=sample(1:5,1401:1483);
output_test=sample(6,1401:1483);
% Number of nodes
inputnum=5;
hiddennum=3;
outputnum=1;
% Normalize the training data (needed by the mapminmax 'apply'/'reverse' calls below)
[inputn,inputps]=mapminmax(input_train);
[outputn,outputps]=mapminmax(output_train);
% Build the network
net=newff(inputn,outputn,hiddennum);
% Dimension of the FA search space (all weights and thresholds)
d=22;
% ... (the FA optimization call that returns the best individual x is omitted in this excerpt)
% Extract the optimized weights and thresholds from x
w1=x(1:inputnum*hiddennum);
B1=x(inputnum*hiddennum+1:inputnum*hiddennum+hiddennum);
w2=x(inputnum*hiddennum+hiddennum+1:inputnum*hiddennum+hiddennum+hiddennum*outputnum);
B2=x(inputnum*hiddennum+hiddennum+hiddennum*outputnum+1:inputnum*hiddennum+hiddennum+hiddennum*outputnum+outputnum);

net.iw{1,1}=reshape(w1,hiddennum,inputnum);
net.lw{2,1}=reshape(w2,outputnum,hiddennum);
net.b{1}=reshape(B1,hiddennum,1);
net.b{2}=B2;
%% BP network training
% Network training parameters
net.trainParam.epochs=100;
net.trainParam.lr=0.1;
%net.trainParam.goal=0.00001;
% Train the network
[net,per2]=train(net,inputn,outputn);
%% BP network prediction
% Data normalization
inputn_test=mapminmax('apply',input_test,inputps);
an=sim(net,inputn_test);
test_simu=mapminmax('reverse',an,outputps);
error=test_simu-output_test;
E=mean(abs(error./output_test))
plot(output_test,'b*')
hold on;
plot(test_simu,'-o')
title('results','fontsize',12)
legend('Actual value','Predicted value')
xlabel('time')
ylabel('compare')
%% Cost or Objective function 
function [nbest,fbest,NumEval]=ffa_mincon(u0,Lb,Ub,para,inputnum,hiddennum,outputnum,net,inputn,outputn)
% para=[20 500 0.5 0.2 1];
% Check input parameters (otherwise set as default values)
if nargin<5, para= [20 50 0.25 0.20 1]; end  
if nargin<4, Ub=[]; end
if nargin<3, Lb=[]; end
if nargin<2,
disp('Usage: FA_mincon(@cost,u0,Lb,Ub,para)');
end
% n=number of fireflies
% MaxGeneration=number of pseudo time steps
% ------------------------------------------------
% alpha=0.25;      % Randomness 0-- 1 (highly random)
% betamn=0.20;     % minimum value of beta
% gamma=1;         % Absorption coefficient
% ------------------------------------------------
n=para(1);  
MaxGeneration=para(2);  %MaxGeneration
alpha=para(3); 
betamin=para(4); 
gamma=para(5);        
NumEval=n*MaxGeneration;
% Check if the upper bound & lower bound are the same size
if length(Lb) ~=length(Ub),
    disp('Simple bounds/limits are improper! ');
    return
end

% Calculate dimension
d=length(u0);
% Initial values of an array       
zn=ones(n,1) *10^100;
% ------------------------------------------------
% generating the initial locations of n fireflies       
[ns,Lightn]=init_ffa(n,d,Lb,Ub,u0);  % 
% Iterations or pseudo time marching
for k=1:MaxGeneration,     %%%%% start iterations     

% This line of reducing alpha is optional
 alpha=alpha_new(alpha,MaxGeneration);

% Evaluate new solutions (for all n fireflies)       
for i=1:n,
    zn(i)=fun(ns(i,:),inputnum,hiddennum,outputnum,net,inputn,outputn);
    Lightn(i)=zn(i);
end
% ... (ranking of the fireflies, movement of the population and the end of
%      ffa_mincon are omitted in this excerpt)

function error=fun(x,inputnum,hiddennum,outputnum,net,inputn,outputn)
% This function computes the fitness value of individual x
% x         input   individual (weights and thresholds encoded as a vector)
% inputnum  input   number of nodes in the input layer
% hiddennum input   number of nodes in the hidden layer
% outputnum input   number of nodes in the output layer
% net       input   network
% inputn    input   training input data
% outputn   input   training output data
% error     output  individual fitness value

% Extract the weights and thresholds from x
w1=x(1:inputnum*hiddennum);                                % input-to-hidden weights
B1=x(inputnum*hiddennum+1:inputnum*hiddennum+hiddennum);   % hidden-layer thresholds
w2=x(inputnum*hiddennum+hiddennum+1:inputnum*hiddennum+hiddennum+hiddennum*outputnum);  % hidden-to-output weights
B2=x(inputnum*hiddennum+hiddennum+hiddennum*outputnum+1:inputnum*hiddennum+hiddennum+hiddennum*outputnum+outputnum);  % output-layer threshold
% Network training parameters
net.trainParam.epochs=20;
net.trainParam.lr=0.1;
net.trainParam.goal=0.00001;
net.trainParam.show=100;
net.trainParam.showWindow=false;
% Network weight assignment
net.iw{1,1}=reshape(w1,hiddennum,inputnum);
net.lw{2,1}=reshape(w2,outputnum,hiddennum);
net.b{1}=reshape(B1,hiddennum,1);
net.b{2}=B2;
% net is a structure: net.iw{1,1} holds the weights from the input layer to the
% first (hidden) layer, and net.lw{2,1} holds the weights from the hidden layer
% to the output layer, where the 2 denotes the output layer. The reshape calls
% turn w1 into a (hiddennum x inputnum) matrix and w2 into an
% (outputnum x hiddennum) matrix; net.b{1} and net.b{2} assign the thresholds
% of the corresponding layers.
% Train the network and compute the fitness as the training error
net=train(net,inputn,outputn);
an=sim(net,inputn);
error=sum(abs(an-outputn));

4. Operation results

MATLAB version and references

1 MATLAB version: 2014a

[1] Yang Baoyang, Yu Jizhou, Yang Shan. Intelligent Optimization Algorithm and Its MATLAB Example (2nd Edition) [M]. Publishing House of Electronics Industry, 2016.
[2] Zhang Yan, Wu Shuigen. MATLAB Optimization Algorithm Source Code [M]. Tsinghua University Press, 2017.
[3] Swarm Intelligence Optimization Algorithm: Firefly Algorithm (FA).