I. LSTM network

Long Short-Term Memory, or LSTM, is specially designed to solve the long-term dependency problem. All RNNs have the form of a chain of repeating neural-network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.



Figure 3. RNN cell

The LSTM has the same chain structure, but the repeating module is different. Instead of a single neural-network layer, there are four layers that interact in a very specific way.



Figure 4. LSTM cell

Core ideas of LSTM

The key to the LSTM is the cell state (the green box in the diagram represents one cell) and the horizontal line running across the top of the cell.

The cell state is like a conveyor belt. It runs straight down the entire chain, with only a few minor linear interactions, so it is easy for information to flow along it unchanged.



Figure 5. LSTM cell internal structure diagram

With only that top horizontal line, there would be no way to add or remove information. The LSTM does this through structures called gates.

Gates selectively let information through. Each gate consists of a sigmoid neural-network layer followed by a pointwise multiplication operation.



Figure 6. Information node

Each element of the sigmoid layer's output (a vector) is a real number between 0 and 1, representing the weight (or proportion) of the corresponding information that is let through. For example, 0 means "let nothing through" and 1 means "let everything through".
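As a worked example (illustrative numbers only): if the sigmoid layer outputs the vector [1, 0.5, 0] and the incoming information is [2, 4, 6], the pointwise product [2, 2, 0] passes the first component unchanged, halves the second, and blocks the third.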

The LSTM protects and controls information through three such gates: the forget gate, the input gate, and the output gate.

In-depth understanding of LSTM

Forget gate

The first step in our LSTM is to decide what information to throw away from the cell state. This decision is made by a sigmoid layer called the forget gate. The gate reads h_{t−1} and x_t and outputs a number between 0 and 1 for each entry in the cell state C_{t−1}: 1 means "keep this completely", 0 means "discard this completely".



h_{t−1} is the output of the previous cell, x_t is the input to the current cell, and σ denotes the sigmoid function.
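The formula image is not reproduced here; in the standard LSTM formulation, the forget gate computes

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

where W_f and b_f are the gate's weight matrix and bias.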

Input gate

The next step is to decide how much new information to add to the cell state. This involves two parts: first, a sigmoid layer called the input gate layer decides which values will be updated; then a tanh layer creates a vector of new candidate values, C̃_t. In the next step, these two parts are combined to update the cell state.
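In the standard formulation (the corresponding formula image is not reproduced here), these two parts are

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)

with W_i, b_i, W_C and b_C the respective weights and biases.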



It is now time to update the old cell state: C_{t−1} is updated to C_t. The previous steps have already decided what to do; now we actually do it.

We multiply the old state by f_t, forgetting the information we decided to forget, and then add i_t ∗ C̃_t, the new candidate values scaled by how much we decided to update each state component.
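Written out in the standard form, the cell-state update is

C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t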

Output gate

Finally, we decide what to output. This output is based on the cell state, but is a filtered version of it. First, a sigmoid layer decides which parts of the cell state will be output. Then the cell state is passed through tanh (squashing its values to between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the parts we decided on are output.
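In the standard formulation, the output gate and the cell output are

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ∗ tanh(C_t)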

II. Source code

%%
clc
clear all
close all
%% Load data and reshape it into a row vector
num = 100;
x = 1:num;
db = 0.1;
data = abs(0.5*sin(x) + 0.5*cos(x) + db*rand(1,num));
data1 = data;          % assign your own load data to the data variable; data is a row vector
%% Use the first 90% of the sequence for training and the last 10% for testing
numTimeStepsTrain = floor(0.9*numel(data));
dataTrain = data(1:numTimeStepsTrain+1);
dataTest = data1(numTimeStepsTrain+1:end);
% Data preprocessing: the training data are meant to be normalized to zero mean and unit variance
mu = mean(dataTrain);
sig = std(dataTrain);
dataTrainStandardized = dataTrain;
% LSTM inputs and targets are the series offset by one time step
XTrain = dataTrainStandardized(1:end-1);
YTrain = dataTrainStandardized(2:end);
%% Create the LSTM regression network and specify the number of hidden units in the LSTM layer
% One-step sequence prediction, so both input and output are one-dimensional
numFeatures = 1;
numResponses = 1;
numHiddenUnits = 20*3;
 
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits)
    fullyConnectedLayer(numResponses)
    regressionLayer];
%% WOA
lb = 0.001;            % learning-rate lower bound
ub = 0.1;              % learning-rate upper bound
% Main loop
while t < Max_iter
    % ... (WOA search body omitted in this excerpt)
    t
end
% Compare predicted values against the test data
figure(1)
subplot(2,1,1)
plot(YTest,'gs-','LineWidth',2)
hold on
plot(YPred_best,'ro-','LineWidth',2)
hold off
legend('Observed value','Predicted value')
xlabel('time')
ylabel('Data value')
title('Forecast with Updates')
 
subplot(2,1,2)
stem(YPred_best - YTest)
xlabel('time')
ylabel('mean square deviation')
title('mean square deviation chart' )
 
 
figure(2)
plot(dataTrain(1:end-1))
hold on
idx = numTimeStepsTrain:(numTimeStepsTrain+numTimeStepsTest);
plot(idx,[data(numTimeStepsTrain) YPred_best],'-')
hold off
xlabel('time')
ylabel('Data value')
title('Forecast chart')
legend('Observed value','Predicted value')
 
figure(3)
plot(1:Max_iter,Convergence_curve,'bo-');
hold on;
title('Error-cost curve after whale optimization');
xlabel('Number of iterations')
ylabel('Error fitness value')
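The WOA main loop body is omitted from the listing above. For reference only, here is a minimal sketch of how a single candidate learning rate could be trained and scored inside that loop using standard Deep Learning Toolbox calls (trainingOptions, trainNetwork, predict); the variable names and option values are illustrative assumptions, not the author's original code.

% Illustrative fitness evaluation for one WOA candidate (assumed, not the original loop body)
lr = lb + (ub - lb)*rand;                              % candidate learning rate in [lb, ub]
options = trainingOptions('adam', ...
    'MaxEpochs', 250, ...
    'InitialLearnRate', lr, ...
    'Verbose', 0);
net = trainNetwork(XTrain, YTrain, layers, options);   % train the LSTM regression network
XTest = dataTest(1:end-1);                             % test inputs, offset by one step
YTest = dataTest(2:end);                               % test targets
YPred = predict(net, XTest);                           % one-step-ahead predictions
fitness = sqrt(mean((YPred - YTest).^2));              % RMSE used as the WOA fitness value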

III. Operation results



IV. Remarks

Version: R2014a