An intuitive introduction to RBF

For the detailed principles of RBF, many articles online explain them better than I can, so I will not repeat them here. This post only gives some intuitive understanding of RBF networks.

1 RBF is a two-layer network

Yes, RBF is structurally not complicated, there are only two layers: hidden layer and output layer. Its model can be expressed mathematically as:

$$y_j = \sum_{i=1}^{n} w_{ij}\, \phi\!\left(\Vert x - u_i \Vert^2\right), \qquad j = 1, \dots, p$$
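To make the structure concrete, here is a minimal MATLAB sketch of this forward pass, using the Gaussian basis function introduced below (all variable names are illustrative, not from the original post):

```matlab
% x     : 1-by-m input vector
% U     : n-by-m matrix whose i-th row is the center vector u_i
% W     : n-by-p matrix of output-layer weights w_ij
% sigma : spread of the Gaussian basis function
phi = exp(-sum((U - x).^2, 2) / sigma^2);  % hidden layer: n-by-1 nonlinear activations (implicit expansion, R2016b+)
y   = W' * phi;                            % output layer: p-by-1 linear combination y_j
```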



2 The hidden layer of RBF is a nonlinear mapping

The commonly used activation function for the RBF hidden layer is the Gaussian function:

$$\phi(\Vert x - u \Vert) = e^{-\frac{\Vert x - u \Vert^2}{\sigma^2}}$$
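A quick way to see how $\sigma$ controls the width of the basis function is to plot it for a few illustrative values:

```matlab
r = linspace(-3, 3, 200);              % signed distance from the center u, for plotting
figure; hold on
for sigma = [0.5 1 2]
    plot(r, exp(-r.^2 / sigma^2));     % phi as a function of distance for this sigma
end
hold off
legend('\sigma = 0.5', '\sigma = 1', '\sigma = 2')
```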

3 The RBF output layer is linear

4 The basic idea of RBF is to transform the data into a high-dimensional space where it becomes linearly separable

The RBF hidden layer maps the data into another space (usually a higher-dimensional one), on the assumption that there exists a space in which the data becomes linearly separable; that is why the output layer can be linear. It is the same idea as the kernel method. Here is an example from the teacher's slides:

In the example above, the original data is mapped into another two-dimensional space using Gaussian functions, and in that space the XOR problem becomes linearly separable. As you can see, the transformed space is not necessarily of higher dimension than the original.
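Since the slide itself is not reproduced here, below is a small MATLAB sketch of the classic version of that XOR example, assuming the two Gaussian centers are $[1,1]$ and $[0,0]$ with $\sigma = 1$:

```matlab
X = [0 0; 0 1; 1 0; 1 1];                  % XOR inputs
labels = [0; 1; 1; 0];                     % XOR outputs
u1 = [1 1];  u2 = [0 0];                   % assumed centers
phi1 = exp(-sum((X - u1).^2, 2));          % first Gaussian feature (implicit expansion, R2016b+)
phi2 = exp(-sum((X - u2).^2, 2));          % second Gaussian feature
disp([phi1 phi2 labels])                   % mapped coordinates and class labels
```

The two class-0 points map to roughly $(0.14, 1)$ and $(1, 0.14)$, while both class-1 points map to about $(0.37, 0.37)$, so a single straight line (for example $\phi_1 + \phi_2 = 1$) separates them even though the mapped space is still two-dimensional.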

RBF learning algorithm

For the RBF network in the figure above, the unknown quantities are: the center vectors $u_i$, the constant $\sigma$ in the Gaussian function, and the output-layer weights $W$.

The whole process of the learning algorithm is roughly as shown below.

It can be described as follows:

  1. Use the K-means algorithm to find the center vectors $u_i$.
  2. Use the KNN (K nearest neighbors) rule to compute each $\sigma_i$: $\sigma_i = \sqrt{\frac{1}{K}\sum_{k=1}^{K} \Vert u_k - u_i \Vert^2}$, where $u_1, \dots, u_K$ are the $K$ centers nearest to $u_i$.
  3. The output weights $W$ can then be found by least squares (see the sketch below).
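Here is a minimal MATLAB sketch of these three steps, assuming the data sits in an N-by-d matrix `X` with targets `Y`, and that `kmeans` and `pdist2` from the Statistics and Machine Learning Toolbox are available (variable names are illustrative, not from the original post):

```matlab
n = 10;  K = 3;                              % number of centers and of neighbors (example values)
[~, U] = kmeans(X, n);                       % step 1: center vectors u_i as rows of U (n-by-d)
D = sort(pdist2(U, U), 2);                   % distances between centers, sorted per row (column 1 is 0)
sigma = sqrt(mean(D(:, 2:K+1).^2, 2));       % step 2: sigma_i from the K nearest centers
Phi = exp(-pdist2(X, U).^2 ./ (sigma'.^2));  % hidden-layer outputs, N-by-n (implicit expansion, R2016b+)
W = Phi \ Y;                                 % step 3: least-squares solution for the output weights
Y_hat = Phi * W;                             % predictions on the training data
```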

Lazy RBF

As you can see, the classic RBF training is rather cumbersome: it needs both K-means and KNN. Later, lazy RBF was proposed: every sample in the training set is taken as a center vector, so no K-means is needed. The kernel matrix $\Phi$ then becomes a square matrix, and as long as all training samples are distinct, $\Phi$ is invertible. The drawbacks are that a large training set leads to a large kernel matrix $\Phi$, and that the number of training samples must be larger than the dimension of each sample.
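A minimal sketch of the lazy variant under the same assumptions as above (`X` is N-by-d, `Y` is N-by-1, `sigma` picked by hand; `pdist2` from the Statistics and Machine Learning Toolbox):

```matlab
sigma = 1;                                     % spread, chosen by hand in this sketch
Phi = exp(-pdist2(X, X).^2 / sigma^2);         % square N-by-N kernel matrix: every sample is a center
W = Phi \ Y;                                   % solves Phi*W = Y; Phi is invertible if all samples differ
% prediction for new data Xnew (M-by-d):
Phi_new = exp(-pdist2(Xnew, X).^2 / sigma^2);  % M-by-N
Y_pred = Phi_new * W;
```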

MATLAB RBF neural network

The RBF implemented below has only one output, for reference. Handling multiple outputs is actually quite simple: the weight vector $W$ just becomes a matrix with one column per output, so I will not cover it here.
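For example, reusing the least-squares step from the sketch above, nothing else changes when there are $p$ outputs (shapes are illustrative):

```matlab
% Phi : N-by-n hidden-layer outputs
% Y   : N-by-p targets, one column per output
W = Phi \ Y;        % W becomes n-by-p; least squares solves each output column independently
Y_hat = Phi * W;    % N-by-p predictions
```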

Demo.m performs RBF training and prediction on XOR data and shows the whole process. The last few lines of the code call the encapsulated training and prediction functions.

```matlab
% RBF prediction of a chaotic time series (one-step prediction)
clc
clear all
close all
% ------------------------------------------------------------------
% Generate the chaotic sequence
%   dx/dt = sigma*(y - x)
%   dy/dt = r*x - y - x*z
%   dz/dt = -b*z + x*y
sigma = 16;                       % Lorenz equation parameter
b = 4;                            % parameter b
r = 45.92;                        % parameter c
y = [-1, 0, 1];                   % starting point (1 x 3 row vector)
h = 0.01;                         % integration time step
k1 = 30000;                       % iterations discarded as transient
k2 = 5000;                        % iterations kept as the time series
Z = LorenzData(y, h, k1+k2, sigma, r, b);
X = Z(k1+1:end, 1);               % time series
X = normalize_a(X, 1);            % normalize the signal to zero mean and unit amplitude

% ------------------------------------------------------------------
% Related parameters
t = 1;                            % delay
d = 3;                            % embedding dimension
n_tr = 1000;                      % training samples
n_te = 1000;                      % test samples

% ------------------------------------------------------------------
% Phase space reconstruction
X_TR = X(1:n_tr);
X_TE = X(n_tr+1:n_tr+n_te);
figure, plot(1:1:n_tr, X_TR, 'r');
hold on
plot(n_tr+1:1:n_tr+n_te, X_TE, 'b');
hold off
[XN_TR, DN_TR] = PhaSpaRecon(X_TR, t, d);
[XN_TE, DN_TE] = PhaSpaRecon(X_TE, t, d);

% ------------------------------------------------------------------
% Training and testing
P = XN_TR;
T = DN_TR;
spread = 1;
net = newrbe(P, T, spread);

ERR1 = sim(net, XN_TR) - DN_TR;
err_mse1 = mean(ERR1.^2);
perr1 = err_mse1/var(X)           % normalized training error

DN_PR = sim(net, XN_TE);
ERR2 = DN_PR - DN_TE;
err_mse2 = mean(ERR2.^2);
perr2 = err_mse2/var(X)           % normalized test error

% ------------------------------------------------------------------
% Results
figure;
subplot(211);
plot(1:length(ERR2), DN_TE, 'r+-', 1:length(ERR2), DN_PR, 'b-');
title('True value (+) and predicted value (.)')
subplot(212);
plot(ERR2, 'k');
title('Absolute error of prediction')
```
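The helper functions LorenzData, normalize_a, and PhaSpaRecon are not included in the post. As an illustration only, here is a guess at what a delay-embedding helper with the PhaSpaRecon interface could look like; this is not the author's code, just one plausible implementation that matches how it is called above (columns as samples, as newrbe expects):

```matlab
function [XN, DN] = PhaSpaRecon(X, t, d)
% Delay embedding of a scalar time series X with delay t and dimension d.
% XN : d-by-M matrix, each column is a reconstructed state vector
% DN : 1-by-M vector, the value one step after each state vector
X = X(:)';                                 % work with a row vector
M = length(X) - (d-1)*t - 1;               % number of usable state vectors
XN = zeros(d, M);
for i = 1:d
    XN(i, :) = X((i-1)*t + (1:M));         % i-th delayed coordinate
end
DN = X((d-1)*t + 2 : (d-1)*t + 1 + M);     % one-step-ahead targets
end
```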

For the complete code, or to get in touch, add QQ 1575304183.