“This is the fifth day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”

Perceptron

Today I’m going to talk about an older linear classification model, but it is also the basis of neural networks. A neural network can be seen as a stack of a large number of perceptrons: add a nonlinear activation function to the perceptron’s linear model and wire these simple classifiers together, and you get a neural network. In a sense, a neural network can also be seen as an application of swarm intelligence.

Linear and nonlinear problems

Linear separability describes a property of a data set. For a two-dimensional data set (two features), if a straight line can perfectly separate the two classes, the data set is linearly separable. For higher-dimensional data, if we can find a hyperplane that plays the role of that line and separates the data in the high-dimensional space, we also say the data is linearly separable.

In the figure above, A represents a linearly separable data set and B represents a linearly inseparable one.

The reason I bring up linear separability is that the perceptron can only solve linearly separable problems.
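To make this concrete, here is a tiny sketch (my own example, not from the original post) contrasting AND, which a single line can separate, with XOR, which no single line can:

```python
# Hypothetical illustration: AND is linearly separable, XOR is not.
# The line x1 + x2 - 1.5 = 0 separates the AND labels perfectly,
# but no single straight line can separate the XOR labels.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, +1]   # linearly separable
xor_labels = [-1, +1, +1, -1]   # not linearly separable

for (x1, x2), y in zip(points, and_labels):
    score = x1 + x2 - 1.5        # a line that works for AND
    print((x1, x2), y, "correct" if (score > 0) == (y > 0) else "wrong")
```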

Data

Suppose we have a data set of samples like this:


$$D = \{(x_1, y_1),(x_2, y_2),\cdots,(x_i, y_i) \}_N$$

where $x_i \in \mathbb{R}^m$, i.e. each sample is an m-dimensional vector, and $y_i \in \{-1,+1\}$ is the binary label, with -1 and +1 each representing one class. The whole reason I chose -1 and +1 is that they make the calculations easy.
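For the examples later on, here is a small sketch of building such a data set with NumPy (the names `make_toy_data`, `N`, `m` are my own, just for illustration), with m-dimensional vectors $x_i$ and labels in $\{-1, +1\}$:

```python
import numpy as np

# A toy linearly separable data set D = {(x_i, y_i)}_N with labels in {-1, +1}.
def make_toy_data(N=100, m=2, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, m))           # each x_i is an m-dimensional vector
    w_true = rng.normal(size=m)           # a hidden separating direction
    y = np.where(X @ w_true > 0, 1, -1)   # labels are -1 or +1
    return X, y

X, y = make_toy_data()
print(X.shape, y[:10])
```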

Perceptron model

First of all, the perceptron is an error-driven model. The model simply tries to find a dividing line or dividing hyperplane of the form $w^T x$, where $w \in \mathbb{R}^m$ is also an m-dimensional vector; in other words, it looks for a line or plane that can separate the two classes of data.
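As a sketch of that decision rule (`predict` is a hypothetical helper of mine, not something defined in the post), the predicted class is just the sign of $w^T x$:

```python
import numpy as np

# Sketch of the perceptron decision rule: the sign of w^T x decides the class.
def predict(w, x):
    return 1 if w @ x > 0 else -1   # points with w^T x > 0 fall on the +1 side
```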

Perceptron objective function


$$L(w) = \sum_{x_i \in D} -y_iw^Tx_i$$
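A rough sketch of computing this objective is below; note that in the usual formulation the sum only runs over the misclassified points (for those, $-y_i w^T x_i$ is positive), so a classifier that gets everything right reaches $L(w) = 0$:

```python
import numpy as np

# Sketch of the objective above, restricted to misclassified points
# as is conventional for the perceptron loss.
def perceptron_loss(w, X, y):
    scores = X @ w                 # w^T x_i for every sample
    wrong = y * scores <= 0        # misclassified (or on the boundary)
    return np.sum(-y[wrong] * scores[wrong])
```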

First, we randomly initialize $w$, and then check each point $(x_i, y_i)$ one by one. If either of the two error cases below occurs, i.e. the point is misclassified, we update $w$. The first case is


$$w^Tx_i > 0 \\ y_i = -1$$

In this case we update, because when $w^Tx_i > 0$ the model places the sample on the positive side of the dividing hyperplane and predicts +1, while the true label $y_i$ is -1. That means $w$ cannot classify this sample correctly.


$$w_{new} = w_{old} - \eta x_i$$

Plugging the updated weight back into the score (taking $\eta = 1$ for simplicity):

$$(w - x_i)^Tx_i = w^Tx_i - ||x_i||^2$$

So after the update the score $w^Tx_i$ decreases by $||x_i||^2$, which pushes the sample back toward the correct side of the dividing hyperplane.
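A quick numeric check of that identity (my own toy numbers, with $\eta = 1$):

```python
import numpy as np

# After the update w_new = w - x_i, the score drops by exactly ||x_i||^2.
w = np.array([0.5, -1.0])
x_i = np.array([2.0, 1.0])

before = w @ x_i                    # w^T x_i = 0.0
after = (w - x_i) @ x_i             # (w - x_i)^T x_i = -5.0
print(before - after, x_i @ x_i)    # both print 5.0, i.e. ||x_i||^2
```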

The second case is symmetric, so I won’t explain it in as much detail here.


$$w^Tx_i < 0 \\ y_i = +1$$

In this case the update is


$$w_{new} = w_{old} + \eta x_i$$

We keep looping until all of the sample points are correctly classified, and then exit the loop, so the perceptron’s idea is pretty simple. The algorithm does not look at all the samples at once; it looks at the samples one by one. In fact, this is also how we usually learn.
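Putting the pieces above together, here is a minimal sketch of the whole loop (`train_perceptron` and its arguments are my own naming; it assumes the toy `(X, y)` data with labels in $\{-1, +1\}$ from earlier). The two update rules combine into the single form $w \leftarrow w + \eta\, y_i x_i$:

```python
import numpy as np

# Minimal sketch of the error-driven training loop described above:
# go through the samples one by one, and whenever a point is misclassified,
# nudge w. With y_i = -1 the step is w - eta*x_i, with y_i = +1 it is w + eta*x_i.
def train_perceptron(X, y, eta=1.0, max_epochs=100):
    w = np.random.default_rng(0).normal(size=X.shape[1])  # random initialization
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:       # the two error cases, combined
                w = w + eta * y_i * x_i    # covers both update rules
                errors += 1
        if errors == 0:                    # every sample classified correctly
            break
    return w

w = train_perceptron(X, y)
```

For linearly separable data this loop eventually stops making mistakes, which is exactly why the linear separability assumption at the start matters.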