Perceptron

  • The input is the feature vector of an instance, and the output is the class of the instance, taking the values +1 and -1.
  • The perceptron corresponds to a separating hyperplane that divides instances into positive and negative classes in the input space; it is a discriminative model.
  • A loss function based on misclassification is introduced;
  • Gradient descent is used to minimize the loss function.
  • The perceptron learning algorithm is simple and easy to implement, and has a primal form and a dual form.
  • Proposed by Rosenblatt in 1957, it is the basis of neural networks and support vector machines.

Perceptron model

Definition:

  • The input space (the feature space) is X ⊆ R^n, and the output space is Y = {+1, -1}.
  • The input x ∈ X is the feature vector of an instance, corresponding to a point in the input space (the feature space), and the output y ∈ Y is the class of the instance. The following function from the input space to the output space:


f(x) = \mathrm{sign}(w \cdot x + b)

is called the perceptron.

  • Model parameters: the weight vector w and the bias b; w · x denotes the inner product of w and x.

The sign function: \mathrm{sign}(x) = \begin{cases} +1, & x \geq 0 \\ -1, & x < 0 \end{cases}
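To make the definition concrete, here is a minimal NumPy sketch of the decision function f(x) = sign(w·x + b); the function name predict and the example values of w and b are just for illustration.

```python
import numpy as np

def predict(w, b, x):
    """Perceptron decision function f(x) = sign(w·x + b).

    Follows the convention above: returns +1 when w·x + b >= 0, otherwise -1.
    """
    return 1 if np.dot(w, x) + b >= 0 else -1

# Illustrative parameters: the hyperplane x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0
print(predict(w, b, np.array([2.0, 2.0])))  # +1, point lies on the positive side
print(predict(w, b, np.array([0.0, 0.0])))  # -1, point lies on the negative side
```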

Geometric interpretation of the perceptron:

w \cdot x + b = 0

This corresponds to a hyperplane S in the feature space, with normal vector w and intercept b, which separates the feature points into positive and negative classes.
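For a concrete illustration (the numbers are chosen here, not taken from the original text): in R^2 with w = (1, 1)^T and b = -1, the hyperplane S is the line

x^{(1)} + x^{(2)} - 1 = 0

whose normal vector is (1, 1)^T and whose intercept is -1; points with x^{(1)} + x^{(2)} > 1 lie on the positive side and are assigned +1, while points with x^{(1)} + x^{(2)} < 1 are assigned -1.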

Perceptron learning strategies

How to define the loss function?

A natural choice is the total number of misclassified points, but such a loss function is not a continuously differentiable function of w and b, so it is not suitable for optimization.

Alternative: the total distance from the misclassified points to the hyperplane.

The distance from a point x_0 to the hyperplane S is

\frac{1}{\|w\|} |w \cdot x_0 + b|

For a misclassified point (x_i, y_i), the prediction and the label have opposite signs, so

-y_i (w \cdot x_i + b) > 0

Therefore the distance from a misclassified point x_i to the hyperplane is

-\frac{1}{\|w\|} y_i (w \cdot x_i + b)
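As a quick check with the same illustrative numbers used above (w = (1, 1)^T, b = -1): take a training point x_i = (0, 0)^T with label y_i = +1. Then

w \cdot x_i + b = -1

so \mathrm{sign}(w \cdot x_i + b) = -1 ≠ y_i, the point is misclassified, -y_i (w \cdot x_i + b) = 1 > 0, and its distance to the hyperplane is \frac{1}{\|w\|} \cdot 1 = \frac{1}{\sqrt{2}}.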

Summing over the set M of misclassified points gives the total distance:


-\frac{1}{\|w\|} \sum_{x_i \in M} y_i (w \cdot x_i + b)

Without the constant factor 1/||w||, the loss function of perceptron learning is obtained:


Loss(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)
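A minimal NumPy sketch of this loss, assuming the training data is stored as an array X of feature vectors and an array y of +1/-1 labels; the function name perceptron_loss and the misclassification test y_i(w·x_i + b) <= 0 are this sketch's own choices:

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """Loss(w, b) = -sum of y_i * (w·x_i + b) over the misclassified points M.

    A point counts as misclassified when y_i * (w·x_i + b) <= 0.
    """
    margins = y * (X @ w + b)       # y_i * (w·x_i + b) for every training point
    in_M = margins <= 0             # boolean mask of the misclassified set M
    return -np.sum(margins[in_M])
```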

Perceptron learning algorithm

\min_{w, b} \; Loss(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)

This problem is solved by stochastic gradient descent.

First, arbitrarily choose an initial hyperplane (w_0, b_0); then use gradient descent to continuously minimize the objective function, i.e. the loss function Loss(w, b).

At each step, a misclassified point is randomly selected and used for the update:
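Putting the pieces together, below is a minimal NumPy sketch of the primal-form algorithm. It assumes a learning rate eta and uses the standard stochastic-gradient update w ← w + η·y_i·x_i, b ← b + η·y_i obtained from the gradients of Loss(w, b); the function name fit_perceptron and the toy data are just for illustration.

```python
import numpy as np

def fit_perceptron(X, y, eta=1.0, max_iter=1000, seed=0):
    """Primal-form perceptron trained by stochastic gradient descent.

    X: (n_samples, n_features) array; y: array of +1/-1 labels.
    A point (x_i, y_i) is misclassified when y_i * (w·x_i + b) <= 0; for one
    such point the update is w += eta * y_i * x_i and b += eta * y_i.
    """
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0                 # start from an arbitrary hyperplane
    for _ in range(max_iter):
        misclassified = np.flatnonzero(y * (X @ w + b) <= 0)
        if misclassified.size == 0:                  # all points correctly classified
            break
        i = rng.choice(misclassified)                # randomly pick one misclassified point
        w += eta * y[i] * X[i]
        b += eta * y[i]
    return w, b

# Tiny linearly separable toy data (illustrative only)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = fit_perceptron(X, y)
print(w, b)  # a hyperplane w·x + b = 0 that separates the two classes
```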