Perceptron

  • The input is the feature vector of an instance, and the output is the class of the instance, taking the values +1 and -1.
  • The perceptron corresponds to a separating hyperplane that divides instances into positive and negative classes in the input space; it is a discriminative model.
  • A loss function based on misclassification is introduced;
  • Gradient descent is used to minimize the loss function.
  • The perceptron learning algorithm is simple and easy to implement, and has a primal form and a dual form.
  • Proposed by Rosenblatt in 1957, it is the basis of neural networks and support vector machines.

Perceptron model

Definition:

  • The input space (the feature space) is X ⊆ R^n, and the output space is Y = {+1, -1}.
  • The input x ∈ X is the feature vector of an instance, corresponding to a point in the input space (the feature space), and the output y ∈ Y is the class of the instance. The following function from the input space to the output space:


f(x) = \mathrm{sign}(w \cdot x + b)

is called the perceptron.

  • Model parameters: the weight vector w and the bias b; w · x denotes the inner product of w and x.

The sign function: \mathrm{sign}(x) = \begin{cases} +1, & x \geq 0 \\ -1, & x < 0 \end{cases}
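To make the definition concrete, here is a minimal NumPy sketch of the decision function f(x) = sign(w·x + b); the function name predict and the example values of w and b are just for illustration.

```python
import numpy as np

def predict(w, b, x):
    """Perceptron decision function f(x) = sign(w·x + b).

    Follows the convention above: returns +1 when w·x + b >= 0, otherwise -1.
    """
    return 1 if np.dot(w, x) + b >= 0 else -1

# Illustrative parameters: the hyperplane x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0
print(predict(w, b, np.array([2.0, 2.0])))  # +1, point lies on the positive side
print(predict(w, b, np.array([0.0, 0.0])))  # -1, point lies on the negative side
```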

Geometric interpretation of the perceptron:

w \cdot x + b = 0

This corresponds to a hyperplane S in the feature space, with normal vector w and intercept b, which separates the feature points into positive and negative classes.
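For a concrete illustration (the numbers are chosen here, not taken from the original text): in R^2 with w = (1, 1)^T and b = -1, the hyperplane S is the line

x^{(1)} + x^{(2)} - 1 = 0

whose normal vector is (1, 1)^T and whose intercept is -1; points with x^{(1)} + x^{(2)} > 1 lie on the positive side and are assigned +1, while points with x^{(1)} + x^{(2)} < 1 are assigned -1.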

Perceptron learning strategies

How to define the loss function?

A natural choice is the total number of misclassified points, but such a loss function is not a continuously differentiable function of w and b, so it is not suitable for optimization.

Alternative: the total distance from the misclassified points to the hyperplane.

The distance from a point x_0 to the hyperplane S is

\frac{1}{\|w\|} |w \cdot x_0 + b|

For a misclassified point (x_i, y_i), the prediction and the label have opposite signs, so

-y_i (w \cdot x_i + b) > 0

Therefore the distance from a misclassified point x_i to the hyperplane is

-\frac{1}{\|w\|} y_i (w \cdot x_i + b)
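As a quick check with the same illustrative numbers used above (w = (1, 1)^T, b = -1): take a training point x_i = (0, 0)^T with label y_i = +1. Then

w \cdot x_i + b = -1

so \mathrm{sign}(w \cdot x_i + b) = -1 ≠ y_i, the point is misclassified, -y_i (w \cdot x_i + b) = 1 > 0, and its distance to the hyperplane is \frac{1}{\|w\|} \cdot 1 = \frac{1}{\sqrt{2}}.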

Summing over the set M of misclassified points gives the total distance:


-\frac{1}{\|w\|} \sum_{x_i \in M} y_i (w \cdot x_i + b)

Without the constant factor 1/||w||, the loss function of perceptron learning is obtained:


Loss(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)
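A minimal NumPy sketch of this loss, assuming the training data is stored as an array X of feature vectors and an array y of +1/-1 labels; the function name perceptron_loss and the misclassification test y_i(w·x_i + b) <= 0 are this sketch's own choices:

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """Loss(w, b) = -sum of y_i * (w·x_i + b) over the misclassified points M.

    A point counts as misclassified when y_i * (w·x_i + b) <= 0.
    """
    margins = y * (X @ w + b)       # y_i * (w·x_i + b) for every training point
    in_M = margins <= 0             # boolean mask of the misclassified set M
    return -np.sum(margins[in_M])
```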

Perceptron learning algorithm

\min_{w, b} \; Loss(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)

This problem is solved by stochastic gradient descent.

First, arbitrarily choose an initial hyperplane (w_0, b_0); then use gradient descent to continuously minimize the objective function, i.e. the loss function Loss(w, b).

At each step, a misclassified point is randomly selected and used for the update:
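Putting the pieces together, below is a minimal NumPy sketch of the primal-form algorithm. It assumes a learning rate eta and uses the standard stochastic-gradient update w ← w + η·y_i·x_i, b ← b + η·y_i obtained from the gradients of Loss(w, b); the function name fit_perceptron and the toy data are just for illustration.

```python
import numpy as np

def fit_perceptron(X, y, eta=1.0, max_iter=1000, seed=0):
    """Primal-form perceptron trained by stochastic gradient descent.

    X: (n_samples, n_features) array; y: array of +1/-1 labels.
    A point (x_i, y_i) is misclassified when y_i * (w·x_i + b) <= 0; for one
    such point the update is w += eta * y_i * x_i and b += eta * y_i.
    """
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0                 # start from an arbitrary hyperplane
    for _ in range(max_iter):
        misclassified = np.flatnonzero(y * (X @ w + b) <= 0)
        if misclassified.size == 0:                  # all points correctly classified
            break
        i = rng.choice(misclassified)                # randomly pick one misclassified point
        w += eta * y[i] * X[i]
        b += eta * y[i]
    return w, b

# Tiny linearly separable toy data (illustrative only)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = fit_perceptron(X, y)
print(w, b)  # a hyperplane w·x + b = 0 that separates the two classes
```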