Linearly separable support vector machines and nonlinear support vector machines
The linear case can be divided into hard margin maximization and soft margin maximization (regularization)





  1. Hard margin maximization


      1. Obtain the dual problem via Lagrange multipliers




      2. KKT conditions
        1. Apply to the inequality constraints
        2. There are three parts: the partial derivatives of the Lagrangian with respect to each variable are zero; each Lagrange multiplier times its inequality constraint equals zero (complementary slackness); and each Lagrange multiplier is greater than or equal to zero
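As a sketch of how the pieces above fit together: the code below solves the hard-margin dual on a toy two-point dataset with plain projected gradient ascent (not SMO, which real solvers use), then recovers the primal w and b from the KKT conditions. The dataset, learning rate, and iteration count are all illustrative choices.

```python
import numpy as np

# Toy linearly separable data: two points, both of which are support vectors.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

# Q_ij = y_i y_j <x_i, x_j>; the dual objective is sum(a) - 0.5 * a^T Q a
Q = (y[:, None] * y[None, :]) * (X @ X.T)

a = np.zeros(2)
lr = 0.1
for _ in range(500):
    a += lr * (1.0 - Q @ a)        # gradient ascent step on the dual objective
    a -= (y @ a) / (y @ y) * y     # project back onto the constraint sum_i a_i y_i = 0
    a = np.maximum(a, 0.0)         # enforce a_i >= 0

w = (a * y) @ X                    # primal weights: w = sum_i a_i y_i x_i
b = y[0] - w @ X[0]                # KKT: a support vector lies exactly on the margin
```

On this toy problem the solver converges to a = (0.25, 0.25), w = (0.5, 0.5), b = 0, and both KKT conditions can be checked by hand.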
  2. Soft margin maximization


    1. Hinge loss function
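A minimal sketch of the hinge loss that drives soft margin maximization: a point correctly classified outside the margin contributes zero, while a point inside the margin or misclassified contributes linearly.

```python
import numpy as np

def hinge_loss(y, scores):
    """Average hinge loss max(0, 1 - y * f(x)); labels y are in {-1, +1},
    scores are the raw decision values f(x) = w.x + b."""
    return np.maximum(0.0, 1.0 - y * scores).mean()
```

For example, labels [1, -1, 1] with scores [2.0, -0.5, 0.3] give per-point losses 0, 0.5, and 0.7 (only the confidently correct first point is cost-free).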


  3. Kernel function


    1. Linearly inseparable data in the input space is mapped to a high-dimensional feature space, where it becomes separable
    2. Necessary and sufficient condition for a kernel function K to be a positive definite kernel
      1. K is symmetric and its Gram matrix is positive semidefinite
      2. If the above conditions are satisfied, a Hilbert space closed under the inner product can be formed, which guarantees that a suitable mapping φ(x) exists
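The two conditions above can be checked empirically for a candidate kernel (this verifies one Gram matrix numerically; it is not a proof of positive definiteness). A sketch using a polynomial kernel on random data, with an assumed degree of 3:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(20, 3))

# Gram matrix of the polynomial kernel K(x, z) = (x.z + 1)^3
K = (X @ X.T + 1.0) ** 3

assert np.allclose(K, K.T)                    # condition 1: symmetric
assert np.linalg.eigvalsh(K).min() >= -1e-8   # condition 2: positive semidefinite
                                              # (tiny tolerance for floating-point error)
```

A kernel that fails this check on some dataset cannot be a valid inner product in any feature space.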
    3. Types
      1. Linear kernel: K(x, xi) = x ⋅ xi
        1. That is, no kernel: the linearly separable case. Try this one first
      2. Polynomial kernel function
        1. The polynomial kernel can map the low-dimensional input space to a high-dimensional feature space, but it has many parameters. When the polynomial degree is high, the entries of the kernel matrix tend toward infinity or zero, and the computation becomes too expensive to carry out
      3. Gaussian (RBF) kernel function
        1. The Gaussian radial basis function is a kernel with strong locality that maps a sample into a higher-dimensional space. It is the one usually used
        2. K(x, xi) = exp(−‖x − xi‖² / (2σ²))
      4. Sigmoid kernel function
        1. K(x, xi) = tanh(η⟨x, xi⟩ + θ)
        2. With a sigmoid kernel, the support vector machine behaves like a kind of multilayer neural network
    4. Usage scenarios
      1. Choose based on the number of features relative to the number of samples
        1. If the number of features is large (comparable to the number of samples), choose LR or an SVM with a linear kernel
        2. If the number of features is small and the sample size is moderate, use an SVM with a Gaussian kernel
        3. If the number of features is small and the sample size is large, manually add features to reduce to the first case
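The three rules of thumb above can be written out as a small helper. The numeric thresholds (features "comparable" to samples, samples "sufficient" at 10× features) are assumptions made for illustration; in practice these boundaries are fuzzy and cross-validation decides.

```python
def suggest_model(n_features, n_samples):
    """Rule-of-thumb model choice from the guidelines above (illustrative only).

    Thresholds are assumed values, not part of any standard prescription."""
    if n_features >= n_samples:
        # Many features relative to samples: a linear decision boundary suffices
        return "LR or linear-kernel SVM"
    if n_samples > 10 * n_features:
        # Plenty of samples, few features: build features, then use the linear case
        return "engineer more features, then LR or linear-kernel SVM"
    # Few features, moderate samples: let the RBF kernel do the feature mapping
    return "SVM with Gaussian (RBF) kernel"
```

For example, a text-classification problem with 10,000 sparse features and 8,000 documents would land in the linear case, while a 50-feature tabular problem with 300 rows would get the Gaussian kernel.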