1. Linear Regression

In linear regression, the dependent variable is continuous, while the independent variables can be continuous or discrete, and the relationship between them is linear. Linear regression establishes the relationship between the dependent variable (Y) and one or more independent variables (X) by means of a best-fitting straight line (also known as the regression line).

The difference between unary linear regression and multivariate linear regression is that multivariate linear regression has more than one independent variable, while unary linear regression has only one independent variable.

(1). Unary linear regression

For example, to predict house price based on house area alone, the regression equation (hypothesis) is as follows:

hθ(x) = θ0 + θ1 · x

where x is the house area and hθ(x) is the predicted price.

The key to determining θ0 and θ1 lies in how to measure the difference between hθ(xi) and yi. We use the mean squared error to represent this difference:

J(θ0, θ1) = 1/(2m) · Σ (hθ(xi) − yi)²

where m is the number of training samples (the factor 1/2 simply makes the derivative cleaner). The final optimization goal is to find the values of θ0 and θ1 that minimize the mean squared error; used in this role, the mean squared error is also called the cost function.

Gradient descent can be used to find the θ values that minimize the cost function: each parameter is repeatedly updated as θj := θj − α · ∂J/∂θj (simultaneously for all j), where α is the learning rate.
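As a concrete illustration, here is a minimal gradient-descent loop for unary linear regression; the house-area data, learning rate, and iteration count are made up for demonstration:

```python
import numpy as np

# Toy data: house area vs. price (made-up values for demonstration)
x = np.array([50., 80., 100., 120., 150.])
y = np.array([150., 220., 270., 310., 400.])

theta0, theta1 = 0.0, 0.0   # initial parameters
alpha = 1e-4                # learning rate
m = len(x)

for _ in range(10000):
    h = theta0 + theta1 * x                  # predictions h_theta(x)
    # partial derivatives of the cost J(theta0, theta1)
    grad0 = np.sum(h - y) / m
    grad1 = np.sum((h - y) * x) / m
    # simultaneous update of both parameters
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)  # slope approaches the price-per-area trend of the data
```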

(2). Multiple linear regression

When there is more than one independent variable, the hypothesis extends to hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn. Polynomial regression, which adds powers of a feature (x², x³, …) as extra features, also needs feature scaling, since the different powers live on very different scales.
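A minimal sketch of feature scaling (standardization to zero mean and unit variance) for polynomial features, assuming a numpy feature matrix without the bias column:

```python
import numpy as np

def feature_scaling(X):
    '''Standardize each column of X to zero mean and unit variance.'''
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Polynomial features x, x^2, x^3 built from a made-up house-area column
x = np.array([50., 80., 100., 120., 150.])
X_poly = np.column_stack([x, x**2, x**3])

X_scaled, mu, sigma = feature_scaling(X_poly)
print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

The same mu and sigma must be reused to scale any later test samples.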

2. Logistic Regression

Derivation process: logistic regression passes a linear combination of the features through the sigmoid function,

hθ(x) = 1 / (1 + e^(−θᵀx))

and interprets the output as P(y = 1 | x). The parameters are found by minimizing the cross-entropy cost

J(θ) = −(1/m) Σ [yi · log hθ(xi) + (1 − yi) · log(1 − hθ(xi))]

whose gradient is (1/m) · Xᵀ(h − y); this cost and gradient are exactly what the code below computes.

Code implementation:

```python
import numpy as np

def sigmoid(z):
    '''
    Input:
        z: the input (can be a scalar or an array)
    Output:
        h: the sigmoid of z
    '''
    # calculate the sigmoid of z
    h = 1 / (1 + np.exp(-z))
    return h

def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features, dimensions (m, n+1)
        y: corresponding labels of the input matrix x, dimensions (m, 1)
        theta: weight vector, dimensions (n+1, 1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    # get 'm', the number of rows in matrix x
    m = x.shape[0]
    for i in range(num_iters):
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        # get h, the sigmoid of z
        h = sigmoid(z)
        # calculate the cost function
        J = -1. / m * (np.dot(y.transpose(), np.log(h))
                       + np.dot((1 - y).transpose(), np.log(1 - h)))
        # update the weights theta
        theta = theta - (alpha / m) * np.dot(x.transpose(), (h - y))
    J = float(J)
    return J, theta
```
```python
# X input is 10 x 3, with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

# Apply gradient descent
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(t, 8) for t in np.squeeze(tmp_theta)]}")
```

3. Multiclass classification
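Logistic regression itself is a binary classifier; a standard way to handle K classes is the one-vs-rest scheme: train K binary classifiers, each separating one class from all the others, and predict the class whose classifier outputs the highest probability. Below is a minimal sketch reusing the sigmoid and gradientDescent functions above (the function names and label encoding here are illustrative):

```python
import numpy as np

def one_vs_rest_train(X, y, num_classes, alpha, num_iters):
    '''Train one binary logistic regression classifier per class.
    X: (m, n+1) features with a bias column; y: (m, 1) integer labels.'''
    thetas = []
    for c in range(num_classes):
        y_c = (y == c).astype(float)   # 1 for class c, 0 for all other classes
        _, theta = gradientDescent(X, y_c, np.zeros((X.shape[1], 1)),
                                   alpha, num_iters)
        thetas.append(theta)
    return np.hstack(thetas)           # (n+1, num_classes)

def one_vs_rest_predict(X, thetas):
    '''Pick the class whose classifier gives the highest probability.'''
    probs = sigmoid(np.dot(X, thetas))  # (m, num_classes)
    return np.argmax(probs, axis=1)
```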

4. Regularization
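For logistic regression, L2 regularization adds a penalty term λ/(2m) · Σ θj² (excluding the bias θ0) to the cost, which shrinks the weights and reduces overfitting. Below is a minimal sketch of how the gradientDescent function above would change; the function name and λ handling follow one common convention, not the only one:

```python
import numpy as np

def gradientDescentReg(x, y, theta, alpha, num_iters, lambda_):
    '''Gradient descent for logistic regression with L2 regularization.
    Identical to gradientDescent above except for the penalty term.'''
    m = x.shape[0]
    for i in range(num_iters):
        h = sigmoid(np.dot(x, theta))
        # regularized cost: cross-entropy + (lambda_ / 2m) * sum(theta[1:]^2)
        J = (-1. / m * (np.dot(y.transpose(), np.log(h))
                        + np.dot((1 - y).transpose(), np.log(1 - h)))
             + lambda_ / (2 * m) * np.sum(theta[1:] ** 2))
        reg = (lambda_ / m) * theta
        reg[0] = 0                      # the bias term theta0 is not regularized
        theta = theta - alpha * (np.dot(x.transpose(), (h - y)) / m + reg)
    return float(J), theta
```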

5. Feature engineering

(1). Discretization of continuous features

In industry, continuous values are rarely fed directly into a logistic regression model as features; instead, the continuous features are usually discretized into a series of 0/1 features first. The advantages are the following (www.jianshu.com/p/7445a7b94…):

A). Discrete features are easy to add and remove, which facilitates rapid iteration of the model.

B). Discretized features are robust to abnormal data. In an LR model, a continuous feature A corresponds to a single weight w. After discretization, A expands into features A-1, A-2, A-3, …, each with its own weight. If feature A-4 never appears in the training samples, the trained model has no weight for it, so when A-4 shows up in a test sample it simply has no effect. With the continuous feature, however, the model computes y = w * A, where A is the feature value and w its weight. Suppose A represents age, with a normal range of [0..100]; if a test sample has A = 300, it is clearly an outlier, yet w * A still contributes, and contributes a very large value, so outliers can strongly distort the final result.

C). After discretization, feature crossing can be applied: if feature A is discretized into M values and feature B into N values, crossing them yields M * N variables, which further introduces nonlinearity and improves the model's expressive power.

Whether a model uses discrete or continuous features is in effect a tradeoff between "a large number of discrete features + a simple model" and "a small number of continuous features + a complex model": one can discretize and use a linear model, or keep continuous features and use deep learning. A small discretization sketch follows below.
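As a concrete illustration of points A) to C), here is a minimal sketch of discretizing a continuous age feature into one-hot 0/1 bins with numpy; the bin edges are made up for demonstration:

```python
import numpy as np

age = np.array([5, 23, 37, 58, 300])   # 300 is an outlier
bins = [18, 30, 45, 60]                # made-up bin edges

idx = np.digitize(age, bins)           # bin index 0..4 for each sample
one_hot = np.eye(len(bins) + 1)[idx]   # (m, 5) matrix of 0/1 features
print(one_hot)
```

Note that the outlier 300 simply falls into the last bin, so its effect on a linear model stays bounded, unlike the raw term w * 300.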

(2). Feature selection

Reference: www.zhihu.com/question/29…