What is sklearn

1/ Get data

   # Get a small-scale dataset (bundled with sklearn)
   from sklearn.datasets import load_iris
   iris_object = load_iris()  # returns a Bunch object

   After loading the data, you can inspect its properties, for example:
   iris_object.data           # feature matrix: [[...], [...], ...]
   iris_object.data.shape     # (rows, columns) of the feature matrix

   iris_object.target         # label array: [x, x, x, ...]

   iris_object.DESCR          # dataset description
   iris_object.feature_names  # feature names
   iris_object.target_names   # label (class) names

   # Get a large-scale dataset (downloaded on first use, then cached)
   from sklearn.datasets import fetch_20newsgroups
   news = fetch_20newsgroups()  # default subset is "train"

2/ Data processing

   # e.g.
   from sklearn.model_selection import train_test_split
   x_train, x_test, y_train, y_test = train_test_split(iris_object.data, iris_object.target)
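
As a hedged illustration of the split, the sketch below passes the features and labels and fixes test_size and random_state. Both values are assumptions chosen for reproducibility and are not taken from these notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_object = load_iris()

# test_size and random_state are illustrative assumptions
x_train, x_test, y_train, y_test = train_test_split(
    iris_object.data, iris_object.target, test_size=0.25, random_state=22
)

# Show how many samples ended up in each part
print(x_train.shape, x_test.shape)
```
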
3/ Feature engineering

1/ Feature extraction

① Dictionary feature extraction:
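
A minimal sketch of dictionary feature extraction, assuming DictVectorizer is the intended tool. The records are made-up sample data: string values are one-hot encoded and numeric values are passed through.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical sample records (not from the notes)
records = [
    {"city": "Beijing", "temperature": 100},
    {"city": "Shanghai", "temperature": 60},
    {"city": "Shenzhen", "temperature": 30},
]

# sparse=False returns a dense NumPy array instead of a sparse matrix
vectorizer = DictVectorizer(sparse=False)
features = vectorizer.fit_transform(records)

# One column per category value, plus one column for the numeric field
feature_names = vectorizer.feature_names_
print(feature_names)
print(features)
```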

② Text feature extraction
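
A minimal sketch of text feature extraction, assuming a bag-of-words count via CountVectorizer. The two sentences are made-up sample data. With the default settings, single-character tokens such as "i" are dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical two-document corpus (not from the notes)
corpus = [
    "life is short, i like python",
    "life is too long, i dislike python",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)  # sparse document-term matrix

# Columns follow the alphabetically sorted vocabulary
vocabulary = sorted(vectorizer.vocabulary_)
dense = counts.toarray()
print(vocabulary)
print(dense)
```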

2/ Feature preprocessing

① Normalization:
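
A minimal sketch, assuming "normalization" here means min-max scaling of each column into a fixed range with MinMaxScaler. The array is made-up sample data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: two features on very different scales
data = np.array([[90.0, 2.0], [60.0, 4.0], [75.0, 3.0]])

# Each column is mapped to [0, 1] via (x - min) / (max - min)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data)
print(scaled)
```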

② Standardization:
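
A minimal sketch of standardization, assuming StandardScaler is the intended tool. Each column is shifted to mean 0 and scaled to standard deviation 1. The array is made-up sample data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales
data = np.array([[90.0, 2.0], [60.0, 4.0], [75.0, 3.0]])

# Each column becomes (x - mean) / std
scaler = StandardScaler()
scaled = scaler.fit_transform(data)

print(scaled.mean(axis=0))  # per-column mean after scaling
print(scaled.std(axis=0))   # per-column standard deviation after scaling
```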

3/ Feature dimension reduction:

① Variance filtering dimension reduction:
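
A minimal sketch of variance filtering, assuming VarianceThreshold is the intended tool. Columns whose variance does not exceed the threshold are dropped. The array and the threshold value are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data: the first column is constant and carries no information
data = np.array([
    [0.0, 2.0, 1.0],
    [0.0, 1.0, 4.0],
    [0.0, 3.0, 1.0],
])

# threshold=0.0 removes only zero-variance columns
selector = VarianceThreshold(threshold=0.0)
reduced = selector.fit_transform(data)

kept = selector.get_support()  # boolean mask of the columns that were kept
print(kept)
print(reduced)
```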

② Correlation coefficient filtering dimension reduction:
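
A minimal sketch, assuming the Pearson correlation coefficient (scipy.stats.pearsonr) is used to spot redundant features. The three arrays are made-up sample data. A coefficient close to 1 or -1 suggests that one of the two features could be dropped.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical features: y is an exact linear function of x, z is only loosely related
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
z = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# pearsonr returns the coefficient and a p-value
r_xy, p_xy = pearsonr(x, y)
r_xz, p_xz = pearsonr(x, z)

print(r_xy, r_xz)
```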

③ Principal Component Analysis (PCA)
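
A minimal sketch of PCA on the iris data, reducing four features to two. The choice n_components=2 is an assumption for illustration. A float such as 0.95 can be passed instead to keep that share of the variance.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Project the 4-dimensional features onto the 2 directions of largest variance
pca = PCA(n_components=2)
reduced = pca.fit_transform(iris.data)

# Share of the total variance kept by each component
explained = pca.explained_variance_ratio_
print(reduced.shape, explained)
```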

4/ Model training (design model)

<1> Classification algorithm:

① KNN algorithm
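
A minimal sketch of KNN classification on iris. The values of n_neighbors, test_size and random_state are assumptions. Because KNN is distance-based, the features are standardized first, and the scaler is fitted on the training set only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=22
)

# Fit the scaler on training data, then reuse it for the test data
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Classify each sample by majority vote among its 3 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)

accuracy = knn.score(x_test, y_test)
print(accuracy)
```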

② Grid search and cross validation
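
A minimal sketch of grid search with cross validation, assuming GridSearchCV is used to tune the n_neighbors parameter of KNN. The candidate values and cv=5 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=22
)

# Every candidate value is scored with 5-fold cross validation on the training set
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5)
search.fit(x_train, y_train)

print(search.best_params_)          # best parameter combination found
print(search.best_score_)           # its mean cross-validated accuracy
print(search.score(x_test, y_test)) # accuracy of the refitted best model on the test set
```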

③ Naive Bayes algorithm
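
A minimal sketch of naive Bayes text classification, assuming MultinomialNB on word counts. The tiny corpus and its labels are made up for illustration. alpha=1.0 is Laplace smoothing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training corpus (not from the notes)
train_texts = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting schedule for monday",
    "project meeting notes attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Turn text into word counts, then fit the classifier on them
vectorizer = CountVectorizer()
x_train = vectorizer.fit_transform(train_texts)
model = MultinomialNB(alpha=1.0)
model.fit(x_train, train_labels)

# New documents must go through the same fitted vectorizer
x_new = vectorizer.transform(["buy cheap pills", "monday project meeting"])
predicted = model.predict(x_new)
print(predicted)
```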

④ Decision tree:
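
A minimal sketch of a decision tree classifier on iris. The settings criterion="entropy" (information gain), max_depth=3 and the random_state values are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=22
)

# Limiting the depth keeps the tree small and reduces overfitting
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=22)
tree.fit(x_train, y_train)

accuracy = tree.score(x_test, y_test)
print(accuracy)
```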

⑤ Random forest:
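
A minimal sketch of a random forest on iris: many decision trees trained on bootstrap samples, combined by voting. n_estimators=100 and the random_state values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=22
)

# 100 trees, each trained on a bootstrap sample of the training set
forest = RandomForestClassifier(n_estimators=100, random_state=22)
forest.fit(x_train, y_train)

accuracy = forest.score(x_test, y_test)
print(accuracy)
print(forest.feature_importances_)  # relative importance of each feature
```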

<2> Regression algorithm:

① Linear regression
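
A minimal sketch of linear regression on synthetic data. The data is generated here as y = 3x + 2 plus small noise, which is an assumption for illustration, so the fitted coefficient and intercept should land close to 3 and 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data (assumption): y = 3x + 2 with Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Ordinary least squares fit
model = LinearRegression()
model.fit(x, y)

mse = mean_squared_error(y, model.predict(x))
print(model.coef_, model.intercept_, mse)
```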

② Ridge regression
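
A minimal sketch of ridge regression (linear regression with an L2 penalty) on synthetic data generated as y = 3x + 2 plus noise. The data and both alpha values are assumptions for illustration. A larger alpha shrinks the coefficient toward zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): y = 3x + 2 with Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# alpha controls the strength of the L2 penalty
weak = Ridge(alpha=1.0).fit(x, y)
strong = Ridge(alpha=1000.0).fit(x, y)

print(weak.coef_, strong.coef_)
```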

③ Logistic regression (used for classification, despite its name)
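
A minimal sketch of logistic regression as a binary classifier. The bundled breast cancer dataset, the split settings and the standardization step are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=22
)

# Standardizing the features helps the solver converge
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

model = LogisticRegression()
model.fit(x_train, y_train)

accuracy = model.score(x_test, y_test)
probabilities = model.predict_proba(x_test)  # one column of probabilities per class
print(accuracy)
```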

5/ Model evaluation (ROC curve and AUC metric):
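
A minimal sketch of computing the ROC curve and AUC with sklearn.metrics. The labels and scores are made-up values. In practice the scores would be predicted probabilities for the positive class.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted scores for the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# False positive rate and true positive rate at each decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve: 0.5 is random guessing, 1.0 is a perfect ranking
auc = roc_auc_score(y_true, y_score)
print(fpr, tpr, auc)
```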

6/ Model save and load: joblib (sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in 0.23; import joblib directly)
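
A minimal sketch of saving and loading a trained model, assuming the standalone joblib package, since sklearn.externals.joblib is no longer available in recent scikit-learn releases. The file name knn_iris.pkl is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
model = KNeighborsClassifier(n_neighbors=3).fit(iris.data, iris.target)

# Write the model to a temporary directory, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "knn_iris.pkl")  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly like the original
same_predictions = (restored.predict(iris.data) == model.predict(iris.data)).all()
print(same_predictions)
```

Only load model files from sources you trust, because joblib files are pickle-based and loading them can execute arbitrary code.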

7/ Model application