Scikit-learn Basics - Moment For Technology

1 Introduction to Scikit-learn
- 1.1 Classification
- 1.2 Regression
- 1.3 Clustering
- 1.4 Dimensionality reduction
- 1.5 Model selection
- 1.6 Preprocessing
2 Scikit-learn machine learning steps
- 2.1 Import common libraries
- 2.2 Load the data
- 2.3 Split into training and test sets
- 2.4 Data preprocessing
- 2.5 Standardization
  - 2.5.1 Normalization
  - 2.5.2 Binarization
  - 2.5.3 Encoding categorical features
  - 2.5.4 Imputing missing values
  - 2.5.5 Generating polynomial features
- 2.6 Create a model estimator
  - 2.6.1 Supervised learning
  - 2.6.2 Unsupervised learning
- 2.7 Fitting the data
  - 2.7.1 Supervised learning
  - 2.7.2 Unsupervised learning
- 2.8 Prediction
  - 2.8.1 Supervised learning
  - 2.8.2 Unsupervised learning
- 2.9 Evaluating model performance
  - 2.9.1 Classification metrics
  - 2.9.2 Regression metrics
  - 2.9.3 Clustering metrics
  - 2.9.4 Cross-validation
- 2.10 Model tuning
  - 2.10.1 Grid search
  - 2.10.2 Randomized parameter optimization

Introduction to Scikit-learn

Scikit-learn is an open source Python library that implements machine learning, preprocessing, cross-validation, and visualization algorithms through a unified interface.

Scikit-learn: scikit-learn.org

Machine learning in Python

Simple and efficient tools for data mining and data analysis
Accessible to everybody and reusable in various contexts
Built on NumPy, SciPy, and Matplotlib
Open source and commercially usable (BSD license)

Classification

Identifying which category an object belongs to.

Applications: spam detection, image recognition.

Algorithms: SVM, nearest neighbors, random forest, ...
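As a concrete illustration, here is a minimal sketch that trains one of the listed algorithms, a random forest, on the iris dataset bundled with scikit-learn. The split, `n_estimators=100`, and the seeds are arbitrary illustrative choices; the exact accuracy depends on them.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Iris: 150 flowers, 4 numeric features, 3 species (the categories to predict)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A random forest is one of the classifiers listed above
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# predict() assigns each test flower to one of the three categories
y_pred = clf.predict(X_test)
acc = clf.score(X_test, y_test)  # mean accuracy on the test set
print(f"test accuracy: {acc:.2f}")
```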

Regression

Predicting a continuous-valued attribute associated with an object.

Applications: drug response, stock prices.

Algorithms: SVR, ridge regression, Lasso, ...
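A comparable sketch for regression fits two of the listed linear models, ridge regression and Lasso, to synthetic data from `make_regression`. The sample size, noise level, and `alpha=1.0` are illustrative assumptions. For regressors, `score()` reports R², so values near 1 mean the continuous target is predicted well.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data: 200 samples, 5 features, a continuous target plus Gaussian noise
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for model in (Ridge(alpha=1.0), Lasso(alpha=1.0)):
    model.fit(X_train, y_train)
    # For regressors, score() returns the coefficient of determination R^2
    scores[type(model).__name__] = model.score(X_test, y_test)
print(scores)
```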

Clustering

Automatically grouping similar objects into sets.

Applications: customer segmentation, grouping experiment outcomes.

Algorithms: k-means, spectral clustering, mean-shift, ...
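Clustering works without labels. The sketch below, assuming three hand-placed, well-separated blob centers, lets k-means group the points and then compares the result with the generating labels via `adjusted_rand_score`, which ignores how the clusters happen to be numbered. `n_init=10` is set explicitly because the default has changed between scikit-learn versions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated groups of 2-D points; y_true is kept only for checking
centers = [[0, 0], [5, 5], [-5, 5]]
X, y_true = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)

# K-means receives only X (no labels) and groups the points into 3 clusters
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = k_means.fit_predict(X)

# Cluster ids are arbitrary, so compare with a permutation-invariant score
print(adjusted_rand_score(y_true, labels))
```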

Dimensionality reduction

Reducing the number of random variables to consider.

Applications: visualization, increased efficiency.

Algorithms: PCA, feature selection, non-negative matrix factorization.
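Feature selection, one of the techniques listed above, can be sketched with `SelectKBest` scored by the ANOVA F-test `f_classif`. Keeping `k=2` of the four iris features is an arbitrary illustrative choice; on this dataset the two petal measurements score highest.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples x 4 features

# Score each feature by its ANOVA F-statistic against the class labels, keep the best 2
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print(X_reduced.shape, [iris.feature_names[i] for i in kept])
```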

Model selection

Comparing, validating, and choosing parameters and models.

Goal: improved accuracy through parameter tuning.

Modules: grid search, cross-validation, metrics.

Preprocessing

Feature extraction and normalization.

Applications: transforming input data, such as text, for use with machine learning algorithms.

Modules: preprocessing, feature extraction.

Scikit-learn machine learning steps

# Import scikit-learn modules
from sklearn import neighbors, datasets, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = datasets.load_iris()

# Keep the first two features and split into training and test sets
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)

# Preprocess: standardize features using statistics from the training set only
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Create a k-nearest-neighbors classifier
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
# Fit the model to the training data
knn.fit(X_train, y_train)

# Predict labels for the test set
y_pred = knn.predict(X_test)
# Evaluate: fraction of correct predictions
print(accuracy_score(y_test, y_pred))

Import common libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Load the data

Scikit-learn works with data stored as NumPy arrays or SciPy sparse matrices. It also accepts other types that can be converted to numeric arrays, such as pandas DataFrames.

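# Toy data: 11 samples with 5 features, plus an 'M'/'F' label for each sample
X = np.random.random((11, 5))
y = np.array(['M', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'F', 'F'])
# Set all values below 0.7 to zero, so X is mostly zeros
X[X < 0.7] = 0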
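To see that the input types mentioned above are interchangeable, the sketch below builds toy data like `X` (mostly zeros, so a sparse format is natural) and passes it to `MaxAbsScaler` as a NumPy array, a SciPy CSR matrix, and a pandas DataFrame. `MaxAbsScaler` is chosen only because it accepts sparse input and is easy to check; the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, issparse
from sklearn.preprocessing import MaxAbsScaler

# Toy data built like X above: mostly zeros
rng = np.random.RandomState(0)
X = rng.random_sample((11, 5))
X[X < 0.7] = 0

X_sparse = csr_matrix(X)                                     # SciPy sparse matrix
X_df = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])  # hypothetical column names

# MaxAbsScaler divides each column by its largest absolute value
out_dense = MaxAbsScaler().fit_transform(X)
out_sparse = MaxAbsScaler().fit_transform(X_sparse)  # output stays sparse
out_df = MaxAbsScaler().fit_transform(X_df)          # NumPy array by default

print(np.allclose(out_dense, out_sparse.toarray()), np.allclose(out_dense, out_df))
```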

Split into training and test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
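By default `train_test_split` holds out 25% of the samples, rounding the test count up, so the 11 toy samples above split into 8 for training and 3 for testing. The sketch below rebuilds equivalent toy data so it runs on its own, and also shows the `stratify` argument, which keeps the M/F proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.random_sample((11, 5))
y = np.array(['M', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'F', 'F'])

# Default split: 25% test, i.e. ceil(0.25 * 11) = 3 test samples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)  # (8, 5) (3, 5)

# stratify=y keeps the M/F ratio roughly equal in both parts
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(X, y, stratify=y, random_state=0)
print(sorted(set(y_te_s)))
```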

Data preprocessing

Standardization

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)
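`StandardScaler` learns each column's mean and standard deviation from the training set and maps every value to z = (x - mean) / std. The sketch below, on hypothetical normally distributed data, reproduces that formula by hand (the scaler uses the population standard deviation, which matches NumPy's default `std`) and checks that the scaled training columns end up with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

scaler = StandardScaler().fit(X_train)  # statistics come from the training set only
Z_train = scaler.transform(X_train)
Z_test = scaler.transform(X_test)       # test data reuses the training statistics

# Same result by hand, using the training mean and population std
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(Z_test, manual))
print(Z_train.mean(axis=0).round(6), Z_train.std(axis=0).round(6))
```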

Normalization

from sklearn.preprocessing import Normalizer
scaler = Normalizer().fit(X_train)
normalized_X = scaler.transform(X_train)
normalized_X_test = scaler.transform(X_test)
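Where `StandardScaler` works column by column, `Normalizer` rescales each row (sample) on its own so that it has unit norm, L2 by default; it is stateless, so `fit` learns nothing. A small check on hand-picked rows:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, 0.0]])  # an all-zero row is left unchanged

X_l2 = Normalizer(norm='l2').fit_transform(X)
print(X_l2)                          # first row becomes [0.6, 0.8], i.e. [3, 4] / 5
print(np.linalg.norm(X_l2, axis=1))  # row norms: 1, 1, 0
```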

Binarization

from sklearn.preprocessing import Binarizer
binarizer = Binarizer(threshold=0.0).fit(X)
binary_X = binarizer.transform(X)
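`Binarizer` maps values strictly greater than `threshold` to 1 and all other values to 0, so with `threshold=0.0` as above every positive entry becomes 1. A quick check with a hypothetical threshold of 0.5, where the boundary value 0.5 itself maps to 0:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.0, 0.3, 0.8],
              [0.5, 0.7, 0.0]])

# Values strictly greater than the threshold become 1, all others become 0
binary = Binarizer(threshold=0.5).fit_transform(X)
print(binary)  # rows become [0, 0, 1] and [0, 1, 0]
```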

Encoding categorical features

from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
y = enc.fit_transform(y)
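`LabelEncoder` sorts the distinct labels and numbers them from 0, and `inverse_transform` maps codes back. The scikit-learn documentation describes it as intended for target values `y`; for categorical input features, `OneHotEncoder`, which produces one 0/1 column per category, is the usual choice. A short sketch on labels like the toy `y` in this section:

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labels = ['M', 'M', 'F', 'F', 'M']

# LabelEncoder: sorted classes ['F', 'M'] are mapped to 0 and 1
enc = LabelEncoder()
codes = enc.fit_transform(labels)
print(enc.classes_, codes)            # classes F, M; codes 1 1 0 0 1
print(enc.inverse_transform([0, 1]))  # back to F, M

# OneHotEncoder: one 0/1 column per category, suited to input features
onehot = OneHotEncoder().fit_transform([['M'], ['F'], ['M']]).toarray()
print(onehot)                         # rows [0, 1], [1, 0], [0, 1]
```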

Imputing missing values

from sklearn.impute import SimpleImputer
# Treat zeros as missing and replace them with the mean of their column
imp = SimpleImputer(missing_values=0, strategy='mean')
imp.fit_transform(X_train)
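The toy data in this section uses 0 as the missing-value marker, but real datasets more often use NaN. In current scikit-learn releases imputation lives in `sklearn.impute.SimpleImputer`, which replaced the older `Imputer` class. A minimal sketch with NaN markers, imputing column means learned from the training data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, 2.0],
                    [np.nan, 4.0],
                    [3.0, np.nan]])

# Replace each NaN with the mean of its column, learned from X_train
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
X_filled = imp.fit_transform(X_train)
print(imp.statistics_)  # column means: 2.0 and 3.0
print(X_filled)
```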

Generating polynomial features

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(5)
poly.fit_transform(X)

Create a model estimator

Supervised learning

# Linear regression (the old normalize=True option was removed in scikit-learn 1.2;
# scale the inputs with StandardScaler instead if needed)
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# Support vector machine (SVM)
from sklearn.svm import SVC
svc = SVC(kernel='linear')
# Naive Bayes
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
# KNN
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
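Because all scikit-learn estimators share the same `fit` / `predict` / `score` interface, the classifiers above can be swapped without touching the surrounding code. A sketch on the iris dataset (chosen only for convenience) that fits three of them in one loop:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every classifier exposes the same fit / score methods,
# so they can be swapped without changing the surrounding code
models = [SVC(kernel='linear'), GaussianNB(), KNeighborsClassifier(n_neighbors=5)]
results = {}
for model in models:
    model.fit(X_train, y_train)
    results[type(model).__name__] = model.score(X_test, y_test)  # test accuracy
print(results)
```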

Unsupervised learning

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
# Principal component analysis (PCA): keep enough components for 95% of the variance
pca = PCA(n_components=0.95)
# K-means
k_means = KMeans(n_clusters=3, random_state=0)
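When `n_components` is a float between 0 and 1, as in `PCA(n_components=0.95)` above, PCA keeps the smallest number of components whose cumulative explained variance reaches that fraction. On the iris data, used here only as a convenient example, the first component explains about 92% of the variance and the first two about 98%, so two components are kept:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                         # number of components kept
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative variance explained
print(X_reduced.shape)
```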

Fitting the data

Supervised learning

lr.fit(X, y)
knn.fit(X_train, y_train)
svc.fit(X_train, y_train)

Unsupervised learning

k_means.fit(X_train)
# fit_transform() fits the PCA model and returns the projected data, not the model
X_train_pca = pca.fit_transform(X_train)

Prediction

Supervised learning

# Predict labels for two new random samples
y_pred = svc.predict(np.random.random((2, 5)))
# Predict values with the linear regression model
y_pred = lr.predict(X_test)
# Estimate class probabilities
y_proba = knn.predict_proba(X_test)

Unsupervised learning

y_pred = k_means.predict(X_test)

Evaluating model performance

Classification metrics

# Predict class labels with the KNN classifier
y_pred = knn.predict(X_test)
# Accuracy
knn.score(X_test, y_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
# Classification report: precision, recall, and F1 score per class
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
# Confusion matrix
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred))
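To see what the classification metrics compute, here they are on six hypothetical labels chosen so the results can be checked by hand. In `confusion_matrix`, rows are true classes and columns are predicted classes, in the order given by `labels`. `precision_score` and `recall_score` for the class 'M' are included as extra illustration.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Six hypothetical true labels and predictions
y_true = ['F', 'F', 'F', 'M', 'M', 'M']
y_pred = ['F', 'F', 'M', 'M', 'M', 'F']

acc = accuracy_score(y_true, y_pred)  # 4 of 6 predictions match

# Rows = true class, columns = predicted class, both in the order ['F', 'M']
cm = confusion_matrix(y_true, y_pred, labels=['F', 'M'])

# For class 'M': 2 of the 3 'M' predictions are right (precision),
# and 2 of the 3 true 'M' samples are found (recall)
prec = precision_score(y_true, y_pred, pos_label='M')
rec = recall_score(y_true, y_pred, pos_label='M')

print(acc)
print(cm)  # [[2 1]
           #  [1 2]]
print(prec, rec)
```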

Regression metrics

# Toy true and predicted values, for illustration only
y_true = [3, -0.5, 2]
y_pred = [2.5, 0.0, 2]
# Mean absolute error
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)
# Mean squared error
from sklearn.metrics import mean_squared_error
mean_squared_error(y_true, y_pred)
# R² score
from sklearn.metrics import r2_score
r2_score(y_true, y_pred)
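The three regression metrics have simple closed forms: MAE is the mean of |y_true - y_pred|, MSE is the mean of (y_true - y_pred)², and R² is 1 - SS_res / SS_tot, where SS_res is the sum of squared errors and SS_tot the sum of squared deviations of y_true from its mean. A check on four hypothetical value pairs, computing each metric both ways:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Four hypothetical true values and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
err = y_true - y_pred                      # [0.5, -0.5, 0.0, -1.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375

# R^2 = 1 - SS_res / SS_tot, with SS_res = 1.5 and SS_tot = 29.1875 here
ss_res = np.sum(err ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = r2_score(y_true, y_pred)

print(mae, mse, round(r2, 4))              # 0.5 0.375 0.9486
print(np.isclose(r2, 1 - ss_res / ss_tot))
```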

Clustering metrics

# Cluster assignments from the k-means model, compared with the true labels
y_pred = k_means.predict(X_test)
# Adjusted Rand index
from sklearn.metrics import adjusted_rand_score
adjusted_rand_score(y_test, y_pred)
# Homogeneity
from sklearn.metrics import homogeneity_score
homogeneity_score(y_test, y_pred)
# V-measure
from sklearn.metrics import v_measure_score
v_measure_score(y_test, y_pred)
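Clustering metrics compare groupings, not label values, so renaming the cluster ids does not change them. The sketch below, on hypothetical labels, shows a renamed but otherwise identical grouping scoring 1 on the adjusted Rand index and V-measure, and a grouping that merges two true classes into one cluster losing homogeneity.

```python
from sklearn.metrics import adjusted_rand_score, homogeneity_score, v_measure_score

y_true = [0, 0, 1, 1, 2, 2]

# Same grouping with the cluster ids renamed: both scores equal 1 (up to rounding)
renamed = [2, 2, 0, 0, 1, 1]
ari = adjusted_rand_score(y_true, renamed)
vm = v_measure_score(y_true, renamed)

# Classes 1 and 2 merged into one cluster: that cluster is no longer pure,
# so homogeneity falls below 1
merged = [0, 0, 1, 1, 1, 1]
h = homogeneity_score(y_true, merged)

print(ari, vm, h)
```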

Cross-validation

from sklearn.model_selection import cross_val_score
print(cross_val_score(knn, X_train, y_train, cv=4))
print(cross_val_score(lr, X, y, cv=2))
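Conceptually, `cross_val_score` splits the data into folds and, for each fold, fits a fresh copy of the estimator on the remaining folds and scores it on the held-out one. The sketch below reproduces that loop by hand on the iris data. It passes an explicit shuffled `KFold` so both versions use identical splits; with an integer `cv`, classifiers get stratified folds instead.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)
cv = KFold(n_splits=4, shuffle=True, random_state=0)

# Library version
scores = cross_val_score(knn, X, y, cv=cv)

# By hand: fit a fresh copy on the other folds, score on the held-out fold
manual = []
for train_idx, test_idx in cv.split(X):
    model = clone(knn).fit(X[train_idx], y[train_idx])
    manual.append(model.score(X[test_idx], y[test_idx]))

print(scores, np.mean(scores))
```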

Model tuning

Grid search

from sklearn.model_selection import GridSearchCV
params = {"n_neighbors": np.arange(1, 3),
          "metric": ["euclidean", "cityblock"]}
grid = GridSearchCV(estimator=knn,
                    param_grid=params,
                    cv=4)  # 4 folds, because the toy training set has only 8 samples
grid.fit(X_train, y_train)
print(grid.best_score_)
print(grid.best_estimator_.n_neighbors)

Randomized parameter optimization

from sklearn.model_selection import RandomizedSearchCV
params = {"n_neighbors": range(1, 5),
          "weights": ["uniform", "distance"]}
rsearch = RandomizedSearchCV(estimator=knn,
                             param_distributions=params,
                             cv=4,
                             n_iter=8,
                             random_state=5)
rsearch.fit(X_train, y_train)
print(rsearch.best_score_)
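Unlike grid search, `RandomizedSearchCV` also accepts distributions in `param_distributions` and samples `n_iter` settings instead of trying every combination. A sketch using `scipy.stats.randint` on the iris data, where the range 1 to 29 and `n_iter=10` are arbitrary illustrative choices:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# n_neighbors is drawn from the integers 1..29; weights is sampled from the list
param_distributions = {"n_neighbors": randint(1, 30),
                       "weights": ["uniform", "distance"]}

rsearch = RandomizedSearchCV(KNeighborsClassifier(), param_distributions,
                             n_iter=10, cv=5, random_state=0)
rsearch.fit(X, y)

print(rsearch.best_params_, round(rsearch.best_score_, 3))
```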
