• By Han Xinzi @Showmeai
  • Tutorial address: www.showmeai.tech/tutorials/4…
  • This paper addresses: www.showmeai.tech/article-det…
  • Statement: All rights reserved. For reprints, please contact the platform and the author, and indicate the source.

Introduction

For the same Rossmann scenario, ShowMeAI's article Machine Learning in Practice | Python Machine Learning Integrated Project – E-commerce Sales Forecast already explained the basic exploratory data analysis, data preprocessing, and modeling workflow. In this article we revisit those steps and optimize some of the details.

1. Project overview

1.1 Background

Founded in 1972, Rossmann is the largest daily-chemical (drugstore) chain in Germany, with more than 3,000 stores in seven European countries. Stores sometimes run short-term promotions as well as continuous promotions to increase sales. In addition, store sales are affected by many factors, including promotions, competition, school and national holidays, seasonality and periodicity.

Reliable sales forecasting enables store managers to create effective staff schedules, which improves productivity and motivation, and supports better supply-chain adjustment and more rational promotion and competition strategies, so it has important practical and strategic value. Helping Rossmann build a strong predictive model lets store managers focus on what matters most to them: customers and teams.

The task of this project is to build a machine learning model from the data provided and predict 6 weeks of sales for Rossmann's 1,115 stores across Germany.

1.2 Data Introduction

The dataset covers 1,115 Rossmann chain stores and records 1,017,209 sales entries (27 features in total) from January 1, 2013 to July 2015.

The dataset contains four files:

  • train.csv: historical data including sales.
  • test.csv: historical data excluding sales (the values to be predicted).
  • sample_submission.csv: a sample submission file in the correct format.
  • store.csv: supplementary information about each store.

Among them, the data in train.csv contains 9 columns of information:

  • Store: the ID number of the corresponding store.
  • DayOfWeek: the day of the week (1–7) on which the record was generated.
  • Date: the date on which the corresponding sales were generated.
  • Sales: the historical sales figure (the target we want to predict).
  • Customers: the number of customers entering the store that day.
  • Open: indicates whether the store was open that day.
  • Promo: indicates whether the store was running a promotion that day.
  • StateHoliday / SchoolHoliday: indicate whether the day was a state (national) holiday or a school holiday, respectively.

(1) Training set

In the data overview at the bottom of Kaggle's data page, we can get a rough view of the distribution of each column (here for train.csv) and some sample records.

(2) Test set

The data columns in test.csv are almost identical to train.csv, but without the Sales (sales) and Customers (customer traffic) columns. Our ultimate goal is to predict the missing Sales values in test.csv using the supplementary information in test.csv and store.csv.

From the data overview of test.csv, we can see that, compared with the training data above, it is missing the Sales column as well as the Customers column, which is strongly correlated with Sales.

Data distribution and some sample data are as follows:

(3) Result file

The result file, sample_submission.csv, contains only the Id and Sales columns; it is the standard format template for submitting our predictions to Kaggle for scoring.

In Python, we only need to open this file, fill the Sales column with our predictions in order, and then use DataFrame.to_csv('sample_submit.csv') to save the file with the predicted data locally, ready for the subsequent upload.
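As a minimal sketch (assuming the data directory used later in this article and a hypothetical placeholder prediction vector; in practice the predictions come from the trained model), the submission could be prepared like this:

# A sketch: fill the submission template with predictions and save it locally
import numpy as np
import pandas as pd

submission = pd.read_csv('./rossmann-store-sales/sample_submission.csv')
predictions = np.zeros(len(submission))   # placeholder; replace with real model predictions
submission['Sales'] = predictions
submission.to_csv('sample_submit.csv', index=False)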

(4) Store information

As you can see, both train.csv and test.csv contain store IDs, and the details behind these store IDs are recorded in store.csv, which holds geographical information and marketing/promotion information for each store.

Looking at the data distribution of store.csv, notice that it contains many discrete categorical fields.

Data distribution and some sample data are as follows:

Among them:

  • Store: indicates the store number.
  • StoreType: the type of store. There are four different types: a, b, c and d. You can think of them as something like a pop-up store, a general store, a flagship store, or a mini store, i.e. the kinds of stores we see in daily life.
  • Assortment: uses a, b and c to describe the assortment level of the products sold in the store. For example, the product assortment of a flagship store and of a mini store would be very different.
  • CompetitionDistance, CompetitionOpenSinceYear, CompetitionOpenSinceMonth: the distance to the nearest competitor's store, and the year and month in which that competitor opened.
  • Promo2: describes whether the store runs a long-term (continuous) promotion.
  • Promo2SinceYear, Promo2SinceWeek: the year and calendar week in which the store began participating in Promo2.
  • PromoInterval: describes the consecutive intervals in which Promo2 is restarted, named after the months in which the promotion starts again.

1.3 Project Objective

Now that we know the data, we need to clarify the goal of the project. For the Rossmann sales forecast, we use the historical data, i.e. the data in train.csv, for supervised learning. The trained model then performs inference (prediction) on the data in test.csv, and the predictions are submitted to Kaggle in the sample_submission format. Throughout this process, the supplementary information in store.csv can be merged in to enrich the features available to the model.

1.4 Evaluation Criteria

The evaluation metric used for the model is the Root Mean Square Percentage Error (RMSPE) specified by Kaggle for this competition.


RMSPE = \sqrt{\frac{1}{n}\sum\limits_{i=1}^n\left(\frac{y_i-\hat{y}_i}{y_i}\right)^2} = \sqrt{\frac{1}{n}\sum\limits_{i=1}^n\left(\frac{\hat{y}_i}{y_i}-1\right)^2}

Among them:

  • $y_i$ represents the actual sales of the store on that day.
  • $\hat{y}_i$ represents the corresponding predicted sales.
  • $n$ is the number of samples.

Any day on which sales are zero is ignored in the evaluation. The smaller the calculated RMSPE, the smaller the error and the higher the score.
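As a minimal numpy sketch of the metric (with small hypothetical arrays, not the competition data), zero-sales days are simply filtered out before computing the error:

import numpy as np

y_true = np.array([100., 0., 250., 80.])   # hypothetical actual daily sales (one day is zero)
y_pred = np.array([110., 5., 240., 90.])   # hypothetical predictions

mask = y_true != 0                          # days with zero sales are ignored
rmspe = np.sqrt(np.mean(((y_true[mask] - y_pred[mask]) / y_true[mask]) ** 2))
print('RMSPE: {:.4f}'.format(rmspe))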

1.5 Solution Core

Our solution is divided into the following steps:

  • Step 1: Load the data
  • Step 2: Exploratory data analysis
  • Step 3: Data preprocessing (missing values)
  • Step 4: Feature engineering
  • Step 5: Baseline model and evaluation
  • Step 6: XGBoost modeling
# Load the necessary libraries
import pandas as pd
import numpy as np
import xgboost as xgb

import missingno as msno
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

1.6 Loading Data

The Rossmann modeling data contains many dimensions of information, such as the number of customers, holidays, and so on. Given the task objective, this is a typical supervised regression problem. We first load the data and then carry out the subsequent analysis and modeling.

# Load data
train = pd.read_csv('./rossmann-store-sales/train.csv')
test = pd.read_csv('./rossmann-store-sales/test.csv')
store = pd.read_csv('./rossmann-store-sales/store.csv')

The DataFrame.info() operation displays basic information about a DataFrame, such as column types, non-null counts and memory usage. For more detail on pandas operations, see the ShowMeAI guide Data Science Tools Quick Reference | Pandas User Guide.

The output below shows that both test.csv and store.csv contain missing values, which we will handle in preprocessing.

train.info(), test.info(), store.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1017209 entries, 0 to 1017208
Data columns (total 9 columns):
Store            1017209 non-null int64
DayOfWeek        1017209 non-null int64
Date             1017209 non-null object
Sales            1017209 non-null int64
Customers        1017209 non-null int64
Open             1017209 non-null int64
Promo            1017209 non-null int64
StateHoliday     1017209 non-null object
SchoolHoliday    1017209 non-null int64
dtypes: int64(7), object(2)
memory usage: 69.8+ MB

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41088 entries, 0 to 41087
Data columns (total 8 columns):
Id               41088 non-null int64
Store            41088 non-null int64
DayOfWeek        41088 non-null int64
Date             41088 non-null object
Open             41077 non-null float64
Promo            41088 non-null int64
StateHoliday     41088 non-null object
SchoolHoliday    41088 non-null int64
dtypes: float64(1), int64(5), object(2)
memory usage: 2.5+ MB

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1115 entries, 0 to 1114
Data columns (total 10 columns):
Store                        1115 non-null int64
StoreType                    1115 non-null object
Assortment                   1115 non-null object
CompetitionDistance          1112 non-null float64
CompetitionOpenSinceMonth    761 non-null float64
CompetitionOpenSinceYear     761 non-null float64
Promo2                       1115 non-null int64
Promo2SinceWeek              571 non-null float64
Promo2SinceYear              571 non-null float64
PromoInterval                571 non-null object
dtypes: float64(5), int64(2), object(3)
memory usage: 87.2+ KB

2. Exploratory data analysis

Let's start with a quick look at the target variable, Sales. First we plot its distribution for the days on which stores were closed:

train.loc[train.Open==0].Sales.hist(align='left')

Finding: when a store is closed, its daily sales are always 0.

fig = plt.figure(figsize=(16, 6))

ax1 = fig.add_subplot(121)
ax1.set_xlabel('Sales')
ax1.set_ylabel('Count')
ax1.set_title('Sales of Closed Stores')
plt.xlim(-1, 1)
train.loc[train.Open==0].Sales.hist(align='left')

ax2 = fig.add_subplot(122)
ax2.set_xlabel('Sales')
ax2.set_ylabel('PDF')
ax2.set_title('Sales of Open Stores')
sns.distplot(train.loc[train.Open!=0].Sales)

print('The skewness of Sales is {}'.format(train.loc[train.Open!=0].Sales.skew()))
The skewness of Sales is 1.5939220392699809

After removing the records for days when stores were closed, we redraw the distribution of daily sales for days when stores were open. The daily sales show a clearly skewed distribution, with a skewness of 1.594, much higher than 0.75, so we will consider transforming the target distribution during preprocessing.
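A common way to tame this kind of right-skewed target is a log transform, which is also why the models later in this article are trained on np.log1p(Sales). A quick check of its effect (a sketch reusing the train DataFrame and the imports from the cells above) might look like this:

# Log-transform the sales of open stores and re-check the skewness
log_sales = np.log1p(train.loc[train.Open != 0].Sales)
print('The skewness of log1p(Sales) is {}'.format(log_sales.skew()))
sns.distplot(log_sales)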

Below we keep only the records where the store was open (Open != 0) and Sales > 0 for training.

train = train.loc[train.Open != 0]
train = train.loc[train.Sales > 0].reset_index(drop=True)
train.shape
(844338, 9)

3. Missing value processing

# Missing information for training set: none missing
train[train.isnull().values==True]
Store DayOfWeek Date Sales Customers Open Promo StateHoliday SchoolHoliday
# Missing information for the test set
test[test.isnull().values==True]
Id Store DayOfWeek Date Open Promo StateHoliday SchoolHoliday
479 480 622 4 2015/9/17 NaN 1 0
1335 1336 622 3 2015/9/16 NaN 1 0
2191 2192 622 2 2015/9/15 NaN 1 0
3047 3048 622 1 2015/9/14 NaN 1 0
4759 4760 622 6 2015/9/12 NaN 0 0
5615 5616 622 5 2015/9/11 NaN 0 0
6471 6472 622 4 2015/9/10 NaN 0 0
7327 7328 622 3 2015/9/9 NaN 0 0
8183 8184 622 2 2015/9/8 NaN 0 0
9039 9040 622 1 2015/9/7 NaN 0 0
10751 10752 622 6 2015/9/5 NaN 0 0

Let's look at the missing values in store.csv:

# Missing information for store
msno.matrix(store)

There are missing values in both test.csv and store.csv; we will handle them and then merge the features:

# All stores in test are assumed to be open by default
test.fillna(1,inplace=True)

# Missing values in CompetitionDistance are filled with the median
store.CompetitionDistance = store.CompetitionDistance.fillna(store.CompetitionDistance.median())

# Fill all other missing values with 0
store.fillna(0,inplace=True)
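To confirm that no missing values remain after these steps, a quick check (a simple sketch on the same DataFrames) could be:

# Verify that the fills removed all missing values
print(test.isnull().sum().sum(), store.isnull().sum().sum())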

We know that some of the ways to deal with missing values include:

  • Delete columns (remove columns containing missing values).
  • Fill in the missing values (fill in the mean, median, fit, etc.).
  • Mark missing values with a special value (such as -999) or add a new column indicating whether the field is missing (see the sketch after this list).
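For example, the third strategy could have been applied to CompetitionDistance instead of the median fill used above; this is only an illustrative alternative, not part of the pipeline in this article:

# Hypothetical alternative: keep a missingness indicator and fill with a sentinel value
store['CompetitionDistanceMissing'] = store.CompetitionDistance.isnull().astype(int)
store['CompetitionDistance'] = store.CompetitionDistance.fillna(-999)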
# Feature combination
train = pd.merge(train, store, on='Store')
test = pd.merge(test, store, on='Store')
train.head(10)
Store DayOfWeek Date Sales Customers Open Promo StateHoliday SchoolHoliday StoreType Assortment CompetitionDistance CompetitionOpenSinceMonth CompetitionOpenSinceYear Promo2 Promo2SinceWeek Promo2SinceYear PromoInterval
0 1 5 2015/7/31 5263 555 1 1 0 1 c a 1270 9 2008 0 0 0 0
1 1 4 2015/7/30 5020 546 1 1 0 1 c a 1270 9 2008 0 0 0 0
2 1 3 2015/7/29 4782 523 1 1 0 1 c a 1270 9 2008 0 0 0 0
3 1 2 2015/7/28 5011 560 1 1 0 1 c a 1270 9 2008 0 0 0 0
4 1 1 2015/7/27 6102 612 1 1 0 1 c a 1270 9 2008 0 0 0 0
5 1 6 2015/7/25 4364 500 1 0 0 0 c a 1270 9 2008 0 0 0 0
6 1 5 2015/7/24 3706 459 1 0 0 0 c a 1270 9 2008 0 0 0 0
7 1 4 2015/7/23 3769 503 1 0 0 0 c a 1270 9 2008 0 0 0 0
8 1 3 2015/7/22 3464 463 1 0 0 0 c a 1270 9 2008 0 0 0 0
9 1 2 2015/7/21 3558 469 1 0 0 0 c a 1270 9 2008 0 0 0 0

4. Feature engineering

4.1 Feature extraction function

def build_features(features, data):

    # Features used directly
    features.extend(['Store', 'CompetitionDistance', 'CompetitionOpenSinceMonth', 'StateHoliday', 'StoreType', 'Assortment', 'SchoolHoliday', 'CompetitionOpenSinceYear', 'Promo', 'Promo2', 'Promo2SinceWeek', 'Promo2SinceYear'])

    # The handling of the following features references: https://blog.csdn.net/aicanghai_smile/article/details/80987666

    # Time features: extract year, month, day, day of week, week of year
    features.extend(['Year', 'Month', 'Day', 'DayOfWeek', 'WeekOfYear'])
    data['Year'] = data.Date.dt.year
    data['Month'] = data.Date.dt.month
    data['Day'] = data.Date.dt.day
    data['DayOfWeek'] = data.Date.dt.dayofweek
    data['WeekOfYear'] = data.Date.dt.weekofyear

    # 'CompetitionOpen': how long the nearest competitor has been open
    # 'PromoOpen': how long the store's Promo2 promotion has been running
    # Both features are measured in months
    features.extend(['CompetitionOpen', 'PromoOpen'])
    data['CompetitionOpen'] = 12*(data.Year-data.CompetitionOpenSinceYear) + (data.Month-data.CompetitionOpenSinceMonth)
    data['PromoOpen'] = 12*(data.Year-data.Promo2SinceYear) + (data.WeekOfYear-data.Promo2SinceWeek)/4.0
    data['CompetitionOpen'] = data.CompetitionOpen.apply(lambda x: x if x > 0 else 0)
    data['PromoOpen'] = data.PromoOpen.apply(lambda x: x if x > 0 else 0)

    # 'IsPromoMonth': whether the store is in a promotion month, 1 means yes, 0 means no
    features.append('IsPromoMonth')
    month2str = {1:'Jan', 2:'Feb', 3:'Mar', 4:'Apr', 5:'May', 6:'Jun', 7:'Jul', 8:'Aug', 9:'Sept', 10:'Oct', 11:'Nov', 12:'Dec'}
    data['monthStr'] = data.Month.map(month2str)
    data.loc[data.PromoInterval==0, 'PromoInterval'] = ''
    data['IsPromoMonth'] = 0
    for interval in data.PromoInterval.unique():
        if interval != '':
            for month in interval.split(','):
                data.loc[(data.monthStr == month) & (data.PromoInterval == interval), 'IsPromoMonth'] = 1

    # Convert string features to numbers
    mappings = {'0':0, 'a':1, 'b':2, 'c':3, 'd':4}
    data.StoreType.replace(mappings, inplace=True)
    data.Assortment.replace(mappings, inplace=True)
    data.StateHoliday.replace(mappings, inplace=True)
    data['StoreType'] = data['StoreType'].astype(int)
    data['Assortment'] = data['Assortment'].astype(int)
    data['StateHoliday'] = data['StateHoliday'].astype(int)

4.2 Feature Extraction

# Processing Date facilitates feature extraction
train.Date = pd.to_datetime(train.Date, errors='coerce')
test.Date = pd.to_datetime(test.Date, errors='coerce')

# Use the features list to store the names of the features used
features = []

# Feature extraction for train and test
build_features(features, train)
build_features([], test)

# Print features used
print(features)
['Store', 'CompetitionDistance', 'CompetitionOpenSinceMonth', 'StateHoliday', 'StoreType', 'Assortment', 'SchoolHoliday', 'CompetitionOpenSinceYear', 'Promo', 'Promo2', 'Promo2SinceWeek', 'Promo2SinceYear', 'Year', 'Month', 'Day', 'DayOfWeek', 'WeekOfYear', 'CompetitionOpen', 'PromoOpen', 'IsPromoMonth']

5. Benchmark model and evaluation

5.1 Define evaluation criteria functions

Since we need to predict continuous values, a regression model is required. Because this project is a Kaggle competition and the test set is evaluated with the Root Mean Square Percentage Error (RMSPE), we use RMSPE here as well. The RMSPE formula is as follows:


{\rm RMSPE} = \sqrt{\frac{1}{n}\sum\limits_{i=1}^n\left(\frac{y_i-\hat{y}_i}{y_i}\right)^2}

where $y_i$ and $\hat{y}_i$ are the true and predicted values of the $i$-th sample, respectively.

# Evaluation function Rmspe
# reference: https://www.kaggle.com/justdoit/xgboost-in-python-with-rmspe

def ToWeight(y):
    w = np.zeros(y.shape, dtype=float)
    ind = y != 0
    w[ind] = 1./(y[ind]**2)
    return w

def rmspe(yhat, y):
    w = ToWeight(y)
    rmspe = np.sqrt(np.mean(w * (y-yhat)**2))
    return rmspe

def rmspe_xg(yhat, y):
    y = y.get_label()
    y = np.expm1(y)
    yhat = np.expm1(yhat)
    w = ToWeight(y)
    rmspe = np.sqrt(np.mean(w * (y-yhat)**2))
    return "rmspe", rmspe

def neg_rmspe(yhat, y):
    y = np.expm1(y)
    yhat = np.expm1(yhat)
    w = ToWeight(y)
    rmspe = np.sqrt(np.mean(w * (y-yhat)**2))
    return -rmspe

5.2 Benchmark model evaluation

We build a regression tree as the baseline model. For the regression tree we directly use SKLearn's DecisionTreeRegressor, combined with cross-validation and grid search for hyperparameter tuning. The main hyperparameter is max_depth, the maximum depth of the tree.

By default, GridSearchCV searches for the parameters that maximize scoring_fnc. RMSPE itself is a smaller-is-better metric, so we negate it as neg_rmspe: the larger the neg_rmspe value, the more accurate the model.

from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.metrics import make_scorer

from sklearn.tree import DecisionTreeRegressor

regressor = DecisionTreeRegressor(random_state=2)

cv_sets = ShuffleSplit(n_splits=5, test_size=0.2)    
params = {'max_depth': range(10, 40, 2)}
scoring_fnc = make_scorer(neg_rmspe)

grid = GridSearchCV(regressor,params,scoring_fnc,cv=cv_sets)
grid = grid.fit(train[features], np.log1p(train.Sales))

DTR = grid.best_estimator_
# Display the best hyperparameters
DTR.get_params()
{'criterion': 'mse', 'max_depth': 30, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'presort': False, 'random_state': 2, 'splitter': 'best'}
# Generate upload file
submission = pd.DataFrame({"Id": test["Id"], "Sales": np.expm1(DTR.predict(test[features]))})
submission.to_csv("benchmark.csv", index=False)
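The cross-validated score of the selected tree is also available directly from the grid search (a quick check; the value is the negated RMSPE, so the closer to zero the better):

# Inspect the best hyperparameters and the corresponding cross-validation score
print(grid.best_params_, grid.best_score_)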

The model's Public Score on the test set is 0.18423, and its Private Score is 0.22081. Next, let's use XGBoost to improve on this baseline.

6. XGBoost modeling and tuning

6.1 Model Parameters

XGBoost is a powerful model with many tunable parameters (see the ShowMeAI article XGBoost Modeling Applications in Detail). We mainly tune the following hyperparameters:

  • eta: learning rate.
  • max_depth: maximum depth of a single regression tree; too small leads to underfitting, too large to overfitting.
  • subsample: between 0 and 1, controls the fraction of samples randomly drawn for each tree; lowering it makes the algorithm more conservative and helps avoid overfitting, but setting it too small can lead to underfitting.
  • colsample_bytree: between 0 and 1, controls the fraction of features randomly sampled for each tree.
  • num_trees: the number of trees, i.e. the number of boosting iterations.
# The first (default) version of parameters
# params = {'objective': 'reg:linear',
#           'eta': 0.01,
#           'max_depth': 11,
#           'subsample': 0.5,
#           'colsample_bytree': 0.5,
#           'silent': 1,
#           'seed': 1
#           }
# num_trees = 10000
# The second attempt: the learning rate is too large and the results degrade
# params = {"objective": "reg:linear",
# "booster" : "gbtree",
# "eta" : 0.3,
# "max_depth": 10,
# "subsample" : 0.9,
# "colsample_bytree" : 0.7,
# "silent": 1,
# "seed": 1301
#}
# num_trees = 10000
# The third attempt: the step size is moderate, convergence is fast and the result is good
params = {"objective": "reg:linear", "booster": "gbtree", "eta": 0.1, "max_depth": 10,
          "subsample": 0.85, "colsample_bytree": 0.4, "min_child_weight": 6,
          "silent": 1, "thread": 1, "seed": 1301}
num_trees = 1200
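Instead of fixing num_trees by hand, one option (a sketch, not part of the original tuning above; it assumes the train DataFrame, the features list, and the params dict and rmspe_xg function defined elsewhere in this article) is to let XGBoost's built-in cross-validation pick the number of boosting rounds with early stopping:

# A sketch: use xgb.cv to choose the number of boosting rounds
dtrain_cv = xgb.DMatrix(train[features], np.log1p(train.Sales))
cv_results = xgb.cv(params, dtrain_cv, num_boost_round=num_trees, nfold=3,
                    feval=rmspe_xg, early_stopping_rounds=50, verbose_eval=False)
print('Rounds kept after early stopping: {}'.format(len(cv_results)))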

6.2 Model training

import numpy as np  # import numpy
from sklearn.model_selection import KFold  # import KFold from sklearn

# Split a DataFrame into a training part and a validation part with KFold
def K_Fold_split(K, fold, data):
    """
    :param K: number of folds to split the data into, e.g. K=10
    :param fold: index of the fold to use as the validation set, e.g. fold=5
    :param data: the DataFrame to be split
    :return: the training part and the validation part of the data
    """
    split_list = []
    kf = KFold(n_splits=K)
    for train_idx, test_idx in kf.split(data):
        split_list.append(train_idx.tolist())
        split_list.append(test_idx.tolist())
    train_idx, test_idx = split_list[2 * fold], split_list[2 * fold + 1]
    return data.iloc[train_idx], data.iloc[test_idx]  # the split datasets
# Split the data into a training set and a validation set
from sklearn.model_selection import train_test_split

# X_train, X_test = train_test_split(train, test_size=0.2, random_state=2)
X_train, X_test = K_Fold_split(10, 5, train)

dtrain = xgb.DMatrix(X_train[features], np.log1p(X_train.Sales))
dvalid = xgb.DMatrix(X_test[features], np.log1p(X_test.Sales))
dtest = xgb.DMatrix(test[features])

watchlist = [(dtrain, 'train'),(dvalid, 'eval')]
gbm = xgb.train(params, dtrain, num_trees, evals=watchlist, early_stopping_rounds=50, feval=rmspe_xg, verbose_eval=False)
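Before generating the submission file, it is worth sanity-checking the error on the held-out validation split (a small sketch reusing the rmspe helper defined in Section 5.1):

# Evaluate the trained booster on the validation split
yhat_valid = gbm.predict(dvalid, ntree_limit=gbm.best_ntree_limit)
print('Validation RMSPE: {:.5f}'.format(rmspe(np.expm1(yhat_valid), X_test.Sales.values)))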

6.3 Submitting the Result File

# Generate the submission file
test_probs = gbm.predict(xgb.DMatrix(test[features]), ntree_limit=gbm.best_ntree_limit)
indices = test_probs < 0
test_probs[indices] = 0
submission = pd.DataFrame({"Id": test["Id"], "Sales": np.expm1(test_probs)})
submission.to_csv("xgboost.csv", index=False)

6.4 Feature Optimization

In e-commerce scenarios, historical statistical features are also very important. We can construct statistics of historical sales at different time granularities as supplementary information, which also helps improve the modeling results. Here are some examples:

sales_mean_bystore = X_train.groupby(['Store'])['Sales'].mean().reset_index(name='MeanLogSalesByStore')
sales_mean_bystore['MeanLogSalesByStore'] = np.log1p(sales_mean_bystore['MeanLogSalesByStore'])

sales_mean_bydow = X_train.groupby(['DayOfWeek'])['Sales'].mean().reset_index(name='MeanLogSalesByDOW')
sales_mean_bydow['MeanLogSalesByDOW'] = np.log1p(sales_mean_bydow['MeanLogSalesByDOW'])

sales_mean_bymonth = X_train.groupby(['Month'])['Sales'].mean().reset_index(name='MeanLogSalesByMonth')
sales_mean_bymonth['MeanLogSalesByMonth'] = np.log1p(sales_mean_bymonth['MeanLogSalesByMonth'])
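These per-store, per-weekday and per-month statistics can then be merged back into the training, validation and test frames as extra columns (a sketch; computing them on X_train only helps avoid leaking validation information into the features):

# Merge the aggregated statistics back as additional features
for stat_df, key in [(sales_mean_bystore, 'Store'),
                     (sales_mean_bydow, 'DayOfWeek'),
                     (sales_mean_bymonth, 'Month')]:
    X_train = pd.merge(X_train, stat_df, on=key, how='left')
    X_test = pd.merge(X_test, stat_df, on=key, how='left')
    test = pd.merge(test, stat_df, on=key, how='left')
features.extend(['MeanLogSalesByStore', 'MeanLogSalesByDOW', 'MeanLogSalesByMonth'])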

References

  • Illustrated Machine Learning Algorithms | From Beginner to Master series
  • Data Analysis series tutorials
  • Data Science Tools Quick Reference | Pandas User Guide

ShowMeAI recommended series of tutorials

  • Illustrated Python programming: From beginner to Master series of tutorials
  • Illustrated Data Analysis: From beginner to master series of tutorials
  • The mathematical Basics of AI: From beginner to Master series of tutorials
  • Illustrated Big Data Technology: From beginner to master
  • Illustrated Machine learning algorithms: Beginner to Master series of tutorials
  • Machine Learning in Practice: A hands-on series to master machine learning

Recommended related articles

  • Application practice of Python machine learning algorithm
  • SKLearn introduction and simple application cases
  • SKLearn most complete application guide
  • XGBoost modeling applications in detail
  • LightGBM modeling applications in detail
  • Python Machine Learning Integrated Project – E-commerce sales estimates
  • Python Machine Learning Integrated Project — E-commerce Sales Estimation
  • Machine learning feature engineering most complete interpretation
  • Application of Featuretools
  • AutoML Automatic machine learning modeling