Introduction:

LightGBM is an open-source gradient boosting framework that Microsoft released in the last couple of years. Compared with XGBoost, it has the following advantages:

  • Faster training speed and greater efficiency: LightGBM uses a histogram-based algorithm that buckets continuous feature values into discrete bins, which makes training faster. It also splits nodes differently than XGBoost does: instead of splitting every node in a layer (level-wise growth), LightGBM grows leaf-wise, always splitting the leaf with the largest gain, which saves a great deal of the work spent on low-gain splits. Figure 1 shows how XGBoost splits (level-wise) and Figure 2 shows how LightGBM splits (leaf-wise).

  • Lower memory usage: storing discrete bins instead of continuous feature values takes less memory.

  • Higher accuracy (compared with other boosting algorithms): the leaf-wise splitting method produces more complex trees than level-wise splitting, which is the main factor behind the higher accuracy. However, it can sometimes lead to overfitting, which can be prevented by setting max_depth (see the short sketch after this list).

  • Big data processing capability: because training time is so much shorter, it can handle large datasets more comfortably than XGBoost.

  • Support for parallel learning
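
To connect these mechanisms to concrete knobs, here is a minimal, illustrative sketch: max_bin controls the number of histogram bins, num_leaves caps leaf-wise growth, and max_depth guards against the overfitting mentioned above. The values are placeholders, not recommendations.

import lightgbm as lgb

# Illustrative parameters only: max_bin sets the histogram bin count,
# num_leaves bounds leaf-wise growth, max_depth limits overfitting
params = {
    'objective': 'binary',
    'max_bin': 255,
    'num_leaves': 31,
    'max_depth': 7,
}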

Introduction to LightGBM core parameters

As we all know, XGBoost has three types of parameters: general parameters, booster parameters, and learning-objective parameters. LightGBM instead has core parameters, learning-control parameters, IO parameters, objective parameters, metric parameters, network parameters, GPU parameters, and model parameters. The ones I modify most often are the core parameters, the learning-control parameters, and the metric parameters. Please see the LightGBM Chinese documentation for more details.

Core parameters

  1. boosting: also called boost, boosting_type. The default is gbdt.

    LightGBM offers more boosting options than XGBoost: traditional gbdt, rf, dart, and goss. I don't understand the last two very deeply, but having tried them, gbdt remains the classic and most stable choice (a combined sketch of the core parameters follows this list).

  2. num_threads: also called num_thread, nthread. Specifies the number of threads.

    The official documentation states that setting this to the number of physical CPU cores is noticeably faster than setting it to the number of logical threads (most CPUs use hyper-threading these days). For parallel learning it should not be set to use all cores, since that slows training down.

  3. application: the default is regression. Also called objective, app. It specifies the task objective.

    • regression
      • regression_l2, L2 loss, alias=regression, mean_squared_error, mse
      • regression_l1, L1 loss, alias=mean_absolute_error, mae
      • huber, Huber loss
      • fair, Fair loss
      • poisson, Poisson regression
      • quantile, Quantile regression
      • quantile_l2, similar to Quantile, but using L2 loss
    • binary, binary log loss classification application
    • multi-class classification
      • multiclass, the softmax objective function; num_class should be set
      • multiclassova, the one-vs-all binary objective function; num_class should be set
    • cross-entropy application
      • xentropy, the objective function is cross-entropy (with optional linear weights), alias=cross_entropy
      • xentlambda, an alternative parameterization of cross-entropy, alias=cross_entropy_lambda
      • The label is any value within the [0, 1] interval
    • lambdarank, lambdarank application
      • In the lambdarank task, the label should be int type, the larger the value is, the higher the correlation is (e.g. 0:bad, 1:fair, 2:good, 3:perfect).
      • label_gain can be used to set the gain (weight) of int labels
  4. valid: also called test, valid_data, test_data. Multiple validation sets are supported, separated by commas.

  5. learning_rate: also called shrinkage_rate; it controls the step size (shrinkage) of each boosting iteration. The default is 0.1; we usually set it between 0.05 and 0.2.

  6. num_leaves: also called num_leaf. Recent LightGBM versions default this to 31; it is the maximum number of leaves in one tree.

  7. device: default=cpu, options=cpu, gpu

    • Selects the device for tree learning; a GPU can be used for faster learning
    • Note: a smaller max_bin (e.g. 63) is recommended for faster speed
    • Note: for faster speed, the GPU uses 32-bit floating point for summation by default. You can set gpu_use_dp=true to enable 64-bit floating point, but it slows training down
    • Note: Please refer to the installation guide to build the GPU version
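
To make the core parameters concrete, here is a minimal sketch that passes them to the native train() API. The synthetic data and the specific values are only for illustration.

import numpy as np
import lightgbm as lgb

# Synthetic data purely for illustration
X = np.random.rand(500, 10)
y = np.random.randint(0, 3, size=500)   # three classes
train_set = lgb.Dataset(X, y)

core_params = {
    'boosting': 'gbdt',          # gbdt / rf / dart / goss
    'num_threads': 4,            # roughly the number of physical CPU cores
    'objective': 'multiclass',   # the application / objective
    'num_class': 3,              # required for multiclass and multiclassova
    'learning_rate': 0.1,
    'num_leaves': 31,
    'device': 'cpu',             # 'gpu' if a GPU build is installed
}
booster = lgb.train(core_params, train_set, num_boost_round=50)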

Learning control parameters

  1. feature_fraction: default=1.0, type=double, 0.0 < feature_fraction <= 1.0, also called sub_feature, colsample_bytree
    • If feature_fraction is smaller than 1.0, LightGBM randomly selects a subset of features at each iteration. For example, at 0.8, 80% of the features are selected before each tree is trained
    • It can be used to speed up training
    • Can be used to deal with overfitting
  2. bagging_fraction: default=1.0, type=double, 0.0 < bagging_fraction <= 1.0, also called sub_row, subsample
    • Similar to feature_fraction, but it randomly selects part of the data without resampling
    • It can be used to speed up training
    • Can be used to deal with overfitting
    • Note: To enable bagging, bagging_freq should be set to a non-zero value
  3. bagging_freq: default=0, type=int, also called subsample_freq
    • Frequency of bagging, where 0 means bagging is disabled. K means bagging is performed every K iterations
    • Note: to enable bagging, bagging_fraction should also be set to a value smaller than 1.0
  4. lambda_l1: default=0, also called reg_alpha, L1 regularization, type=double
  5. lambda_l2: default=0, also called reg_lambda, L2 regularization, type=double
  6. cat_smooth: default=10, type=double
    • Used for categorical features
    • It reduces the effect of noise in categorical features, especially for categories with little data (a combined sketch of these learning-control parameters follows this list)
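
Here is a short sketch that combines the learning-control parameters above with the native API; the synthetic data and values are illustrative only, not tuned recommendations.

import numpy as np
import lightgbm as lgb

# Synthetic binary-classification data purely for illustration
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=500)
train_set = lgb.Dataset(X, y)

control_params = {
    'objective': 'binary',
    'feature_fraction': 0.8,   # use 80% of the features for each tree
    'bagging_fraction': 0.9,   # use 90% of the rows ...
    'bagging_freq': 5,         # ... re-sampled every 5 iterations (non-zero enables bagging)
    'lambda_l1': 0.1,          # L1 regularization
    'lambda_l2': 10,           # L2 regularization
    'cat_smooth': 10,          # only takes effect when categorical features are declared
}
booster = lgb.train(control_params, train_set, num_boost_round=50)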

Metric parameters

  1. metric: default={l2 for regression}, {binary_logloss for binary classification}, {ndcg for lambdarank}, type=multi-enum, Options = L1, L2, NDCG, AUC, binary_logloss, binary_error…
    • l1, absolute loss, alias=mean_absolute_error, mae
    • l2, square loss, alias=mean_squared_error, mse
    • l2_root, root square loss, alias=root_mean_squared_error, rmse
    • quantile, Quantile regression
    • huber, Huber loss
    • fair, Fair loss
    • poisson, Poisson regression
    • ndcg, NDCG
    • map, MAP
    • auc, AUC
    • binary_logloss, log loss
    • binary_error: for each sample, 0 means correct classification and 1 means wrong classification
    • multi_logloss, log loss for multi-class classification
    • multi_error, error rate for multi-class classification
    • xentropy, cross-entropy (with optional linear weights), alias=cross_entropy
    • xentlambda, “intensity-weighted” cross entropy, alias=cross_entropy_lambda
    • kldiv, Kullback-Leibler divergence, alias=kullback_leibler
    • Multiple metrics are supported, separated by commas (as shown in the sketch below)
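
For example, multiple metrics can be passed either as one comma-separated string or as a Python list; the sketch below uses synthetic data purely for illustration.

import numpy as np
import lightgbm as lgb

# Synthetic data purely for illustration
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=500)
train_set = lgb.Dataset(X, y)
valid_set = lgb.Dataset(X, y, reference=train_set)   # reusing the same data just to show the API

params = {
    'objective': 'binary',
    'metric': ['auc', 'binary_logloss'],   # equivalently: 'metric': 'auc,binary_logloss'
}
booster = lgb.train(params, train_set, num_boost_round=20, valid_sets=[valid_set])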

Overall, I still don't see much difference between using LightGBM and using XGBoost; the parameters overlap a great deal, and many of XGBoost's core principles carry over to LightGBM. Likewise, LightGBM has a train() function as well as LGBMClassifier() and LGBMRegressor(); the latter two mainly exist to fit the sklearn API better, just as with XGBoost.
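
As a quick comparison of the two interfaces, here is a minimal sketch (with synthetic data and illustrative values) training the same kind of model with the native train() API and with the sklearn-style LGBMClassifier:

import numpy as np
import lightgbm as lgb

X = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=500)

# Native API: parameters go in a dict, data goes into an lgb.Dataset
booster = lgb.train({'objective': 'binary', 'num_leaves': 31},
                    lgb.Dataset(X, y), num_boost_round=50)

# sklearn-style API: parameters are constructor arguments; fit/predict like any sklearn estimator
clf = lgb.LGBMClassifier(objective='binary', num_leaves=31, n_estimators=50)
clf.fit(X, y)
preds = clf.predict(X)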

Parameter tuning with GridSearch

A link to GridSearch is given here so you can click through and take a look. Next I'll walk through tuning LGBMClassifier.

The data I used is uploaded here. Let's go straight to the code!

import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV  # Performing grid search
from sklearn.model_selection import train_test_split

train_data = pd.read_csv('train.csv')   # Read data
y = train_data.pop('30').values   # Pop the label column as the training target; '30' is the label column's name
col = train_data.columns
x = train_data[col].values   # The remaining columns are used as training features
train_x, valid_x, train_y, valid_y = train_test_split(x, y, test_size=0.333, random_state=0)   # Split into training and validation sets
train = lgb.Dataset(train_x, train_y)                    # Dataset objects for the native lgb.train API
valid = lgb.Dataset(valid_x, valid_y, reference=train)   # (GridSearchCV below works on the raw arrays instead)


parameters = {
              'max_depth': [15, 20, 25, 30, 35],
              'learning_rate': [0.01, 0.02, 0.05, 0.1, 0.15],
              'feature_fraction': [0.6, 0.7, 0.8, 0.9, 0.95],
              'bagging_fraction': [0.6, 0.7, 0.8, 0.9, 0.95],
              'bagging_freq': [2, 4, 5, 6, 8],
              'lambda_l1': [0, 0.1, 0.4, 0.5, 0.6],
              'lambda_l2': [0, 10, 15, 35, 40],
              'cat_smooth': [1, 10, 15, 20, 35]
}
gbm = lgb.LGBMClassifier(boosting_type='gbdt',
                         objective = 'binary',
                         metric = 'auc',
                         verbose = 0,
                         learning_rate = 0.01,
                         num_leaves = 35,
                         feature_fraction=0.8,
                         bagging_fraction= 0.9,
                         bagging_freq= 8,
                         lambda_l1= 0.6,
                         lambda_l2= 0)
# GridSearchCV handles fitting the model internally, so we don't call gbm.fit() ourselves
gsearch = GridSearchCV(gbm, param_grid=parameters, scoring='accuracy', cv=3)
gsearch.fit(train_x, train_y)

print("Best score: % 0.3 f" % gsearch.best_score_)
print("Best parameters set:")
best_parameters = gsearch.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))
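
One practical note: the grid above contains 5^8 (over 390,000) combinations, which is usually far too many to search in one pass with cv=3. A common workaround, sketched below as a continuation of the code above (the stage groupings are just one reasonable choice, not the only one), is to tune the parameters in stages and carry the best values forward:

# Continuing from the code above (gbm, train_x, train_y, GridSearchCV are already defined)
stage1 = {'max_depth': [15, 20, 25, 30, 35],
          'learning_rate': [0.01, 0.02, 0.05, 0.1, 0.15]}
gs1 = GridSearchCV(gbm, param_grid=stage1, scoring='accuracy', cv=3)
gs1.fit(train_x, train_y)

stage2 = {'feature_fraction': [0.6, 0.7, 0.8, 0.9, 0.95],
          'bagging_fraction': [0.6, 0.7, 0.8, 0.9, 0.95],
          'bagging_freq': [2, 4, 5, 6, 8]}
gs2 = GridSearchCV(gs1.best_estimator_, param_grid=stage2, scoring='accuracy', cv=3)
gs2.fit(train_x, train_y)

stage3 = {'lambda_l1': [0, 0.1, 0.4, 0.5, 0.6],
          'lambda_l2': [0, 10, 15, 35, 40],
          'cat_smooth': [1, 10, 15, 20, 35]}
gs3 = GridSearchCV(gs2.best_estimator_, param_grid=stage3, scoring='accuracy', cv=3)
gs3.fit(train_x, train_y)

print("Best parameters after staged search:", gs3.best_params_)
print("Best score after staged search: %0.3f" % gs3.best_score_)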