Want to use lightning-fast boosting methods on your GPU? It is worth knowing this library: it is faster than LightGBM and XGBoost on many tasks.

Despite the revival and popularity of neural networks in recent years, boosting algorithms still have indispensable advantages in scenarios with limited training samples, tight training time budgets, or little parameter-tuning expertise. The representative boosting libraries today include CatBoost, LightGBM, and XGBoost. This article introduces a new open-source project that implements GPU-based gradient boosting decision tree (GBDT) and random forest algorithms.

Project address: github.com/Xtra-Comput…

So why accelerate GBDTs and random forests? A 2017 Kaggle survey showed that 50 percent, 46 percent, and 24 percent of data mining and machine learning practitioners use decision trees, random forests, and GBMs, respectively. GBDTs and random forests are often used to build state-of-the-art data science solutions, which calls for efficient training on large datasets using GPUs.

An ensemble of two decision trees, from the XGBoost documentation.

Although libraries such as XGBoost already support GPUs, they were not originally designed for them, so there is room for optimization and acceleration. ThunderGBM is designed to help users apply GBDTs and random forests easily and efficiently, using the GPU for fast training.

ThunderGBM is faster than the other GPU-enabled libraries on many tasks.

The main features of ThunderGBM are as follows:

  • Often 10 times faster than other libraries.

  • Supports the Python (scikit-learn) interface; see the usage sketch below.

  • Supports the Linux operating system.

  • Supports classification, regression, and ranking.

ThunderGBM’s overall flow of prediction and training.
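The scikit-learn style Python interface mentioned above can be used roughly as follows. This is a minimal sketch, not the library's documented example: the class and hyperparameter names (TGBMRegressor, depth, n_trees) are assumptions based on the thundergbm Python package and may differ between versions, and a CUDA-capable GPU is required.

# Minimal usage sketch of ThunderGBM's scikit-learn style interface.
# Assumption: the thundergbm package exposes a TGBMRegressor estimator with
# depth and n_trees parameters; adjust names to match your installed version.
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from thundergbm import TGBMRegressor  # hypothetical import path

# Load a small regression dataset shipped with scikit-learn.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a GBDT regressor on the GPU and evaluate with RMSE.
model = TGBMRegressor(depth=6, n_trees=40)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)

According to the feature list above, classification and ranking tasks are supported as well and can be expected to follow the same fit/predict pattern through the corresponding estimator classes.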

The main authors of ThunderGBM are Zeyi Wen, Qinbin Li, and Bingsheng He of the National University of Singapore, and Jiashuai Shi of the South China University of Technology.

Getting started guide

ThunderGBM requires a development environment with CMake 2.8 or higher, GCC 4.8 or higher on Linux, the Boost C++ libraries, and CUDA 8 or later.

Download:

git clone https://github.com/zeyiwen/thundergbm.git
cd thundergbm
# under the directory of thundergbm
git submodule init cub && git submodule update

Build on Linux:

# under the directory of thundergbm
mkdir build && cd build && cmake .. && make -j

Quick test:

./bin/thundergbm-train ../dataset/machine.conf
./bin/thundergbm-predict ../dataset/machine.conf

After a successful run, you should see RMSE = 0.489562.

Related research

Readers interested in the technical and modeling details of the implementation can refer to the original paper:

Address: www.comp.nus.edu.sg/~wenzy/pape…

Other relevant literature:

  • Paper: Efficient Gradient Boosted Decision Tree Training on GPUs

  • Authors: Zeyi Wen, Bingsheng He, Kotagiri Ramamohanarao, Shengliang Lu, and Jiashuai Shi

  • Address: https://www.comp.nus.edu.sg/~hebs/pub/IPDPS18-GPUGBDT.pdf