As the old saying goes, “To do a good job, you must first sharpen your tools.” If you want to do data science in Python, you first need to know which Python data science libraries are available.

Before starting this article, I would like to recommend ModelZoo, a deep learning framework just released by Cui Da (Cui Qingcai) on October 1.

ModelZoo Contributors: 7; Commits: 22; Stars: 38

ModelZoo is a Python deep learning framework built on TensorFlow eager mode and Keras. It makes deep learning models easier to build and speeds up model development. The framework already provides the scaffolding for training and prediction, including default loss functions and optimizers, model saving, tensor summary logging, early stopping, and more. It is also easy to extend and maintain. In addition, ModelZoo plans to integrate popular models from NLP, CV and other fields for the convenience of deep learning developers.

I hope we can all give more support to home-grown projects. This article is really about machine learning, but I am taking the chance to put in a plug for Cui Da here, so please head over to GitHub and give ModelZoo a few more stars.

Now, on to the main article.


This article is the first in a series introducing the top Python libraries for machine learning, artificial intelligence, deep learning, and data science.

Python is gaining popularity in machine learning, artificial intelligence, deep learning, and data science. According to Builtwith.com, 45% of technology companies choose Python for AI and machine learning tasks.

For this reason, we have decided to publish a series of articles introducing the top Python libraries in each field:

  • Top 8 Python machine learning libraries
  • Top X Python AI libraries – coming soon, stay tuned
  • Top X Python deep learning libraries – coming soon, stay tuned
  • Top X Python data science libraries – coming soon, stay tuned

Our list is very subjective, and some libraries could fit into more than one category. For example, Keras appears in this article, while TensorFlow is placed under deep learning. This is because Keras, like scikit-learn, is a popular library for end users, while TensorFlow is better suited to developers and machine learning engineers.

GitHub data used for this article is as of October 3, 2018.

1. Scikit-learn Contributors: 1175; Commits: 23301; Stars: 30867

Scikit-learn is a Python machine learning library built on top of NumPy, SciPy and Matplotlib. It provides simple, efficient tools for data mining and data analysis. Scikit-learn is easy to learn, and its tools can be reused across a wide variety of scenarios.
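
To give a feel for that ease of use, here is a minimal sketch that trains and scores a classifier. The dataset (iris), the scaling step and the choice of logistic regression are my own illustrative assumptions, not something the original list prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset (chosen only for illustration).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and the estimator into one reusable object.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Every scikit-learn estimator exposes the same fit, predict and score methods, so swapping in another model usually means changing a single line.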

2. Keras Contributors: 726; Commits: 4818; Stars: 34066

Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK, or Theano. Keras was designed for fast experimentation: being able to go from idea to result with as little delay as possible is key to doing good research.
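
As a rough illustration of how quickly an idea becomes a trained model, here is a small sketch. It assumes Keras is used through the TensorFlow backend (imported as tensorflow.keras); the synthetic data and the layer sizes are made up for the example.

```python
import numpy as np
from tensorflow import keras

# Synthetic binary classification data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# A tiny fully connected network defined in a few lines.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train briefly and predict probabilities for a few samples.
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
probs = model.predict(X[:5], verbose=0)
print(probs.ravel())
```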

3. XGBoost Contributors: 319; Commits: 3454; Stars: 13630

XGBoost is an optimized distributed gradient boosting library designed to be efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, gradient boosted decision trees, or GBM, gradient boosting machine) that solves many data science problems quickly and accurately. The same code also runs in mainstream distributed environments (Hadoop, SGE, MPI), and in distributed mode XGBoost can handle problems with billions of examples.

4. StatsModels Contributors: 162; Commits: 10837; Stars: 3275

StatsModels complements SciPy's statistical functionality. It covers descriptive statistics as well as estimation and inference for statistical models.

5. LightGBM Contributors: 91; Commits: 1272; Stars: 6736

LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. It is used for ranking, classification, and many other machine learning tasks. LightGBM is part of Microsoft's DMTK project.

6. CatBoost Contributors: 77; Commits: 3304; Stars: 3241

CatBoost is a machine learning library based on gradient boosting over decision trees. Its advantages include superior quality compared with other GBDT libraries, very fast inference, support for both numerical and categorical features, and built-in data visualization tools.

7. PyBrain Contributors: 32; Commits: 992; Stars: 2598

PyBrain is a modular machine learning library for Python. Its goal is to offer flexible, easy-to-use, yet powerful machine learning algorithms, together with a variety of predefined environments in which users can test and compare their own algorithms.

8. Eli5 Contributors: 6; Commits: 929; Stars: 932

Eli5 is a Python library that helps developers debug machine learning classifiers and explain their predictions. Eli5 supports scikit-learn, XGBoost, LightGBM, lightning, and sklearn-crfsuite.


This article is based on “Top 8 Python Machine Learning Libraries” by Dan Clark.