It’s the end of the year, which means it’s time for year-end roundups. For developers, that means looking back at the open source libraries that came out this year, and at which of the new arrivals proved popular and useful.

For the last two years, we’ve blogged about some of the most popular work in the Python community. Now it is time to sum up 2017 in the same way.

Apologies to the libraries that have nothing to do with machine learning: we are a little biased :). In the interest of fairness, please use the comments section to tell us about great software that this article does not mention.

Enough preamble. Here we go!

1, Pipenv

The top spot goes to Pipenv. It was only released earlier this year, but it can affect the workflow of every Python developer, and Python.org now officially recommends it as the tool for managing dependencies!

Pipenv began as a weekend project by Kenneth Reitz to bring the best ideas from package managers like npm or yarn to the Python world. Forget about installing virtualenv and virtualenvwrapper, managing requirements.txt files, and ensuring reproducibility with regard to the versions of your dependencies’ own dependencies. With Pipenv, you declare all your dependencies in a single Pipfile, which is normally maintained through commands for adding, removing, or updating dependencies. The tool generates a Pipfile.lock file so that your builds are deterministic, which helps you avoid hard-to-catch bugs caused by some obscure dependency you did not even know you needed.

Of course, Pipenv has a lot of other nice features, and its documentation is awesome. Check it out and use it in your Python projects; we already use it at Tryolabs 🙂

2, PyTorch


If one library exploded in popularity in the deep learning community this year, it is PyTorch, the deep learning framework that Facebook launched this year.

PyTorch builds on and improves the popular Torch framework, and it is Python based where Torch uses Lua. As more and more people have moved to Python for data science work in recent years, PyTorch makes deep learning more approachable.

It is worth noting that PyTorch, with its new dynamic computational graph paradigm, has become the framework of choice for many researchers. When writing code with frameworks such as TensorFlow, CNTK, or MXNet, you must first define a computational graph, which specifies all the operations the code will run, and the framework then compiles and optimizes it so that it can run in parallel on a GPU and compute faster. This paradigm is called a static computational graph. The benefit is that you can take advantage of various optimizations, and because building and executing are separate, the graph can run on different devices once it’s built. However, in tasks such as natural language processing, the amount of work is often variable. Images can be resized to a fixed size before being fed to an algorithm, but sentences of different lengths cannot be handled the same way. The advantage of PyTorch and dynamic graphs is that you can use standard Python control flow in your code, and the graph is defined as the code executes, which gives you a lot of freedom that is essential for some tasks.
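To make the define-by-run idea concrete, here is a minimal pure-Python sketch. It does not use PyTorch and is not how PyTorch is implemented; the Value class and the score function are hypothetical names invented for this illustration. Each arithmetic operation records its inputs as it runs, so an ordinary for loop over a sentence of any length builds a graph of matching size, and backward() then walks that recorded graph to compute gradients.

```python
class Value:
    """A scalar that records the operations applied to it as they run."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # derivative w.r.t. each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Order the recorded graph topologically, then apply the chain rule
        # from the output back to the inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def score(weight, sentence):
    """Sum of weight * token over a sentence of any length."""
    total = Value(0.0)
    for token in sentence:  # plain Python loop: graph size follows the input
        total = total + weight * Value(token)
    return total


w = Value(2.0)
out = score(w, [1.0, 2.0, 3.0])  # three tokens, so three multiply nodes
out.backward()
print(out.data, w.grad)
```

Calling score with a one-token sentence would build a smaller graph without any change to the code, which is the freedom a static graph makes harder to get.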

Like other modern deep learning frameworks, PyTorch computes gradients automatically, and it is extremely fast and scalable, so why not give it a try?

3, Caffe2


It may sound crazy, but Facebook released another blockbuster deep learning framework this year.

The Caffe framework has been widely used for many years and is known for its unmatched performance and battle-tested code base. However, recent trends in deep learning left the framework stagnating in some respects. Caffe2 is an attempt to bring Caffe into the modern world.

It supports distributed training and deployment, including on several mobile platforms, as well as the latest CPUs and CUDA-capable hardware. While PyTorch is better suited to research, Caffe2 is suited to large-scale deployments like the ones at Facebook.

Also check out the recent ONNX (Open Neural Network Exchange) effort. Wouldn’t it be great to build and train a model with PyTorch and deploy it with Caffe2?

4, Pendulum


Arrow, which made the list last year, is a library designed to make working with datetimes in Python easier. This year, it’s Pendulum’s turn.

The great advantage of Pendulum is that it is a drop-in replacement for Python’s standard datetime class, so it can be easily integrated into existing code and its features used only where you actually need them. The authors have taken special care to handle time zones correctly: every instance is timezone-aware and defaults to UTC. An extended timedelta is also provided to make datetime arithmetic easier.

Unlike other existing libraries, it aims for an API with predictable behavior, so you know what to expect. If you are developing complex projects that involve datetimes, this library will make life easier for you! To learn more, consult the documentation.

5, Dash


You work in data science, using the excellent tools in the Python ecosystem such as Pandas and scikit-learn, and you organize your workflow in Jupyter Notebooks, which is great for you and your colleagues. But what about sharing your work with people who don’t know how to use these tools? How do you build an interface that everyone can easily use to manipulate and visualize the data? In the past, you needed a dedicated JavaScript-savvy front-end team to build GUIs for this purpose, but not anymore.

Dash, released this year, is an open source library for building data visualization web apps in pure Python. The library is built on top of Flask, Plotly.js, and React, and provides an abstraction layer so that you can develop quickly without learning those frameworks. The apps render in the browser and are responsive, so they also work on mobile devices.

If you want to know what Dash can do, its gallery of sample apps is a feast for the eyes.

6, PyFlux

Many Python libraries are useful for data science and machine learning, but when your data points are metrics that evolve over time, such as stock prices or measurements taken from devices, it’s a different story.

PyFlux is an open source Python library that deals specifically with time series. Time series analysis is a subfield of statistics and econometrics whose goals are to describe how a time series behaves (in terms of its latent components or features of interest) and to predict how it will evolve in the future.

PyFlux takes a probabilistic approach to time series modeling and implements several modern time series models such as GARCH. Neat stuff.

7, Fire

Projects often need a command line interface (CLI). Beyond the traditional argparse, Python has tools like click and docopt. Google’s Fire, released this year, takes a different approach to the same problem.

Fire is an open source library that automatically generates a command line interface for any Python project. The key word is automatically: you write almost no extra code or documentation to get a CLI. Simply call a Fire method and pass it whatever you want to turn into a CLI, such as a function, an object, a class, or a dictionary. You can even pass no arguments at all, in which case Fire turns your entire code into a CLI.
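The idea of deriving a CLI from code that already exists can be sketched with the standard library alone. The snippet below is not Fire and does not reproduce its behavior; cli_from_function and greet are hypothetical names for this illustration, and the sketch only handles plain functions whose defaults are simple values such as int, float, or str. It inspects a function’s signature, turns parameters without defaults into positional arguments and parameters with defaults into optional flags, and then calls the function.

```python
import argparse
import inspect


def cli_from_function(func, argv):
    """Build an argparse parser from func's signature, parse argv, call func."""
    parser = argparse.ArgumentParser(prog=func.__name__,
                                     description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            # No default: required positional argument, kept as a string.
            parser.add_argument(name)
        else:
            # Has a default: optional --flag converted to the default's type.
            parser.add_argument(f"--{name}", default=param.default,
                                type=type(param.default))
    args = parser.parse_args(argv)
    return func(**vars(args))


def greet(name, times=1):
    """Return a greeting repeated a number of times."""
    return " ".join([f"Hello {name}!"] * times)


# In a real script argv would be sys.argv[1:]; a literal list is used here.
print(cli_from_function(greet, ["World", "--times", "2"]))
```

With Fire itself none of this plumbing is written by hand: as far as we know, passing the function to fire.Fire is enough, and the library works out the arguments on its own.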

Read the guide and work through its examples to understand how it works. Keep an eye on this library, because it can save you a lot of time.

8, imbalanced-learn

Ideally, we have a balanced data set that we can use to train the model, and everything works out fine. Unfortunately, this is not the case in the real world, where many tasks come with naturally imbalanced data. For example, when predicting fraudulent credit card transactions, you would expect the vast majority of transactions (more than 99.9 percent) to be legitimate. Naively training a machine learning algorithm on such data leads to dismal performance, so these kinds of data sets need special care.

Fortunately, this problem has been studied extensively, and a variety of techniques exist to deal with it. imbalanced-learn is a Python package that implements several of these techniques to make development easier. It is compatible with scikit-learn and is part of the scikit-learn-contrib project. Useful!
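One of the simplest techniques for imbalanced data is random over-sampling: duplicate randomly chosen minority-class samples until the classes are the same size. The sketch below uses only the standard library and is not imbalanced-learn’s implementation; random_oversample is a hypothetical helper written for this illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        # All original samples that belong to this class.
        pool = [s for s, lab in zip(samples, labels) if lab == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: eight "legitimate" samples (label 0) and two "fraud" samples (label 1).
samples = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Over-sampling should only be applied to the training split, otherwise duplicated samples leak into the evaluation data. For real projects, the package provides tested implementations of this and of more elaborate techniques.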

9, FlashText

In general, regular expressions are the tool you reach for to find or replace text, and they do the job just fine. However, when you need to look up thousands of terms, regular expressions become very slow.

FlashText is a better solution to this problem. In the authors’ original benchmark, it cut the total run time of the operation from 5 days to 5 minutes.

The beauty of FlashText is that the running time is the same no matter how many terms are looked up, whereas the running time of regular expressions increases linearly with the number of terms.

FlashText demonstrates the importance of algorithm and data structure design: even for simple problems, a better algorithm can achieve more than a more powerful CPU.
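The reason the number of terms stops mattering can be illustrated with a trie: all keywords are stored in one tree of characters, and the text is scanned once while walking that tree. The sketch below is a simplified, case-sensitive illustration of that idea and not FlashText’s actual code; build_trie, find_keywords, and the "_end_" marker are hypothetical names chosen for this example.

```python
def build_trie(keywords):
    """Store keywords in a nested-dict trie; '_end_' marks a complete keyword."""
    root = {}
    for word in keywords:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node["_end_"] = word
    return root


def find_keywords(text, trie):
    """Return the keywords found in text as whole words, preferring longer matches."""
    found = []
    i, n = 0, len(text)
    while i < n:
        # Only start matching at the beginning of a word.
        if i > 0 and text[i - 1].isalnum():
            i += 1
            continue
        node, j, match, match_end = trie, i, None, i
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept a keyword only if it ends at a word boundary.
            if "_end_" in node and (j == n or not text[j].isalnum()):
                match, match_end = node["_end_"], j
        if match is not None:
            found.append(match)
            i = match_end
        else:
            i += 1
    return found


trie = build_trie(["python", "machine learning", "java"])
print(find_keywords("I love python and machine learning, not javascript", trie))
```

Each position in the text triggers at most one walk down the trie, bounded by the length of the longest keyword, so adding more keywords makes the trie bigger but does not add passes over the text.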

10, Luminoth

Disclaimer: This library was developed by Tryolabs’ R&D area.

In a world full of images, more and more apps need to make sense of them. Image processing has come a long way, thanks to advances in deep learning.

Luminoth is a computer vision toolkit built on TensorFlow and Sonnet. Currently, it supports object detection through a Faster R-CNN model.

Luminoth does more than implement this particular model: it is built to be modular and extensible, so existing components can be customized or new models added to tackle different problems while reusing as much code as possible. It provides the tooling that building a deep learning model requires: converting your data (in this case, images) to a suitable format for the data pipeline (TensorFlow’s TFRecords), performing data augmentation, training on one or more GPUs (a must when working with large data sets), running evaluation metrics, visualizing results in TensorBoard, and deploying the trained model through a simple API or a browser interface that people can play with.

Luminoth also integrates easily with Google Cloud’s ML Engine, so even if you don’t have a powerful GPU, you can train models in the cloud with a single command, just as easily as on your own machine.

If you’re interested in learning more about Luminoth, you can read the blog post or watch the video of our talk at ODSC.

More great libraries: these are worth a look too

PyVips

If you’ve never heard of libvips, let’s just say it’s an image processing library like Pillow or ImageMagick that supports many image formats. However, libvips is faster and takes up less memory than other libraries. For example, benchmarks show that it is three times faster than ImageMagick and uses only 1/15 of the memory. Check out the advantages of libvips here.

PyVips is a recently released Python binding for libvips, compatible with Python 2.7, 3.6, and PyPy. It installs with pip and is a drop-in replacement for the older binding, so code written against the old binding needs no changes.

If you’re doing image processing in your app, be sure to keep an eye on it.

Requestium

Disclaimer: This library is published by Tryolabs.

Sometimes you need to automate actions on the web. Whether you are crawling a website, testing an application, or filling out web forms on a site that does not expose an API, automation becomes necessary. Python has the excellent Requests library to help with this. Unfortunately, many sites use heavy JavaScript-based clients, which means that the HTML Requests fetches doesn’t even contain the form you want to automate, let alone let you fill it in! What comes back is basically a set of empty divs that modern front-end libraries like React or Vue populate in the browser.

While you can get around this by reverse engineering the JavaScript-generated requests, that can take hours of investigation. Wading through ugly JS code? Thanks, but no thanks. Another approach is to use the Selenium library, which lets you interact with a browser programmatically and run the JavaScript code. That works, but it is much slower than Requests, which consumes very few resources.

Wouldn’t it be nice if you could work mostly with Requests and switch to Selenium seamlessly only when you need it? Take a look at Requestium, which acts as a drop-in replacement for Requests and does a great job. It integrates Parsel, which keeps the selectors you write to find elements on a page particularly clean, and it has helpers for common operations such as clicking on elements and making sure content has actually rendered in the DOM. Another time saver for web automation!

skorch

Do you like the scikit-learn API but have to work with PyTorch? Don’t worry, skorch is a wrapper that gives PyTorch a scikit-learn-like interface. If you are familiar with these libraries, you will find the syntax straightforward. With skorch, some of the code is abstracted away so you can focus on what’s really important, like doing data science.

Conclusion

What a year! If you think other libraries deserve a place on this list, let us know in the comments section; the world is changing so fast that it is hard to keep up. And thanks again to everyone in the community for their contributions!

Finally, don’t forget to subscribe to our newsletter and don’t miss our machine learning content.

About the author of this post

Stay birds

Columnist for the Python Chinese Community. I want to learn Python data analysis and natural language processing, and I have picked up a little of each. I translate Python articles to share them with my Python friends.
