Theory of wisdom

Compiled by | Bot

Source | Data Science Central

Editor’s note: Michael Li, CEO of The Data Incubator, a US company, has published a ranking of deep learning libraries that reflects some of the working and learning preferences of today’s data scientists. According to Nonzhi, The Data Incubator is the largest data science training company in the world, with an admission rate of less than 3%, making it “more difficult than Harvard” to get into. In this ranking, The Data Incubator scores deep learning libraries on their performance on GitHub, Stack Overflow, and Google search, as a measure of how popular the mainstream deep learning libraries are.

Ranking

Here is a list of 23 open source deep learning libraries commonly used by data scientists, ranked by their performance on GitHub, Stack Overflow, and Google search. The table shows standardized scores: the average is 0, and a score of 1 means one standard deviation above the average. For example, Caffe scores 1 on GitHub, meaning its activity there is one standard deviation above average, while Deeplearning4j scores -0.06, meaning it is close to average.
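To make the standardized score concrete, here is a minimal Python sketch. The library names and star counts are made up for illustration and are not taken from the ranking, and the use of the population standard deviation is an assumption, since the article does not say which variant was used.

```python
from statistics import mean, pstdev

def standardize(counts):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(counts.values())
    sigma = pstdev(counts.values())  # assumption: population standard deviation
    return {name: (value - mu) / sigma for name, value in counts.items()}

# Hypothetical GitHub star counts, not the article's data.
stars = {"lib_a": 100, "lib_b": 200, "lib_c": 300, "lib_d": 400, "lib_e": 500}
scores = standardize(stars)

for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

A library whose count equals the average gets a score of 0, and the scores across all libraries sum to 0.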

Conclusion 1: TensorFlow relies on a large active community to take the top spot

As shown in the figure above, TensorFlow scores at least 2 standard deviations above the average on every platform. Compared with third-ranked Caffe, it has three times as many GitHub forks and more than 14 times as many Stack Overflow questions. TensorFlow is only two years old, and its success rests on Google’s strength in machine learning.

In November 2015, Google Brain released TensorFlow as open source software. Although it exposes Python APIs on top of a C++ engine, writing code against it directly is cumbersome, so some developers have built higher-level deep learning libraries that use TensorFlow as a backend, such as Keras and Sonnet in this list. In any case, TensorFlow’s popularity is well founded: its combination of a general-purpose deep learning framework, flexible interfaces, a clean and attractive graphical interface, and the extensive resources contributed by Google employees and the developer community gives it the edge.

Conclusion 2: It is too early for Caffe2 to replace Caffe

Caffe is number three on the list, and apart from TensorFlow, no library matches its strong position on GitHub. In industry, Caffe has been considered a more specialized deep learning library than TensorFlow, with a strong track record in image processing, object recognition, and pre-trained convolutional neural networks. In April 2017, Facebook officially launched Caffe2. The list shows that Caffe2 has grown rapidly in the six months since and has already overtaken half of the popular deep learning libraries, but there is still a big gap between Caffe2 and Caffe. Caffe2 is lightweight, modular, and scalable, and it supports recurrent neural networks. The two code bases are independent of each other, so data scientists can continue to use the original Caffe. To attract more users, Caffe2 also offers migration tools such as Caffe Translator, which help developers run Caffe models on Caffe2.

Conclusion 3: Keras is the most popular front end for deep learning

Keras is the highest-ranked non-framework library and can serve as a front end to TensorFlow, Theano, MXNet, CNTK, and Deeplearning4j. Its popularity stems from its ease of use, which has helped it perform above average on all three platforms and reach number two on the list. Keras lets users build models quickly, at the cost of some of the flexibility and control that come from working directly with a framework. When it comes to applying deep learning to a data set, Keras is the first thing many data scientists think of. RStudio recently introduced an R interface to Keras, a change that should help Keras keep its current momentum in development and adoption.

Conclusion 4: Even without the tech giants’ backing, Theano still has a place

As more and more new deep learning frameworks emerge, Theano, the oldest of the “old” libraries, has not disappeared. Instead, it holds its place on the list. Theano pioneered the use of computational graphs, which are now popular throughout the deep learning and machine learning communities. It is actually a Python numerical computation library, but it can be used with high-level deep learning packages such as Lasagne. At present, TensorFlow and Keras are backed by Google, PyTorch and Caffe2 are backed by Facebook, Microsoft designs and maintains CNTK, and Amazon has chosen MXNet as its official deep learning framework. Theano, with its grassroots origins, has no tech giant to lean on, but it remains popular on the strength of its flexibility.

Conclusion 5: Sonnet is the fastest growing library

Earlier this year, the DeepMind team announced Sonnet, a high-level library for quickly building neural network modules on top of TensorFlow. According to Google search data, Sonnet’s search results grew 272 percent this quarter, the largest increase of any library. Although DeepMind was acquired by Google in 2014, the team remains separate from Google Brain. DeepMind focuses on strong artificial intelligence, while Google Brain studies how machines learn. Sonnet’s role is to make TensorFlow feel more like Torch. Given DeepMind’s research areas, Sonnet is likely to contribute to AI research in the future.

Conclusion 6: Python is the preferred language for deep learning interfaces

PyTorch, which supports only a Python interface, is the second fastest growing deep learning library on the list. The number of Google search results for PyTorch increased 236% compared with last quarter. Of the 23 libraries, only three do not have Python interfaces: DLIB, MatConvNet, and OpenNN. By comparison, only seven support C++ and only six provide an R interface. While data scientists largely agree on which language to use, deep learning practitioners still have several libraries to choose from.

Research method

We collected the 23 most popular deep learning libraries from five sources, and then calculated the rankings from the metrics we designed. The GitHub score is based on stars and forks, the Stack Overflow score is based on tags and questions, and the Google search score is based on the total number of search results over the last five years, together with quarterly growth rates.

Notes:

  • Some library names are ambiguous, such as caffe, chainer, and lasagne, so these names were combined with “deep learning” in the Google searches;

  • Libraries with no Stack Overflow data are counted as 0;

  • The counts were standardized to a mean of 0 and a standard deviation of 1, then averaged within GitHub and within Stack Overflow, and combined with the Google search results to give an overall score;

  • Some manual checking was needed to confirm the location of each library’s GitHub repository.
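The scoring steps described in these notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors’ code: the metric names, the sample counts, and the choice to combine the three platform scores with a plain mean are all assumptions, and the quarterly growth rate is left out.

```python
from statistics import mean, pstdev

# Hypothetical metric names; the article only names the platforms.
METRICS = ["stars", "forks", "so_tags", "so_questions", "search"]

def zscores(values):
    """Standardize a list of counts to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def overall_scores(raw):
    """raw maps library name -> dict of metric counts (None when missing)."""
    names = list(raw)
    z = {}
    for metric in METRICS:
        # Missing counts (e.g. no Stack Overflow data) are treated as 0.
        column = [raw[name].get(metric) or 0 for name in names]
        z[metric] = dict(zip(names, zscores(column)))
    result = {}
    for name in names:
        github = mean([z["stars"][name], z["forks"][name]])
        stack = mean([z["so_tags"][name], z["so_questions"][name]])
        # Assumption: the overall score is the plain mean of the three parts.
        result[name] = mean([github, stack, z["search"][name]])
    return result

# Made-up counts for three imaginary libraries.
raw = {
    "lib_a": {"stars": 900, "forks": 300, "so_tags": 50, "so_questions": 400, "search": 10000},
    "lib_b": {"stars": 500, "forks": 200, "so_tags": 20, "so_questions": 100, "search": 6000},
    "lib_c": {"stars": 100, "forks": 40, "so_tags": None, "so_questions": None, "search": 2000},
}
overall = overall_scores(raw)
ranking = sorted(overall, key=overall.get, reverse=True)
print(ranking)
```

Because every metric is standardized before averaging, a platform with large raw counts (such as search results) does not dominate one with small counts (such as forks).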

Raw data: github.com/thedataincubator/data-science-blogs/


This article was compiled by Theory of Wisdom. To reprint it, please contact this public account for authorization.