Source: Analytics Vidhya

By Sunil Ray

Compiled by Heart of the Machine

In this article, the author lists the most popular GitHub repositories of 2017, covering data science, machine learning, and deep learning projects, in the hope that they will help you learn and work. Heart of the Machine's own GitHub projects have been added to the list as well; stars and pull requests are welcome.

GitHub is one of the most active communities in computer science, where people from diverse backgrounds share a growing number of software tools and resources. Not only can you get the tools you need, but you can also watch how the code is written and implemented.

As a machine learning enthusiast, the author has listed the most popular GitHub repositories of 2017, covering both learning materials and tools, in the hope that they will help with your study and research.


Contents


1. Learning resources

1. Awesome Data Science

2. Machine Learning / Deep Learning Cheat Sheet

3. Oxford Deep Natural Language Processing Course Lectures

4. PyTorch-Tutorial

5. Resources of NIPS 2017


2. Open source software libraries

1. TensorFlow

2. TuriCreate: A simplified machine learning library

3. OpenPose

4. DeepSpeech

5. Mobile Deep Learning

6. Visdom

7. Deep Photo Style Transfer

8. CycleGAN

9. Seq2seq

10. Pix2code


3. Heart of the Machine projects

1. AI00: 100 companies influencing the future of AI

2. Artificial-Intelligence-Terminology

3. ML-Tutorial-Experiment


1. Learning resources


1.1 Awesome Data Science

Project address: github.com/bulutyazili…

This repo is a foundational resource for data science. Years of contributions have built up its contents, from how-to guides to infographics to social network accounts worth following. Whether you’re a beginner or an industry veteran, there is plenty here to learn from.

The repo’s table of contents shows how much ground it covers.


1.2 Machine Learning/Deep Learning Cheat Sheet

Project address: github.com/kailashahir…

The cheat sheets cover the tools and techniques commonly used in machine learning and deep learning, from simple tools like pandas to deep learning techniques. Once you bookmark or fork the repo, you no longer have to search for common tips and dos and don’ts.

The cheat sheets cover pandas, NumPy, scikit-learn, Matplotlib, ggplot, dplyr, tidyr, PySpark, and neural networks.


1.3 Oxford Deep Natural Language Processing Course Lectures

Project address: github.com/oxford-cs-d…

Stanford’s NLP course has long been the gold standard among natural language processing courses. Recently, however, deep learning architectures such as RNNs and LSTMs have driven a lot of progress in NLP.

This repo is based on Oxford University’s NLP course and covers advanced techniques and topics such as language modeling with RNNs, speech recognition, and text-to-speech (TTS). It contains everything from the lecture materials to the course’s practical exercises.


1.4 PyTorch-Tutorial

Project address: github.com/yunjey/pyto…

As of this writing, the author considers PyTorch the only real competitor to TensorFlow, and its features and reputation make it a competitive deep learning framework. PyTorch has gained a lot of attention from the deep learning community for its Pythonic programming style, dynamic computation graphs, and faster prototyping.

The repository contains a large amount of code for deep learning tasks in PyTorch, including RNNs, GANs, and neural style transfer. Most of these models take only about 30 lines of code to implement. This speaks volumes about PyTorch’s level of abstraction, which lets researchers focus on finding the right model without getting bogged down in details like programming language and tool choice.
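To make the phrase "dynamic computation graph" concrete, here is a minimal pure-Python sketch of the define-by-run idea: the graph is recorded while ordinary Python code executes, and gradients are then propagated backward through it. The `Value` class and its methods are hypothetical names invented for this illustration. This is not PyTorch's API or implementation, only a toy that assumes scalar values and supports just addition and multiplication.

```python
class Value:
    """A scalar that records how it was computed (hypothetical toy, not PyTorch)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents    # Values this one was computed from
        self.grad_fns = grad_fns  # one function per parent: upstream grad -> parent grad

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order the recorded graph topologically, then apply the chain rule
        # from the output back toward the inputs.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v.parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, fn in zip(v.parents, v.grad_fns):
                parent.grad += fn(v.grad)


x = Value(3.0)
y = Value(2.0)

# Ordinary Python control flow decides the shape of the graph at run time.
z = x * y
for _ in range(2):
    z = z + x

z.backward()
print(z.data, x.grad, y.grad)
```

Because the loop is plain Python, changing its iteration count changes the graph that gets recorded on the next run, which is the property that makes this style convenient for prototyping.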


1.5 Resources of NIPS 2017

Project address: github.com/hindupuravi…

This repo collects NIPS 2017 resources, including slides from all invited talks, tutorials, and workshops. NIPS is an annual conference on machine learning and computational neuroscience.

Over the past few years, most of the groundbreaking research in data science has been presented at NIPS. If you want to be on the cutting edge of your field, this is a great resource!


2. Open source software libraries


2.1 TensorFlow

Project address: github.com/tensorflow/…


TensorFlow is an open source software library that uses data flow graphs for numerical computation. “Tensor” refers to the tensors being passed around, and “Flow” refers to the graph they flow through. A data flow graph is a directed graph of “nodes” and “edges” that describes mathematical operations. A node generally represents a mathematical operation, but it can also represent the point where data is fed in, the point where results are pushed out, or the point where persistent variables are read or written. Edges represent the input/output relationships between nodes and carry dynamically sized multidimensional arrays, called tensors.
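The node-and-edge description can be sketched in a few lines of plain Python. This is an illustrative toy, not TensorFlow's API: the `graph` dictionary layout and the `evaluate` helper are assumptions made up for the example, and scalars stand in for the multidimensional tensors a real framework would pass along the edges.

```python
import operator

# Each node maps to (operation, names of its input nodes).
# The input names are the edges of the directed graph.
# Hypothetical layout for illustration only.
graph = {
    "a": ("input", []),
    "b": ("input", []),
    "sum": (operator.add, ["a", "b"]),
    "out": (operator.mul, ["sum", "b"]),
}


def evaluate(graph, target, feed):
    """Compute `target` by following its edges back to the fed input nodes."""
    op, inputs = graph[target]
    if op == "input":
        return feed[target]
    args = [evaluate(graph, name, feed) for name in inputs]
    return op(*args)


# Describe the computation once, then run it with concrete input values.
result = evaluate(graph, "out", {"a": 2, "b": 3})
print(result)  # (2 + 3) * 3
```

The point of the sketch is the separation it shows: the graph is a description of the computation, and values only flow along the edges when the graph is evaluated with concrete inputs.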

TensorFlow has held its position as the top library for deep learning and machine learning since its release. The Google Brain team and the machine learning community actively contribute to it and keep it up to date, especially in the area of deep learning.

TensorFlow started as an open source software library for numerical computation using data flow graphs, but it has become a complete framework for building deep learning models. It mainly supports Python, but also supports languages such as C, C++, and Java. In addition, in November 2017 Google released a developer preview of TensorFlow Lite, a lightweight solution for mobile and embedded devices.


2.2 TuriCreate: A simplified machine learning library

Project address: github.com/apple/turic…

TuriCreate is an open source project recently contributed by Apple that provides an easy-to-use way to create and deploy machine learning models for complex tasks such as object detection, human posture recognition, and recommendation systems.

As machine learning enthusiasts, we may be familiar with GraphLab Create, a very simple and efficient machine learning library. Turi, the company that created it, made a big splash when it was acquired by Apple.

TuriCreate was developed for Python, and its strongest feature is that it can export machine learning models to Core ML for use in iOS, macOS, watchOS, and tvOS applications.


2.3 OpenPose

Project address: github.com/CMU-Percept…

OpenPose is a multi-person keypoint detection library that detects the positions of people in an image or video in real time. Developed and maintained by CMU’s Perceptual Computing Lab, it is a good example of how open source research can be quickly deployed in industry.

One use case for OpenPose is activity detection, where the actions or activities performed by a person are captured in real time. The detected keypoints and their movements can then be used to create animations. OpenPose offers a C++ API that developers can get started with quickly, as well as a simple command line interface for processing images or videos.


2.4 DeepSpeech

Project address: github.com/mozilla/Dee…

DeepSpeech is an open source implementation of a state-of-the-art speech-to-text technique developed by Baidu Research. It is based on TensorFlow and Python, but also has NodeJS bindings and can be run from the command line.

Mozilla has been the major force behind building DeepSpeech and open-sourcing the library. Sean White, Vice President of Technology Strategy at Mozilla, wrote in a blog post that only a few commercial-quality speech recognition engines are available and that most are controlled by large companies, which limits the choices for startups, researchers, and traditional businesses that want to build customized products and services for their users. He added that Mozilla has worked with many developers and researchers in the machine learning community to improve the open source library, so that DeepSpeech now applies sophisticated, cutting-edge machine learning techniques to build a speech-to-text engine.


2.5 Mobile Deep Learning

Project address: github.com/baidu/mobil…

This repo brings state-of-the-art data science techniques to mobile platforms. Developed by Baidu Research, it aims to deploy deep learning models on mobile devices such as Android and iOS with low complexity and high speed.

The repo walks through a simple use case, object detection: it can identify the exact location of an object (such as a phone) in an image. Isn’t that great?


2.6 Visdom

Project address: github.com/facebookres…

Visdom lets collaborators share plots, images, and text. You can organize your visualizations programmatically or through the UI to create dashboards for live data, inspect experiment results, or debug experiment code.

The exact inputs to the plotting functions vary, but most take a tensor X containing the data and an optional tensor Y containing additional data variables such as labels or timestamps. Visdom supports all the basic chart types and builds its visualizations with Plotly.

Visdom works with PyTorch and NumPy.


2.7 Deep Photo Style Transfer

Project address: github.com/luanfujun/d…

This repo is based on a recent paper, Deep Photo Style Transfer, which describes a deep learning method for photographic style transfer that handles a wide variety of image content while faithfully transferring the reference style. The method successfully suppresses distortion and produces satisfying photographic style transfers in a broad range of scenarios, including changes in time of day, weather, and season, as well as artistic edits.


2.8 CycleGAN

Project address: github.com/junyanz/Cyc…

CycleGAN is an interesting and powerful library that demonstrates the potential of this state-of-the-art technique. For example, the following figure shows roughly what the library can do: adjust the depth of field of an image. What’s interesting is that you never tell the algorithm which part of the image to pay attention to. It works that out all by itself!

Currently the library is written in Lua, but it can also be used from the command line.


2.9 Seq2seq

Project address: github.com/google/seq2…

Seq2seq was originally built for machine translation, but has since been applied to a variety of other tasks, including summarization, dialogue modeling, and image captioning. The Seq2seq framework can be used whenever a problem can be structured as encoding input data in one format and decoding it into another. It is written in Python on top of the popular TensorFlow library.



2.10 Pix2code

Project address: github.com/tonybeltram…

This deep learning project is very exciting as it attempts to automatically generate code for a given GUI. When building websites or mobile device interfaces, front-end engineers often have to write a lot of boring code, which is time-consuming and inefficient. This prevents developers from spending most of their time implementing real features and software logic. Pix2code aims to overcome this difficulty by automating the process. It is based on a new approach that allows the generation of computer tokens from a single GUI screenshot as input.

Pix2code is written in Python and converts screenshots of mobile and web interfaces into code.


3. Heart of the Machine projects

Heart of the Machine currently has three projects on GitHub: AI00, which evaluates outstanding companies in various fields of AI; a set of Chinese-English AI terminology; and a project for model experiments and their interpretation.


3.1 AI00: Heart of the Machine’s list of 100 companies influencing the future of artificial intelligence

Project address: github.com/jiqizhixin/…

Artificial intelligence is a vast and complex system that spans many disciplines and involves many elements, such as technology, products, industry, and capital. The team that wrote this report can only offer its own professional views, which have their limitations and need more industry experts to help correct and improve them.

We are well aware that a report without feedback from professional users is limited in quality, so we want to treat our reports with the engineering concept of “Agile Development”, collecting professional feedback continuously to keep improving them.

To this end, we will invite scientists, technologists, industry experts, professional investors, and readers in the field of artificial intelligence to join us in this long-term study. We will summarize and organize the information that participants provide and update the report on a monthly basis.



3.2 Artificial-Intelligence-Terminology

Project address: github.com/jiqizhixin/…

We have recorded the technical terms that Heart of the Machine encountered while compiling technical articles and papers, for your reference when translating (2nd edition).

The glossary currently contains 760 technical terms, mainly basic concepts and terms of machine learning, which form the project's core vocabulary. Heart of the Machine will continue to refine the glossary and build out its extended readings.

In the first stage, Heart of the Machine will keep building up the basic vocabulary by extracting common terms from authoritative textbooks and other credible sources. In the second stage, Heart of the Machine will continuously update the glossary with less common terms that come up when compiling papers or other materials.

Feedback and suggested updates are welcome throughout the project, and readers who contribute will be listed on the project’s acknowledgement page. Because we want each term update to be accurate and trustworthy, we ask readers to include the term's source address and an extended-reading address. That lets us update the glossary more objectively, backed by trusted sources and further reading.


3.3 ML-Tutorial-Experiment

Project address: github.com/jiqizhixin/…

The main purpose of this project is to share our experience with, and interpretation of, machine learning models we have experimented with. So far, we have explained and implemented convolutional neural networks, generative adversarial networks, and CapsNet. Each implementation comes with a very detailed article explaining the model's structure and the implementation code. The three implementation articles are listed below:

  • GitHub Project: Building a Convolutional Neural Network with TensorFlow from Scratch
  • GitHub Project: GAN, Complete Theoretical Derivation and Implementation
  • Understand the CapsNet Architecture, Then Implement It with TensorFlow: Probably the Most Detailed Tutorial So Far

Original link: www.analyticsvidhya.com/blog/2017/1…