Learn about eight applications of computer vision

The original link: mp.weixin.qq.com/s/z9QbjeoLo…

Three earlier articles briefly introduced several classical algorithms commonly used in machine learning, including the currently very popular CNNs:

Summary and Comparison of Common Machine Learning Algorithms (PART 1)
Summary and Comparison of Commonly used Machine Learning Algorithms (Middle)
Summary and Comparison of Commonly used Machine Learning Algorithms

Each of these algorithms has its own advantages, disadvantages, and applicable fields, so it is worth getting familiar with them, but how to apply them still depends on the specific problem. Common application directions of machine learning include the following:

Computer Vision (CV)
Natural Language Processing (NLP)
Speech recognition
Recommendation system
Advertising

and so on.

For more details, please refer to a previously recommended website:

paperswithcode.com/sota

The site is organized in great detail into 16 general directions with a total of 1081 sub-directions. If you want to enter the field of machine learning, first choose a direction, and then get familiar with the algorithms and problem-solving techniques that direction requires.

This article focuses on the applications of computer vision, which is one of the most popular and most developed of the 16 directions.

Computer vision can be divided into the following general directions:

Image classification
Object detection
Image segmentation
Style transfer
Image reconstruction
Super-resolution
Image generation
Face
Other

Although everything listed here concerns images, video is also a research object of computer vision, so there are video classification, detection, generation, and tracking as well. Because of limited space, and because current research is still focused mainly on images, I will not cover video applications for now.

For each direction, this article gives a brief introduction to the problem it addresses, along with recommended GitHub projects, papers, or review articles.

1. Image Classification

Image classification, also known as image recognition, is, as the name implies, the task of identifying what an image shows, or which category the objects in the image belong to.

Image classification can be divided into many sub-directions according to different criteria.

For example, according to the category label, it can be divided into:

Binary classification problems, such as determining whether an image contains a face;
Multi-class classification problems, such as identifying bird species;
Multi-label classification: each image carries multiple attribute labels. For example, in clothing classification, labels such as color, texture, and sleeve length can be added, so that the output is not a single category but several attributes.
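The difference between these three label settings shows up mostly in how a model's raw output scores are turned into predictions. The following is a minimal sketch in plain Python, not taken from any of the projects recommended here. The function names, the example scores, the label names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (one winner)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Squash one raw score into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def predict_binary(score, threshold=0.5):
    """Binary classification: a single yes/no decision, e.g. face or no face."""
    return sigmoid(score) >= threshold

def predict_multiclass(scores, classes):
    """Multi-class classification: exactly one category wins."""
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best]

def predict_multilabel(scores, labels, threshold=0.5):
    """Multi-label: each label is decided independently, so several can be on."""
    return [label for label, s in zip(labels, scores) if sigmoid(s) >= threshold]

# Hypothetical raw scores, for illustration only.
has_face = predict_binary(2.0)
bird = predict_multiclass([0.1, 2.3, -1.0], ["sparrow", "robin", "crow"])
attrs = predict_multilabel([1.5, -0.7, 3.0], ["red", "striped", "long-sleeved"])
```

In this sketch the multi-class prediction always returns exactly one name, while the multi-label prediction returns a list that may contain zero, one, or several attributes.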

According to the classification object, it can be divided into:

General categories, such as birds, cars, cats, dogs, etc.;
Fine-grained classification, currently a popular area of image classification, such as distinguishing between kinds of birds, flowers, cats, and dogs. Some of the finer categories look very similar to each other, while images of the same category may be hard to recognize as such because of occlusion, angle, illumination, and other factors.

According to the number of categories, it can also be divided into:

Few-shot learning: learning from small samples, where the training set contains only a few examples of each category, including one-shot and zero-shot learning;
Large-scale learning: learning from large-scale samples, which is now the mainstream approach to classification, partly because deep learning requires large data sets.
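To make "a small number of each category" concrete, the sketch below builds a k-shot training subset from a larger labeled set using only the Python standard library. It is an illustration under assumptions: the function name `sample_k_shot`, the file names, and the fixed seed are hypothetical, and real few-shot benchmarks define their own splits.

```python
import random
from collections import defaultdict

def sample_k_shot(samples, k, seed=0):
    """Keep at most k training examples per category.

    `samples` is a list of (item, label) pairs; the result has the same shape.
    """
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        chosen = rng.sample(items, min(k, len(items)))
        subset.extend((item, label) for item in chosen)
    return subset

# Hypothetical file names, for illustration only.
full_set = ([(f"bird_{i}.jpg", "bird") for i in range(100)]
            + [(f"cat_{i}.jpg", "cat") for i in range(100)])
one_shot = sample_k_shot(full_set, k=1)   # one example per category
five_shot = sample_k_shot(full_set, k=5)  # five examples per category
```

With k = 1 this gives the one-shot setting. The zero-shot setting corresponds to having no training examples at all for the target categories, which is why zero-shot methods have to rely on other information about those categories.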

Recommended Github projects are:

Awesome Image Classification
awesome-few-shot-learning
awesome-zero-shot-learning

Paper:

ImageNet Classification With Deep Convolutional Neural Networks, 2012
Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.
Going Deeper with Convolutions, 2015.
Deep Residual Learning for Image Recognition, 2015.
Inception-v4 and Inception-ResNet-v2, 2016
ResNeXt, 2016
NasNet, 2017
ShuffleNetV2, 2018
SKNet, 2019

Article:

Introduction | From VGG to NASNet: an overview of image classification networks
CNN Network Architecture Evolution: From LeNet to DenseNet
Wei Xiu-can, Megvii Nanjing Research Institute: Review of fine-grained image analysis
Annual progress in few-shot learning | VALSE 2018

Commonly used image classification data sets:

MNIST: a handwritten digit data set containing 60,000 training images and 10,000 test images.
CIFAR: divided into CIFAR-10 and CIFAR-100. The former contains 60,000 images in 10 categories, with 6,000 images per category. The latter has 100 categories with 600 images per category. Categories include animals such as cats, dogs, and birds, as well as airplanes, cars, and boats.
ImageNet: probably the largest open source image data set, with about 15 million images and roughly 22,000 categories.
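A quick arithmetic check of the figures quoted above shows that the two CIFAR variants have the same total size and differ only in how the images are spread across categories. The dictionary layout and variable names are just an illustrative way to write the calculation down.

```python
# Figures as quoted in the data set list above.
cifar10 = {"categories": 10, "images_per_category": 6_000}
cifar100 = {"categories": 100, "images_per_category": 600}

def total_images(spec):
    """Total size = number of categories x images per category."""
    return spec["categories"] * spec["images_per_category"]

cifar10_total = total_images(cifar10)    # 10 * 6,000
cifar100_total = total_images(cifar100)  # 100 * 600

mnist_total = 60_000 + 10_000  # training images + test images
```

Both CIFAR totals come out to 60,000 images, so CIFAR-100 offers ten times more categories but only a tenth as many examples of each one.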

2. Object Detection

Object detection usually involves two steps: first finding the object, and then identifying it.

Object detection can be divided into single-object detection and multi-object detection, depending on the number of objects in the image, as shown in the following example:
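The two steps, finding and then identifying, can be illustrated with a deliberately tiny toy. Real detectors such as the R-CNN and YOLO families learn both steps from data; the sketch below replaces them with simple hand-written rules on a small grid of integers. The `Detection` class, the function names, and the pixel-value-to-category mapping are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple   # (top, left, bottom, right), inclusive pixel coordinates
    label: str

# Hypothetical mapping from pixel value to category, for illustration only.
CATEGORY_NAMES = {1: "cat", 2: "dog"}

def find_box(image):
    """Step 1, finding: bounding box around all non-background pixels."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, value in enumerate(row) if value != 0]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

def identify(image, box):
    """Step 2, identifying: name the most common pixel value inside the box."""
    top, left, bottom, right = box
    counts = {}
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            value = image[r][c]
            if value != 0:
                counts[value] = counts.get(value, 0) + 1
    return CATEGORY_NAMES[max(counts, key=counts.get)]

def detect_single(image):
    """Single-object detection: one box plus one label, or None."""
    box = find_box(image)
    if box is None:
        return None
    return Detection(box=box, label=identify(image, box))

toy_image = [
    [0, 0, 0, 0, 0],
    [0, 2, 2, 2, 0],
    [0, 2, 2, 2, 0],
    [0, 0, 0, 0, 0],
]
result = detect_single(toy_image)
```

The output of a detector therefore has two parts per object: where it is (the box) and what it is (the label). A multi-object detector would return a list of such pairs instead of a single one.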

The above two examples are images from the VOC 2012 dataset, but there are actually more complex scenarios, such as the MS COCO dataset image examples:

In fact, there are many methods in the field of object detection, and their development history is as follows:

From the figure above you can see that there are several method families:

R-CNN series, from R-CNN to Fast R-CNN, Faster R-CNN, and Mask R-CNN;
YOLO series, from V1 to V3 in 2018

GitHub projects:

awesome-object-detection
Github.com/facebookres…
Github.com/jwyang/fast…

Paper:

R-CNN, 2013
Fast R-CNN, 2015
Faster R-CNN, 2015
Mask R-CNN, 2017
YOLO, 2015
YOLOv2, 2016
YOLOv3, 2018
SSD, 2015
FPN, 2016

Article:

Object detection: R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD
Tutorial | Single-stage object detection methods explained: YOLO and SSD
From R-CNN to SSD, this should be the most comprehensive inventory of object detection algorithms
From R-CNN to RFBNet, the evolution of object detection architectures over five years

Common data sets:

VOC 2012
MS COCO

3. Object Segmentation

Image segmentation builds on object detection: the object first has to be detected, and then it is segmented.

Image segmentation can be divided into three types:

Ordinary segmentation: separates pixel regions belonging to different objects, such as separating the foreground region from the background region;
Semantic segmentation: builds on ordinary segmentation by classifying at the pixel level, so that pixels belonging to the same category receive the same label, for example to separate objects of different categories;
Instance segmentation: builds on semantic segmentation by segmenting each object instance. For example, several dogs in a picture are segmented and identified as different individuals, not just as members of the same category.
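The three types differ in what each pixel is labeled with, which a tiny hand-made mask can show. In the sketch below, instances are separated with a 4-connected flood fill over the semantic mask. This is a swapped-in classical technique used only for illustration: learned instance segmentation models such as Mask R-CNN predict instances directly and can also separate objects that touch, which connected components cannot. The class ids and function name are hypothetical.

```python
from collections import deque

def instance_ids(semantic_mask, target_class):
    """Split one semantic class into instances with 4-connected flood fill.

    Returns (mask, count): the mask holds 0 for other pixels and 1, 2, ...
    for each separate connected region of `target_class`.
    """
    height, width = len(semantic_mask), len(semantic_mask[0])
    instances = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic_mask[r][c] != target_class or instances[r][c] != 0:
                continue
            next_id += 1  # found an unvisited region: start a new instance
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic_mask[ny][nx] == target_class
                            and instances[ny][nx] == 0):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id

BACKGROUND, DOG = 0, 1  # hypothetical class ids

# Semantic segmentation output: every dog pixel gets the same class id.
semantic = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

# Ordinary segmentation: only foreground versus background.
foreground = [[int(v != BACKGROUND) for v in row] for row in semantic]

# Instance segmentation: the two dogs receive different ids.
instances, dog_count = instance_ids(semantic, DOG)
```

In the semantic mask both dogs share the label `DOG`; only the instance mask tells them apart as individual objects.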

An example of image segmentation is shown below. It is an instance segmentation example, with different colors for different instances.

GitHub:

awesome-semantic-segmentation

Paper:

U-Net, 2015
DeepLab, 2016
FCN, 2016

Article:

In depth | Convolutional neural networks for image segmentation: from R-CNN to Mask R-CNN
Review: a survey of image segmentation
A review of image semantic segmentation

4. Style Transfer

Style transfer refers to applying the styles of one domain or several images to other domains or images. For example, abstract styles are applied to realistic images.

An example of style transfer is shown below. Figure A is the original image, and B through F are the results of applying different styles.

The data sets used are generally common image data sets plus some famous paintings, such as works by Van Gogh and Picasso.

GitHub:

A simple, concise tensorflow implementation of style transfer (neural style)
TensorFlow (Python API) implementation of Neural Style
TensorFlow CNN for fast style transfer

Paper:

A Neural Algorithm of Artistic Style, 2015
Image Style Transfer Using Convolutional Neural Networks, 2016
Deep Photo Style Transfer, 2017

Article:

A brief history of Neural Style
Style Transfer | A review of style transfer
Perceptual Losses
Image Style Transfer
Style Transfer

5. Image Reconstruction

Image reconstruction, also known as image inpainting, aims to repair missing parts of an image, for example in old, damaged black-and-white photographs and films. Typically a common data set is used, and the regions that need to be repaired are created artificially.
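Creating the damaged regions artificially usually means blanking out part of a clean image with a mask, so that the clean image can serve as the training target. The sketch below does this for a tiny grayscale grid with a rectangular hole. The function names, the hole size, and the fill value are assumptions; the partial convolution paper listed below, for instance, works with irregular holes rather than rectangles.

```python
import random

def make_rectangular_mask(height, width, hole_h, hole_w, seed=0):
    """Return a mask with 1 = keep the pixel, 0 = pixel to be repaired."""
    rng = random.Random(seed)
    top = rng.randint(0, height - hole_h)   # randint is inclusive on both ends
    left = rng.randint(0, width - hole_w)
    mask = [[1] * width for _ in range(height)]
    for r in range(top, top + hole_h):
        for c in range(left, left + hole_w):
            mask[r][c] = 0
    return mask

def apply_mask(image, mask, fill=0):
    """Blank out the masked pixels to simulate damage."""
    return [[pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

# A tiny grayscale "image" with nonzero values, for illustration only.
clean = [[r * 8 + c + 1 for c in range(8)] for r in range(8)]
mask = make_rectangular_mask(8, 8, hole_h=3, hole_w=3)
damaged = apply_mask(clean, mask)

# (damaged, mask) would be the model input; `clean` is the training target.
```

Because the hole is generated rather than real, the ground truth for the missing pixels is always known, which is what makes supervised training possible.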

An example of restoration is shown below, and there are a total of four pictures that need to be restored. The example comes from the paper “Image Inpainting for Irregular Holes Using Partial Convolutions”.

Paper:

Pixel Recurrent Neural Networks, 2016.
Image Inpainting for Irregular Holes Using Partial Convolutions, 2018.
Highly Scalable Image Reconstruction using Deep Neural Networks with Bandpass Filtering, 2018.
Generative Image Inpainting with Contextual Attention, 2018
Free-form Image Inpainting with Gated Convolution, 2018
EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning, 2019

GitHub:

Awesome-Image-Inpainting
generative_inpainting
edge-connect

Article:

The goddess got pixelated? Fill her back in stroke by stroke, with results beyond Adobe | Open source
2018 CVPR image inpainting

6. Super-resolution

Super-resolution is the task of generating an image with higher resolution and more detail than the original. An example is shown in the figure below, from the paper “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”.

In general, super-resolution models can also be used for image restoration and inpainting, since these are closely related problems.

Data sets are usually built from existing image data sets by generating low-resolution versions of the images for model training.
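Generating the low-resolution images can be sketched as a simple downsampling step that turns each existing image into an (input, target) pair. The version below averages each block of pixels in plain Python. Block averaging is an assumption made to keep the example short; published methods often use bicubic downsampling instead, and the function name is hypothetical.

```python
def downsample(image, factor):
    """Shrink an image by averaging each factor x factor block of pixels."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    low = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[y][x]
                     for y in range(r, r + factor)
                     for x in range(c, c + factor)]
            row.append(sum(block) / len(block))
        low.append(row)
    return low

# A tiny grayscale "image", for illustration only.
high_res = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [90, 90, 30, 30],
    [90, 90, 30, 30],
]
low_res = downsample(high_res, factor=2)

# The model learns to map low_res back to high_res.
training_pair = (low_res, high_res)
```

Here a 4x4 image becomes a 2x2 one, and the original 4x4 image is kept as the target the model should reconstruct.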

GitHub:

Image Super-Resolution for Anime-style Art: a super-resolution application for anime images, 14K stars
neural-enhance
Image super-resolution through deep learning

Paper:

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017.
Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution, 2017.
Deep Image Prior, 2017.
ESRGAN: Enhanced super-resolution Generative Adversarial Networks, 2018

Article:

Image super-resolution reconstruction
How will super resolution technology evolve? These six ECCV 18 papers take you through them all at once
A recent review of deep learning image super-resolution: From Models to Applications
ESRGAN: Enhanced Super-Resolution Method Based on GAN

7. Image Synthesis

Image generation is the task of generating a modified version of part of an image, or a completely new image, based on an existing image. This application has developed rapidly in recent years, mainly because GANs have been a very hot research direction, and image generation is one of their major applications.

An example of image generation is as follows:

GitHub:

tensorflow-generative-model-collections: implementations of various types of GANs
the-gan-zoo: a collection of all current GAN-related papers
AdversarialNetsPapers

Paper:

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015.
Conditional Image Generation with PixelCNN Decoders, 2016.
Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks, 2016
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.
BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018

Article:

Practical | GAN principles explained simply, text version (complete)
In depth | Generative adversarial networks for beginners: understand the basic principles of GANs in one article (with resources)
Exclusive | On the scene at the packed NIPS 2016 GAN talk: the father of GANs comprehensively explains the principles and future of generative adversarial networks (slides attached)
Nvidia releases GAN again! Multi-level feature style transfer face generator

8. Face

Face applications include face recognition, face detection, face matching, face alignment, and so on. This is probably the most popular and most mature application of computer vision, and it is widely used in security and identity authentication scenarios such as face payment and face unlock.
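Face matching and face recognition are often built on the embedding idea from the FaceNet paper listed below: each face is mapped to a vector, and faces of the same person end up close together. The sketch below shows only the comparison step, under assumptions: the three-dimensional vectors, the names, and the 0.6 threshold are made up for illustration, and a real system would obtain much longer embeddings from a trained network.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_same_person(emb_a, emb_b, threshold=0.6):
    """Face matching (1:1): are these two faces the same person?"""
    return l2_distance(emb_a, emb_b) <= threshold

def find_match(query, gallery, threshold=0.6):
    """Face recognition (1:N): closest enrolled name, or None if too far."""
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        dist = l2_distance(query, emb)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

# Hypothetical embeddings, for illustration only.
alice = [0.10, 0.90, 0.40]
alice_again = [0.12, 0.88, 0.41]
bob = [0.90, 0.10, 0.30]
stranger = [-0.50, -0.50, 0.90]

gallery = {"alice": alice, "bob": bob}
same = is_same_person(alice, alice_again)
different = is_same_person(alice, bob)
who = find_match(alice_again, gallery)
unknown = find_match(stranger, gallery)
```

The threshold controls the trade-off between wrongly accepting a different person and wrongly rejecting the same person, so in practice it is tuned on a validation set rather than fixed in advance.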

Here are some directly recommended GitHub projects, papers, articles, and data sets:

GitHub:

awesome-face_recognition: a collection of papers related to human faces from the last decade
face_recognition: a face recognition library that supports recognition, detection, matching, and other functions.
facenet

Paper:

FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015
Face Recognition: From Traditional to Deep Learning Methods, 2018
MSFD: Multi-scale Receptive Field Face Detector, 2018
DSFD: Dual Shot Face Detector, 2018
Neural Architecture Search for Deep Face Recognition, 2019

Article:

Face recognition technology comprehensive summary: from traditional methods to deep learning
Resources | from face detection to semantic segmentation, training models of OpenCV library

Data set:

LFW
CelebA
MS-Celeb-1M
CASIA-WebFace
FaceScrub
MegaFace

9. Other

There are actually many other directions, including:

Image Captioning: Generate a description for a picture.

Show and Tell: A Neural Image Caption Generator, 2014.

Text to Image: Generate images based on Text.

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, 2017.

Image Colorization: Changes an Image from black and white to color.

Colorful Image Colorization, 2016.

Human Pose Estimation: Identifies the pose of the human body, for example for human behavior recognition.

Cascaded Pyramid Network for Multi-Person Pose Estimation, 2017

Other directions include 3D, video, medical imaging, question answering, autonomous driving, tracking, and more. Check out this website:

Paperswithcode.com/area/comput…

Once you have chosen a direction and want to start learning it, I recommend first finding review articles or papers on it in Chinese. Of course, if your English reading ability is good, you can also read review articles in English. Use the reviews to decide which papers you need to read. Start with papers from the last three to five years; older papers do not need much reading unless you need to know more about a particular algorithm.

In addition, you need to work on actual projects to deepen your understanding of the algorithms. Running the code also helps you understand how an algorithm is implemented.

References

Machinelearningmastery.com/application…
paperswithcode.com/sota

Summary

This article briefly introduced several computer vision applications, including the problems they solve, and recommended GitHub projects, papers, articles, and commonly used data sets for each.
