With a dozen or so lines of Python code, you can build your own machine vision model to recognize large numbers of images quickly and accurately. Come and try it!

Vision

Evolution has allowed us to process images very efficiently.

Here, I’ll show you a picture.

If I were to ask you:

Can you tell which is the cat and which is the dog in the picture?

You may feel immediately and deeply insulted, and demand loudly: are you questioning MY intelligence?!

Calm down.

Another way to ask:

Could you describe your method of distinguishing cats and dogs as strict rules that you could teach a computer so that it could distinguish thousands of pictures for us?

For most people, this version brings not humiliation but stress.

If you're a determined person, you might try all sorts of criteria: the color of a pixel at a particular point in the image, the shape of a local edge, the length of a run of continuous color along a horizontal line…
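To see how brittle this gets, here is a minimal sketch of one such hand-written rule in Python; the pixel position and color thresholds are entirely made up, which is exactly the problem:

from PIL import Image

def classify(path):
    # Hypothetical rule: sample one pixel and guess the animal from its color.
    # Both the position (40, 60) and the thresholds below are arbitrary.
    r, g, b = Image.open(path).convert('RGB').getpixel((40, 60))
    if r > 150 and g < 130 and b < 110:  # "orange-ish fur at this spot means cat"?
        return 'cat'
    return 'dog'

The next photo, with the cat in a different pose or corner, breaks the rule immediately.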

You give these descriptions to the computer, and sure enough, it can identify the cat on the left and the dog on the right.

The question is, can computers really tell the difference between dogs and cats?

I’ll show you another picture.

You’ll find that almost every rule definition needs to be rewritten.

And just when the machine can correctly distinguish the animals in both images, I can, almost without effort, come up with a new one…

After a few hours, you decide to give up.

Don’t be discouraged.

Your problem is not new. Even a Supreme Court Justice had the same trouble.

In 1964, in Jacobellis v. Ohio, Supreme Court Justice Potter Stewart famously said of classifying a certain kind of image in film: "I am not prepared to give a short and precise definition… But I know it when I see it."

The original text reads as follows:

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ("hard-core pornography"), and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.


Does the inability of humans to describe the rules of image recognition to a computer mean that a computer cannot recognize images?

Of course not.

In December 2017, Scientific American named “AI that Sees like humans” as one of the emerging technologies of 2017.

You've heard about the magic of self-driving cars, right? Could they work without machines recognizing images?

Your buddy has probably shown you (more than once) how facial recognition unlocks the new iPhone X. Could that work without machines recognizing images?

In medicine, computers already analyze medical images, such as X-rays, better than doctors with years of experience. Could they work without machines recognizing images?

You may suddenly feel a little confused — is this a miracle?

It isn’t.

What computers do is learn.

By learning from a sufficient number of samples, the machine builds its own model from the data. That model may involve many criteria, but humans don't have to tell the machine any of them. It works them all out on its own.

You might be excited.

Well, here's even more exciting news for you — you can easily build your own image classification system!

Don’t believe it? Please follow my introduction and give it a try.

Data

We won't do cats versus dogs; let's try something a little fresher.

Let's recognize Doraemon instead, shall we?

Yes, I’m talking about Doraemon.

But distinguish him from what?

At first I wanted to use a T. rex, but then I decided that would be cheating, because the two look far too different.

Since Doraemon is a robot, let's find another robot to tell him apart from.

When it comes to robots, one comes to mind immediately.

Yes, WALL·E.

I've collected 119 Doraemon pictures and 80 WALL·E pictures for you. The images have been uploaded to a GitHub project.

Please click this link to download the zip package, then unzip it locally as our demo directory.

When you unzip it, you'll see an image folder with two subdirectories: doraemon and walle.
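Based on the counts mentioned above, the layout looks like this:

image/
├── doraemon/   (119 Doraemon pictures)
└── walle/      (80 WALL·E pictures)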

Open the doraemon directory and let’s see what images are available.

As you can see, the Doraemon pictures are truly varied: different scenes, background colors, expressions, movements, angles… and so on.

The pictures also vary in size (length and width).

The WALL·E pictures are a similar story.

Now that we have the data, let’s prepare the environment configuration.

Environment

We'll use the Python integrated runtime environment Anaconda.

Please download the latest version of Anaconda from this website. Scroll down the page to find the download links. Depending on your current system, the site automatically recommends a suitable version. I use macOS, so I downloaded the PKG file.

The download page area shows Python version 3.6 on the left and 2.7 on the right. Please select version 2.7.

Double-click the downloaded PKG file and follow the on-screen instructions to install it step by step.

With Anaconda installed, we need to install TuriCreate.

Please open your "terminal" (Linux, macOS) or "command prompt" (Windows) and navigate to the directory where you just unzipped the sample.

Execute the following command to create an Anaconda virtual environment named turi.

conda create -n turi python=2.7 anaconda

Next, we activate the turi virtual environment.

source activate turi

In this environment, we install the latest version of TuriCreate.

pip install -U turicreate

After installation, perform the following operations:

jupyter notebook

This brings us into the Jupyter Notebook environment. Let's create a new Python 2 notebook.

This creates a blank notebook.

Click the notebook name in the upper left corner and change it to something meaningful: "demo-python-image-classification".

Now that we’re ready, we can start writing the program.

Code

First, we import the TuriCreate package. Turi Create is a machine learning framework from Apple (which acquired Turi) that gives developers a very simple interface for data analysis and artificial intelligence.

import turicreate as tc

We specify image as the folder that holds the pictures.

img_folder = 'image'

As mentioned earlier, under image there are two folders: doraemon and walle. Note that if you want to classify other images later (say, cats and dogs), store the different categories in different folders under image; the folder names become the category names (cat and dog).

We then let TuriCreate read all the image files and store them in a data frame called data.

data = tc.image_analysis.load_images(img_folder, with_path=True)

There might be an error message here.

Unsupported image format. Supported formats are JPEG and PNG    file: /Users/wsy/Dropbox/var/wsywork/learn/demo-workshops/demo-python-image-classification/image/walle/.DS_Store

This happens because there are several .DS_Store files in the folders, which TuriCreate doesn't recognize and can't read as pictures.

These .DS_Store files are hidden files created by Apple's macOS to store custom directory properties, such as icon positions or background colors.

We can simply ignore these warnings.
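If you prefer to clean them up rather than ignore the warnings, a small sketch using only Python's standard library (run from the demo directory) would be:

import os

# Walk the image folder and delete macOS's hidden .DS_Store files.
for root, dirs, files in os.walk('image'):
    for name in files:
        if name == '.DS_Store':
            os.remove(os.path.join(root, name))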

Now, let's see what's inside the data frame.

data

As you can see, data contains two columns. The first is the path of each picture; the second describes the picture's dimensions.

Since we used 119 Doraemon pictures and 80 WALL·E pictures, the total should be 199, which confirms the data was read in completely.

Next, we need to give TuriCreate the label for each image: is a given picture Doraemon or WALL·E?

That’s why, from the start, you need to separate your images into different folders.

At this point, we use the folder name to tag the image.

data['label'] = data['path'].apply(lambda path: 'doraemon' if 'doraemon' in path else 'walle')

This statement labels every image under the doraemon directory as doraemon in the data frame; everything else becomes walle.
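As a quick sanity check on the labels (we expect 119 doraemon and 80 walle, per the counts above):

# Count how many rows received each label.
n_doraemon = (data['label'] == 'doraemon').sum()
print(n_doraemon, len(data) - n_doraemon)  # should print 119 80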

Let's look at the data frame after tagging.

data

As you can see, the number of rows is unchanged, but there is now an extra label column indicating each image's category.

Let’s store the data.

data.save('doraemon-walle.sframe')

This save step preserves the results of our data processing so far. For later analysis, we only need to read the SFrame file rather than process the folder from scratch.

The advantage may not be obvious in this example. But imagine if your image data ran to several gigabytes, or even terabytes: re-reading and re-tagging the files from scratch for every analysis would be very time-consuming.
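Next time, assuming the file was saved as above, you can reload the processed data in one line instead of touching the image folder at all:

# Reload the saved SFrame; no need to re-read or re-tag the images.
data = tc.load_sframe('doraemon-walle.sframe')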

Let's dig a little deeper into the data frame.

TuriCreate provides a handy explore() function that lets you explore the contents of a data frame visually.

data.explore()

At this point, TuriCreate pops up a page showing us the contents of the data frame.

When we printed the data frame earlier, we could only see the image dimensions; now we can browse the images themselves.

If you find the thumbnails too small, no problem: hover your mouse over one to see the image at a larger size.

With the data frame explored, let's go back to the notebook and write more code.

Now we have TuriCreate split the data frame into a training set and a test set.

train_data, test_data = data.random_split(0.8, seed=2)

The training set is what the machine learns from by observation: the computer builds its own model using the training data. But how good is that model (for example, how accurately does it classify)? We need the test set to validate it.

This is just like how a teacher shouldn't hand students all the exam questions as homework and exercises. Only questions students have never seen before can distinguish those who mastered the method of solving problems from those who merely memorized the homework answers.

We ask TuriCreate to put 80% of the data into the training set and keep the remaining 20% for testing. Here I set the random seed to 2, so the data is split the same way every time and our results can be double-checked.
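You can sanity-check the split. With 199 rows and an 80/20 split, expect roughly 160 training rows and 40 test rows; the exact counts depend on the random draw (as the output later in this article shows, here they come to 168 and 31):

# Row counts of the two subsets.
print(len(train_data), len(test_data))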

Okay, now let's have the machine study every item in the training set and try to build a model on its own.

The first time you execute the following code, there will be a wait: TuriCreate needs to download some data from Apple's developer site, roughly 100 megabytes.

The time required depends on the speed of your connection to Apple's servers. For me, at least, the download was slow.

Fortunately, you only need to download it the first time. Subsequent runs skip the download step.

model = tc.image_classifier.create(train_data, target='label')

After downloading, you will see the training information for TuriCreate.

Resizing images...
Performing feature extraction on resized images...
Completed 168/168
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

TuriCreate resizes your images and automatically extracts their features. It then sets aside 5% of the training data as a validation set and iterates to find the parameter configuration that produces the best model.
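If you'd rather keep all of your data for training, the progress message above names the switch; a minimal variant would be:

# Disable the automatic 5% validation split (per the progress message).
model = tc.image_classifier.create(train_data, target='label', validation_set=None)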

There might be some warning messages here, so just ignore them.

When you see the following information, it means that the training has been successfully completed.

As you can see, after several rounds both the training accuracy and the validation accuracy are already very high.

Next, we use the trained image classification model to make predictions on the test set.

predictions = model.predict(test_data)

We store the predictions (a sequence of labels, one per image) in the predictions variable.

We then ask TuriCreate to tell us how the model performed on the test set.

Before you look down, guess what the correct answer is. From 0 to 1, guess a number.

After guessing, please continue.

metrics = model.evaluate(test_data)
print(metrics['accuracy'])

Here is the resulting accuracy:

0.967741935484

When I first saw it, I was shocked.

We trained on only a bit more than a hundred examples, yet achieved such high recognition accuracy on the test set (image data the machine had never seen).
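Accuracy is not the only thing evaluate() returns. Assuming the standard keys of Turi Create's metrics dictionary, you can also print a confusion matrix showing which class was mistaken for which:

# Breakdown of true labels versus predicted labels.
print(metrics['confusion_matrix'])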

To verify that this number isn't the result of a bug in the accuracy calculation, let's look at the actual predictions ourselves.

predictions

Here is the printed sequence of predictive markers:

dtype: str
Rows: 31
['doraemon', 'doraemon', 'doraemon', 'doraemon', 'walle', 'doraemon', 'walle', 'doraemon', 'walle', 'walle', 'doraemon', 'doraemon', 'doraemon', 'doraemon', 'doraemon', 'walle', 'doraemon', 'doraemon', 'walle', 'walle', 'doraemon', 'doraemon', 'walle', 'walle', 'walle', 'doraemon', 'doraemon', 'walle', 'walle', 'doraemon', 'walle']

Look at the actual label.

test_data['label']

Here is the actual tag sequence:

dtype: str
Rows: 31
['doraemon', 'doraemon', 'doraemon', 'doraemon', 'walle', 'doraemon', 'walle', 'walle', 'walle', 'walle', 'doraemon', 'doraemon', 'doraemon', 'doraemon', 'doraemon', 'walle', 'doraemon', 'doraemon', 'walle', 'walle', 'doraemon', 'doraemon', 'walle', 'walle', 'walle', 'doraemon', 'doraemon', 'walle', 'walle', 'doraemon', 'walle']

Let’s find out which images were wrong.

You can certainly check them one by one. But if your test set has tens of thousands of data, this is inefficient.

Our approach is to first find where the predictions label sequence disagrees with the original label sequence (test_data['label']), and then show those rows of the test set.

test_data[test_data['label'] != predictions]

We find that out of the 31 test items, only one prediction failed: the true label was walle, but our model predicted doraemon.

We get the original file path corresponding to this data point.

wrong_pred_img_path = test_data[predictions != test_data['label']][0]['path']

We then read the image into the img variable.

img = tc.Image(wrong_pred_img_path)

Using the show() function provided by TuriCreate, let’s take a look at the contents of this image.

img.show()

One problem with deep learning is that the model is so complex that we can't say exactly how the machine misread this image. But it's not hard to notice something about the picture: besides WALL·E, there is another robot in it.

If you've seen the movie, you know the relationship between the two robots; we'll set that aside here. The puzzling part is that the robot in the upper right, with its round head, looks nothing like the angular WALL·E. But don't forget: Doraemon is also round-headed.
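If you want to see how confident the model was about each picture, Turi Create's predict_topk method reports per-class scores; a sketch, assuming it behaves as documented:

# Per-class probabilities for each test image; k=2 because we have two classes.
topk = model.predict_topk(test_data, output_type='probability', k=2)
topk.print_rows(num_rows=10)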

Principles

After following the code in the previous section, you know how to build your own image classification system. Without knowing anything about the underlying principles, you've already built a model that performs quite well, haven't you?

If you are not interested in the principle, please skip this section and look at the summary.

If you like to get to the bottom of things, let's talk about how it works.

You've written only a dozen or so lines of code, yet the model you've built is remarkably complex and elegant: a Convolutional Neural Network (CNN).

It is a deep machine learning model. The simplest convolutional neural network looks something like this:

On the far left is the input layer: the images we feed in. In this case, Doraemon and WALL·E.

In a computer, images are stored in layers according to color (RGB: Red, Green, Blue). Take this example.

Depending on the resolution, the computer stores each layer of the image as a matrix of a certain size. Each row-and-column position holds a single number.
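You can see this structure for yourself; a quick check, assuming Pillow and NumPy are installed (the file name below is hypothetical):

import numpy as np
from PIL import Image

# Load one picture as an array: (height, width, 3 color layers).
img = np.array(Image.open('image/doraemon/some_picture.jpg'))
print(img.shape)
print(img[0, 0])  # one row-and-column position: three numbers (R, G, B)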

That’s why, when you run the code, the first thing TuriCreate does is resize the image. The following steps cannot be performed if the input images are of different sizes.

With the input in hand, the data moves on to the next layer: the Convolutional Layer.

The convolutional layer sounds mysterious and complex, but the principle is very simple. It consists of several filters, each of which is a small matrix.

To apply a filter, slide the small matrix across the input data; wherever it overlaps, multiply the overlapping numbers elementwise and add them up. What was a patch of the matrix becomes a single number after convolution.

The following GIF illustrates the process visually.

This is a process of repeatedly scanning the matrix for a particular feature, which might be the shape of an edge or something similar.
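Here is that multiply-and-add operation on a tiny example, written in plain NumPy (a sketch of "valid" convolution, ignoring strides and padding):

import numpy as np

x = np.array([[1, 2, 3, 0],
              [0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1]])   # a 4x4 "image"
k = np.array([[1, 0],
              [0, 1]])         # a 2x2 filter

# Slide the filter over every 2x2 patch: multiply elementwise, then sum.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(x[i:i+2, j:j+2] * k)
print(out)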

The next layer is called the Pooling Layer. Its standard Chinese translation leaves me speechless; I think "summary layer" or "sampling layer" would be better renderings. We'll call it the "sampling layer" below.

The purpose of sampling is to keep the machine from concluding that "there must be a sharp edge in this exact upper-left square." In a real picture, the object we want to recognize may shift position. So we blur the location of each feature by summary sampling, expanding it from "a specific point" to "a general region".

If that doesn’t seem intuitive to you, check out the GIF below.

Here max-pooling is used: partition the original matrix into 2×2 blocks, take the maximum within each block, and record it in the new result matrix.
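And here is 2×2 max-pooling on the same kind of tiny matrix, again as a plain NumPy sketch:

import numpy as np

m = np.array([[1, 5, 2, 0],
              [3, 2, 1, 4],
              [0, 1, 8, 6],
              [2, 2, 3, 1]])

# Take the maximum of each non-overlapping 2x2 block.
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = m[2*i:2*i+2, 2*j:2*j+2].max()
print(out)  # [[5. 4.], [2. 8.]]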

A useful rule of thumb: as you move right through the layers, the resulting images (technically, matrices) generally get smaller and smaller, while the number of layers grows larger and larger.

Only in this way can we extract the pattern information from the picture and try to master enough patterns.

If this still doesn't feel intuitive, visit this website.

It gives you a vivid interpretation of what’s going on at all levels of convolutional neural networks.

The upper left corner is the input area. Use the mouse to write a digit (0-9) by hand. It doesn't matter if it's ugly.

I put in a 7.

Looking at the output, the model correctly judged 7 to be the first choice, with 3 as the second possibility. Correct answer.

Let’s look at the details of model construction.

Hover the mouse over the first convolution layer and stop on any pixel. The page shows which pixels of the previous layer's image this value was computed from by feature detection.

Similarly, hover over the first max pooling layer, and the page shows visually which block of pixels each value was sampled from.

This site is worth your time and fun. It helps you understand what convolutional neural networks are all about.

Review our sample diagram:

The next layer is the Fully Connected Layer. It simply flattens the matrices output by the previous layer into one dimension, producing a single long output vector.
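In NumPy terms, this "compression into one dimension" is just flattening:

import numpy as np

maps = np.zeros((8, 4, 4))   # say, 8 feature maps of size 4x4 from the previous layer
flat = maps.reshape(-1)      # one long vector of 8 * 4 * 4 = 128 numbers
print(flat.shape)            # (128,)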

Then there is the output layer, which corresponds to the classification we need the machine to master.

If you just look at the last two layers, you can easily relate it to the Deep Neural Network (DNN) that you learned earlier.

Since we already had deep neural networks, why bother adding convolutional and sampling layers, ending up with such a complex model?

There are two considerations:

The first is the amount of computation. Image data is generally large. If we connected it directly to the output layer through several dense layers, each layer's inputs and outputs would be huge, and the total computation unimaginable.

The second is the capture of pattern features. Even with an enormous amount of computation, a plain deep neural network may not recognize patterns well, because it learns too much noise. Introducing convolutional and sampling layers effectively filters out the noise and highlights the influence of the image's patterns on the training result.

You might think: we've only written a dozen lines of code, so the convolutional neural network we're using must look like the one above, four or five layers deep, right?

That’s not true. You have 50 floors!

Its proper name is ResNet-50, a creation of Microsoft Research that won the ILSVRC competition in 2015. On the ImageNet dataset, its classification accuracy has surpassed human performance.

I've attached the address of the corresponding paper here; take a look if you're interested.

Take a look at the network at the bottom of the image above to see what ResNet-50 looks like.

Deep enough, complicated enough.
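As an aside, Turi Create lets you name the underlying architecture explicitly when creating the classifier; according to its documentation, a sketch would be:

# Choose the underlying network; 'resnet-50' selects the architecture described above.
model = tc.image_classifier.create(train_data, target='label', model='resnet-50')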

If you know anything about deep neural networks, this is even more incredible. How can so many layers, trained on so little data, produce such good test results? And to get good results when training on a large number of pictures, shouldn't the training itself take a very long time?

Yes. If you built a ResNet-50 from scratch and trained it on the ImageNet dataset, it would take a long time even with good hardware (a GPU).

If you train on your laptop… Let’s forget it.

So is TuriCreate really a miracle, achieving a high level of classification accuracy without long training and with only a small sample?

No, there are no miracles in data science.

So what produces this seemingly magical effect? That's a question worth pondering; use search engines and Q&A sites to help you find the answer.

Summary

Through this article, you have learned the following:

  • How to install Apple's machine learning framework TuriCreate in an Anaconda virtual environment.
  • How to read image data from a folder in TuriCreate, and use folder names to label the images.
  • How to train a deep neural network in TuriCreate to distinguish pictures.
  • How to verify the classification performance on a test data set, and find the misclassified pictures.
  • The basic structure and working principles of a Convolutional Neural Network (CNN).

However, due to space constraints, we did not mention or explain in depth the following questions:

  • How to obtain training and test image data in bulk.
  • How to use preprocessing to convert image formats that TuriCreate doesn't recognize.
  • How to build a Convolutional Neural Network (CNN) from scratch, with full control over the model's layers and parameters.
  • How a high level of classification accuracy can be achieved without long training and with only a small sample (hint: transfer learning).

Please consider these questions as you practice. You're welcome to leave a comment or send me an email to exchange your thoughts and results.

Discussion

Have you ever worked on an image classification task? How did you approach it? What tools did you find useful? Compared with the approach here, what are their advantages and disadvantages? Leave a message to share your experience and thinking with everyone, and let's discuss together.

If you liked this article, please give it a thumbs up. You can also follow and pin my official WeChat account "Nkwangshuyi".

If you're interested in data science, check out my series of tutorials via the index post "How to Get Started in Data Science Effectively". There you'll find more interesting problems and solutions.