This is a translation of Waleed Abdulla’s blog post on using TensorFlow to recognize traffic signs.





I saw the speed limit sign, but I just didn’t see you

This is the first part of a series on using deep learning models to recognize traffic signs. The purpose of the series is to learn how to build a system with deep learning models, and you may be interested in learning along with me. There are plenty of resources online that explain the mathematical theory of neural networks, so I will focus on the practical side. I’ll describe my experience building this model and share the source code and related materials. If you already know basic Python and simple machine learning techniques, this series is for you. And if you want to really understand machine learning, building a real system yourself is a great way to do it.

In this part, I will discuss image classification and keep the model as simple as possible. In later parts of the series, I will also cover convolutional neural networks, data set augmentation, and object detection.

Run the code

The source code is in this Jupyter notebook. I use Python 3.5 and TensorFlow 0.12. If you want to run this code in Docker, you can use my Docker image. Run it with the following command:

docker run -it -p 8888:8888 -p 6006:6006 -v ~/traffic:/traffic waleedka/modern-deep-learning

From the command, you can see that my project directory is ~/traffic, which I map to the /traffic directory inside the Docker container. If you use a different project directory, adjust the command accordingly.

Finding training data

My first challenge was finding a good training data set. Traffic sign recognition is a well-studied problem, so there is plenty of material on the Internet.

I started by googling “Traffic sign data sets” and found several good ones. In the end, I chose the Belgian Traffic Sign Dataset, because it is large enough to train on yet small enough to be easy to work with.





You can download the dataset here. The download page lists a number of datasets, but you only need the two files under BelgiumTS for Classification (cropped images):

  • BelgiumTSC_Training (171.3 MBytes)
  • BelgiumTSC_Testing (76.5 MBytes)

After unpacking the files, this is my directory structure. I recommend using the same layout so that you don’t have to change any paths when running the source code.

/traffic/datasets/BelgiumTS/Training/ 
/traffic/datasets/BelgiumTS/Testing/

Each of these folders has 62 subfolders numbered 00000 to 00061. The name of the subfolder identifies the label of the image inside.

Exploring the data set

If you want a more formal name for this section, you can call it exploratory data analysis. It may not seem very useful, but I found that the code I wrote to inspect the data got reused many times throughout the project. I usually do this in Jupyter notebooks and share them with the team. Knowing your data set well from the start will save you a lot of time later on.

The images in this dataset are stored in the .ppm format. This is such an old format that many tools no longer support it, which means I can’t easily browse the images in these folders. Fortunately, the scikit-image library can read this format. The following code loads the data and returns two lists: images and labels.

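import os
import skimage.io

def load_data(data_dir):
    """Load a data set and return two lists: images (NumPy arrays) and integer labels."""
    # Get all subdirectories of data_dir. Each represents a label.
    directories = [d for d in os.listdir(data_dir)
                   if os.path.isdir(os.path.join(data_dir, d))]
    # Loop through the label directories and collect the data in
    # two lists, labels and images.
    labels = []
    images = []
    for d in directories:
        label_dir = os.path.join(data_dir, d)
        file_names = [os.path.join(label_dir, f)
                      for f in os.listdir(label_dir)
                      if f.endswith(".ppm")]
        for f in file_names:
            # skimage.io.imread returns the image as a NumPy array.
            images.append(skimage.io.imread(f))
            # The directory name (e.g. "00032") is the label.
            labels.append(int(d))
    return images, labels

# Path follows the directory layout shown above.
train_data_dir = "/traffic/datasets/BelgiumTS/Training"
images, labels = load_data(train_data_dir)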

It was a small data set, so I loaded all the data into RAM. But for large data sets, you have to read data in batches.
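As a rough illustration of what reading in batches could look like, here is a minimal sketch. The helper iterate_batches and the file names are hypothetical and not part of the original notebook; a real loader would decode only the files of the current batch.

```python
def iterate_batches(items, batch_size):
    """Yield successive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical file names standing in for image paths on disk.
file_names = ["img_{:03d}.ppm".format(i) for i in range(10)]

batches = list(iterate_batches(file_names, batch_size=4))
for batch in batches:
    # A real loader would read and decode only these files here.
    print(len(batch), batch[0])
```

The last batch is simply shorter when the number of items is not a multiple of the batch size.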

With the images loaded as NumPy arrays, I wrote a display function that shows a sample image for each label. Here’s the code, and here’s our data set:





The training set consists of 62 classes. The numbers in parentheses are the counts of images in each class.

The dataset looks excellent. The images are of good quality, and they cover a variety of angles and lighting conditions. More importantly, the traffic signs occupy most of each image, which lets me concentrate on classification without having to worry about locating the sign within the image (object detection). I will cover object detection in a future article.

So where do we start? From the picture above, I noticed that although the images are roughly square, they do not all have the same aspect ratio. The input size of my neural network is fixed, however, so I need to do some preprocessing. But first, I will pick one label and look at more of its images, for example label 32:
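Per-class counts like the ones in the caption can be computed from the list of labels. The sketch below uses a short made-up label list (example_labels is an assumption for illustration); in the post, the labels come from load_data().

```python
from collections import Counter

# Made-up labels for illustration; the real list has one entry per image.
example_labels = [0, 0, 1, 32, 32, 32, 61]

counts = Counter(example_labels)
for label in sorted(counts):
    print("Label {0} ({1})".format(label, counts[label]))
```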





Several sample images of label 32

From the above picture, we can see that signs with different speed limits are all grouped into the same class. That is fine as long as we know about it up front: the model does not need to read the numbers on these signs. This is why understanding your data set early is so important, and it can save you a lot of pain and confusion later on.

You can explore the rest of the labels yourself. Labels 26 and 27 are interesting: both show a number inside a red circle, so the model will have to be good to tell them apart.

Handle images of different sizes

Most image classification neural networks require fixed input sizes, and our first model will do the same. So, we need to resize all the images to the same size.

But because the images have different aspect ratios, some of them will be stretched horizontally and some vertically. Does this cause problems? I don’t think it will for this data set, because the images are not stretched by much. My rule of thumb is that if a person can still recognize an image after it has been stretched a little, the model should be able to recognize it, too.





Resizing images to a similar size and aspect ratio

So what was the size of the original image? Let’s print some first:

for image in images[:5]:
    print("shape: {0}, min: {1}, max: {2}".format(
          image.shape, image.min(), image.max()))
Output:
shape: (141, 142, 3), min: 0, max: 255
shape: (120, 123, 3), min: 0, max: 255
shape: (105, 107, 3), min: 0, max: 255
shape: (94, 105, 3), min: 7, max: 255
shape: (128, 139, 3), min: 0, max: 255

The images are roughly 128 x 128, so I could use that size to preserve as much information as possible. In early development, however, I prefer smaller sizes because models train faster, which lets me iterate faster. I tried 16 x 16 and 20 x 20, but they were too small. In the end I chose 32 x 32, which is still easy to recognize with the naked eye (see the picture below) and holds 16 times fewer pixels than 128 x 128.

I also have a habit of printing the minimum and maximum values of the data. It is an easy way to check the range of the data and catch bugs early. For this data set, it tells me that the image colors are in the standard range of 0 to 255.
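To make the resizing step concrete, here is a minimal NumPy sketch that shrinks an image to 32 x 32 by nearest-neighbor sampling. It is a stand-in, not the method used in the notebook: resize_nearest is a hypothetical helper, and the random array only imitates the shape of the first image printed above. A library routine such as skimage.transform.resize does the same job with proper interpolation.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an [H, W, C] array by nearest-neighbor sampling.

    The aspect ratio is not preserved: the image is stretched to fit.
    """
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return image[rows][:, cols]

# Random stand-in with the same shape as the first image printed above.
image = np.random.randint(0, 256, size=(141, 142, 3), dtype=np.uint8)
small = resize_nearest(image, 32, 32)
print("shape: {0}, min: {1}, max: {2}".format(small.shape, small.min(), small.max()))

# Ratio of pixel counts between a 128 x 128 and a 32 x 32 image.
reduction = (128 * 128) // (32 * 32)
print(reduction)
```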





Images resized to 32×32

Minimum viable model

Finally we get to the most interesting part. In keeping with our simple approach, let’s start with the simplest possible model: a single-layer network with one neuron per label.





The network has 62 neurons, and each neuron takes the RGB values of all pixels of the image as input. That means each neuron receives 32 * 32 * 3 = 3072 inputs. This is a fully connected layer, because every neuron is connected to every input value. You’re probably already familiar with the following equation:

y = xW + b
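The shapes behind this equation can be checked with a few lines of NumPy. This is only a sketch with random stand-in data: the batch size of 4 and the initial values of W and b are assumptions for illustration.

```python
import numpy as np

batch_size, num_inputs, num_classes = 4, 32 * 32 * 3, 62
rng = np.random.RandomState(0)

x = rng.rand(batch_size, num_inputs)            # flattened images, one per row
W = rng.randn(num_inputs, num_classes) * 0.01   # one column of weights per neuron
b = np.zeros(num_classes)                       # one bias per neuron

y = x.dot(W) + b                                # one row of 62 outputs per image
print(x.shape, W.shape, y.shape)
```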

I started with this simple model because it’s easy to interpret, easy to debug, and quick to train. Once this is done, then we can do more complex things based on this project.

Building the TensorFlow graph

TensorFlow encapsulates the architecture of the neural network in an execution graph. The graph consists of operations (Ops for short) such as Add, Multiply, and so on. These operations act on data held in tensors (multidimensional arrays).





Visualization of a part of a TensorFlow graph

I’ll build this graph step by step with the code, but here is the full code first, if you prefer:

import tensorflow as tf

# Create a graph to hold the model.
graph = tf.Graph()

# Create model in the graph.
with graph.as_default():
    # Placeholders for inputs and labels.
    images_ph = tf.placeholder(tf.float32, [None, 32, 32, 3])
    labels_ph = tf.placeholder(tf.int32, [None])

    # Flatten input from: [None, height, width, channels]
    # To: [None, height * width * channels] == [None, 3072]
    images_flat = tf.contrib.layers.flatten(images_ph)

    # Fully connected layer.
    # Generates logits of size [None, 62]
    logits = tf.contrib.layers.fully_connected(images_flat, 62, tf.nn.relu)

    # Convert logits to label indexes (int).
    # Shape [None], which is a 1D vector of length == batch_size.
    predicted_labels = tf.argmax(logits, 1)

    # Define the loss function.
    # Cross-entropy is a good choice for classification.
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=labels_ph))

    # Create training op.
    train = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

    # And, finally, an initialization op to execute before training.
    # TODO: Rename to tf.global_variables_initializer() on TF 0.12.
    init = tf.initialize_all_variables()

First, I create a Graph object. TensorFlow has a default global graph, but I don’t recommend using it. Global variables are generally a bad habit because they make it too easy to introduce errors. I prefer to create the graph explicitly myself.

graph = tf.Graph()

Then, I set up placeholders for images and labels. Placeholders are how TensorFlow receives input from the main program. Notice that I created the placeholders (and all the other operations) inside the graph.as_default() block. That way they become part of the graph I created rather than the global graph.

with graph.as_default():
    images_ph = tf.placeholder(tf.float32, [None, 32, 32, 3])
    labels_ph = tf.placeholder(tf.int32, [None])

The images_ph placeholder has shape [None, 32, 32, 3], which stands for [batch size, height, width, channels] (often abbreviated NHWC). Using None for the batch size means that it is flexible, so we can feed batches of any size to the model without changing the code. Pay attention to the order of your input dimensions, because some models and frameworks use a different ordering, such as NCHW.

Next, instead of implementing the raw equation y = xW + b by hand, I define a fully connected layer with a convenience function that also applies the activation function. The layer expects a one-dimensional vector per example, so I flatten the images first.
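If data ever arrives in the other ordering, converting between the two layouts is a matter of permuting the axes. The sketch below shows this with NumPy; the batch of 8 images filled with consecutive numbers is made up for illustration.

```python
import numpy as np

# A made-up batch in NHWC order: [batch size, height, width, channels].
batch_nhwc = np.arange(8 * 32 * 32 * 3, dtype=np.float32).reshape(8, 32, 32, 3)

# Move the channel axis forward to get NCHW: [batch size, channels, height, width].
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

print(batch_nhwc.shape, batch_nchw.shape)
```

The same pixel is addressed as batch_nhwc[n, h, w, c] in one layout and batch_nchw[n, c, h, w] in the other.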





The ReLU function

Here, I use the ReLU function as the activation function, as follows:

f(x) = max(0, x)

This activation function simply converts negative values to zero. It works well for classification tasks and trains faster than sigmoid or tanh. If you want to learn more, check out here and here.
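As a quick check of the formula, here is ReLU applied element-wise with NumPy. The input values are arbitrary examples.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
activated = relu(values)
print(activated)
```

Negative inputs become zero and positive inputs pass through unchanged.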

# Flatten input from: [None, height, width, channels]
# To: [None, height * width * channels] == [None, 3072]
images_flat = tf.contrib.layers.flatten(images_ph)
# Fully connected layer. 
# Generates logits of size [None, 62]
logits = tf.contrib.layers.fully_connected(images_flat, 62,
    tf.nn.relu)

The output of the fully connected layer is a logits vector of length 62 (technically, its shape is [None, 62], since we are processing batches).

A row of the output might look like this: [0.3, 0, 0, 1.2, 2.1, 0.01, 0.4,…, 0, 0]. The higher the value, the more likely the image is to belong to that label. The outputs are not probabilities: they can take arbitrary values and do not add up to 1. Their absolute magnitudes are not important, only their values relative to each other across the 62 neurons. If needed, we can easily convert them to probabilities with softmax or another function (not needed here).
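For reference, this is roughly what the softmax conversion does. The sketch uses only the first seven values of the example vector above, so it is a shortened stand-in rather than a full 62-element row.

```python
import numpy as np

def softmax(logits):
    """Convert logits to probabilities along the last axis."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# First seven values of the example vector (shortened for illustration).
logits = np.array([0.3, 0.0, 0.0, 1.2, 2.1, 0.01, 0.4])
probs = softmax(logits)

print(probs.sum())
print(int(np.argmax(logits)), int(np.argmax(probs)))
```

Softmax preserves the ordering of the values, so the index of the largest logit is also the index of the largest probability.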





Bar chart visualization of a logits vector

For this project, we only need the index of the largest value, because that index is the predicted label of the image. The argmax op gives us exactly that:

# Convert logits to label indexes.
# Shape [None], which is a 1D vector of length == batch_size.
predicted_labels = tf.argmax(logits, 1)

The output of the argmax function will be an integer in the range [0, 61].
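The same operation in NumPy, on a made-up batch of two images and three classes instead of 62, looks like this:

```python
import numpy as np

# Made-up logits: 2 images (rows), 3 classes (columns).
batch_logits = np.array([[0.1, 2.0, 0.3],
                         [1.5, 0.2, 0.0]])

# Index of the largest value in each row, one predicted label per image.
predicted = np.argmax(batch_logits, axis=1)
print(predicted)
```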

Loss function and gradient descent

Choosing the right loss function is itself an area of research that I will not delve into here. We use cross entropy as the loss function because it is the most common function used in classification tasks. If you’re not familiar with it, it’s well explained here and here.





Credit: Wikipedia

Cross entropy is a measure of the difference between two probability vectors, so we need to convert both the labels and the outputs of the neural network into probability vectors. TensorFlow has a function, sparse_softmax_cross_entropy_with_logits, that does this. It takes the labels and the logits as inputs and does three things: first, it converts the labels to one-hot vectors of shape [None, 62]; second, it applies softmax to convert the logits into probabilities; third, it calculates the cross entropy between the two. The function returns a vector of shape [None] (its length is the batch size), and we then use reduce_mean to reduce it to a single number, which is the final loss value.

# Keyword arguments avoid confusing the order of logits and labels.
loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logits, labels=labels_ph))
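The three steps can be spelled out in NumPy. This is a sketch of the computation, not TensorFlow’s actual implementation; the function name and the tiny three-class inputs are assumptions for illustration.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between integer labels and logits."""
    num_classes = logits.shape[1]
    # Step 1: convert integer labels to one-hot vectors.
    one_hot = np.eye(num_classes)[labels]
    # Step 2: softmax on the logits, computed in log space for stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Step 3: cross entropy between the one-hot labels and the probabilities.
    return -(one_hot * log_probs).sum(axis=1)

# Two made-up examples with three classes instead of 62.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

per_example = sparse_softmax_cross_entropy(logits, labels)
loss = per_example.mean()   # plays the role of reduce_mean
print(per_example.shape, loss)
```

The second example has identical logits for all three classes, so its loss equals log(3), the cross entropy of a uniform guess.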

The next step is to choose an optimization algorithm. I generally use the Adam optimizer, because it tends to converge faster than plain gradient descent. If you want to know how different optimizers compare, check out this blog post.

train = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

The last node in the graph is the initialization op, which simply sets all variables to their initial values (zeros or random values).

init = tf.initialize_all_variables()

Notice that the code above hasn’t executed anything yet. It only builds the graph and describes its inputs. The variables defined above, such as init, loss, and predicted_labels, don’t contain concrete values. They are references to operations that we will run next.

Training loop

This is where we iterate on the training model. Before we start training, we need to create a Session object.





Remember the Graph object from earlier and how it holds all the operations (Ops) of the model? A session, on the other hand, holds the values of all the variables. If the graph holds the equation y = xW + b, the session holds the actual values of these variables.

session = tf.Session(graph=graph)

Normally, the first thing you do after creating a session is run the initialization op, as follows:

session.run(init)
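The split between graph and session can be pictured with a plain-Python analogy. This is only an analogy, not how TensorFlow works internally: linear_model and session_state are hypothetical names, and scalars stand in for tensors.

```python
# "Graph": a description of the computation, with no values attached.
def linear_model(x, W, b):
    return x * W + b

# "Session": the current values of the variables.
session_state = {"W": 2.0, "b": 1.0}

# Nothing is computed until the description is run with concrete values.
y = linear_model(3.0, **session_state)
print(y)
```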

Then, we train the model in a loop until it converges to our satisfaction. During training, it is very useful to record and print the value of the loss function, which helps us monitor training progress.

for i in range(201):
    _, loss_value = session.run(
        [train, loss], 
        feed_dict={images_ph: images_a, labels_ph: labels_a})
    if i % 10 == 0:
        print("Loss: ", loss_value)

As you can see, I set the number of loops to 201 and print out the loss value when the number of loops is a multiple of 10. The final output should look something like this:

Loss: 4.2588
Loss: 2.88972
Loss: 2.42234
Loss: 2.20074
Loss: 2.06985
Loss: 1.98126
Loss: 1.91674
Loss: 1.86652
Loss: 1.82595
...

Using the model
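To see what such a loop does under the hood, here is a self-contained NumPy sketch of the same idea on synthetic data. It departs from the model in the post in several labeled ways: it uses random inputs with 20 features and 3 classes, plain gradient descent with a step size of 0.1 instead of Adam, and no ReLU on the output.

```python
import numpy as np

rng = np.random.RandomState(0)
num_examples, num_inputs, num_classes = 60, 20, 3

# Synthetic stand-ins for the flattened images and their labels.
x = rng.randn(num_examples, num_inputs)
labels = rng.randint(0, num_classes, size=num_examples)

W = np.zeros((num_inputs, num_classes))
b = np.zeros(num_classes)

def forward_loss(x, labels, W, b):
    """Return softmax probabilities and the mean cross-entropy loss."""
    logits = x.dot(W) + b
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
    return probs, loss

losses = []
for i in range(101):
    probs, loss_value = forward_loss(x, labels, W, b)
    losses.append(loss_value)
    # Gradient of the mean cross entropy with respect to the logits.
    grad_logits = probs.copy()
    grad_logits[np.arange(num_examples), labels] -= 1.0
    grad_logits /= num_examples
    # Plain gradient descent step on the weights and biases.
    W -= 0.1 * x.T.dot(grad_logits)
    b -= 0.1 * grad_logits.sum(axis=0)
    if i % 10 == 0:
        print("Loss: ", loss_value)
```

With all weights at zero, the first loss is log(3), the value for a uniform guess over three classes, and it falls from there as the weights are updated.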

We now have a trained model stored in memory in the Session object. If we want to use it, we can use it by calling session.run(). The predicted_labels operation returns the result of the argmax function, which is what we need. Below, I randomly selected 10 images for classification, and printed the label results and prediction results at the same time.

# Pick 10 random images
sample_indexes = random.sample(range(len(images32)), 10)
sample_images = [images32[i] for i in sample_indexes]
sample_labels = [labels[i] for i in sample_indexes]
# Run the "predicted_labels" op.
predicted = session.run(predicted_labels,
                        {images_ph: sample_images})
print(sample_labels)
print(predicted)

Output:
[15, 22, 61, 44, 32, 22, 57, 38, 56, 38]
[14  22  61  44  32  22  56  38  56  38]

In our source code, I also wrote a visual function to show the comparison result, which looks like this:





We can see from the figure that our model works, but the figure does not let us quantify its accuracy. You may also have noticed that we are classifying images the model was trained on, so we don’t yet know how well it works on unseen data. Next, let’s evaluate it on the test set.

Evaluation

To properly evaluate the model, we need to test it on the test set. BelgiumTS provides exactly two datasets, one for training and one for testing. Therefore, we can easily use the test set to evaluate our trained model.

In the source code, I loaded the test set and converted the image size to 32 × 32, then calculated the prediction accuracy. Here’s the code for the evaluation section:

# Run predictions against the full test set.
predicted = session.run(predicted_labels, 
                        feed_dict={images_ph: test_images32})
# Calculate how many matches we got.
match_count = sum([int(y == y_) 
                   for y, y_ in zip(test_labels, predicted)])
accuracy = match_count / len(test_labels)
print("Accuracy: {:.3f}".format(accuracy))

The accuracy varies between 0.40 and 0.70 from run to run, depending on whether the model lands in a local minimum or the global minimum. This is to be expected with such a simple model. In a future article, I’ll discuss how to improve the consistency of results.
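The matching logic is easy to try on a small case. The sketch below applies it to the ten sample labels and predictions printed in the previous section. Those were training images, so the number only illustrates the calculation and is not a test-set accuracy.

```python
# The ten sample labels and predictions printed in the previous section.
sample_labels = [15, 22, 61, 44, 32, 22, 57, 38, 56, 38]
predicted = [14, 22, 61, 44, 32, 22, 56, 38, 56, 38]

# Count positions where the prediction equals the true label.
match_count = sum(int(y == y_) for y, y_ in zip(sample_labels, predicted))
accuracy = match_count / len(sample_labels)
print("Accuracy: {:.3f}".format(accuracy))
```

Eight of the ten predictions match, so this prints an accuracy of 0.800.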

Close the session

Congratulations! At this point, you have learned how to write a simple neural network. Because this network is so simple, training it on my laptop takes only about a minute, so I didn’t bother saving the trained model. In the next part, I will add model saving and loading, and extend the model to multi-layer networks, convolutional neural networks, and data set augmentation. Stay tuned!

# Close the session. This will destroy the trained model.
session.close()