Abstract

This tutorial shows you how to use machine learning methods to classify irises by species.

The tutorial uses TensorFlow's eager mode to:

  1. Build a model
  2. Train with sample data
  3. Use the model to make predictions on unknown data.

No machine learning experience is required, but some familiarity with Python is.

TensorFlow programming

TensorFlow provides many APIs, but it is recommended to start with the following high-level TensorFlow concepts:

  • Enable eager mode in the development environment
  • Import data using the Datasets API
  • Use TensorFlow's Keras API to build models and layers.

Typically, TensorFlow programs are written as follows:

  1. Import and parse the dataset.
  2. Select the type of model.
  3. Train the model.
  4. Use the trained model to make predictions.

The first program

The tutorial uses Jupyter Notebook to execute Python code in a browser; Google provides an off-the-shelf tool for this, Colab Notebook. Eager mode is available starting in TensorFlow version 1.7.

Enable Eager mode

Eager mode enables code to run immediately and return concrete results, rather than waiting for a computation graph to be built. Once eager mode is turned on in code, it cannot be turned off. For details, see the eager mode guide.

from __future__ import absolute_import, division, print_function

import os
import matplotlib.pyplot as plt

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()

print("TensorFlow version: {}".format(tf.VERSION))
print("EAGER execution: {}".format(tf.executing_eagerly()))

The output

TensorFlow version: 1.7.0
EAGER execution: True

Iris classification problem

Suppose you are a botanist looking for a way to automatically classify the irises you find. Machine learning offers a number of algorithms for classifying flowers. For example, a sophisticated machine learning program can classify flowers based on photographs. The iris problem is simpler, and we classify them based on the length and width measurements of their sepals and petals.

There are about 300 kinds of irises, but our program distinguishes only three:

  • Iris setosa
  • Iris virginica
  • Iris versicolor

Fortunately, someone has already created a dataset of 120 irises with sepal and petal measurements. This is a classic dataset for machine learning beginners.

Import and parse data sets

Use Python to download the dataset file and structure the data.

Download data set

train_dataset_url = 'http://download.tensorflow.org/data/iris_training.csv'
train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url), origin=train_dataset_url)
print("Local copy of the dataset file: {}".format(train_dataset_fp))

The output

Local copy of the dataset file: /home/jovyan/.keras/datasets/iris_training.csv

Check the data

The downloaded data is stored in CSV format. You can run head -n5 to see the first five lines.

!head -n5 {train_dataset_fp}

The output

120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0

You can see:

  1. The first line is a header containing information about the dataset: there are 120 examples, and each example has four features and one of three possible label values.
  2. The subsequent rows are data records, one example per row, where:
     • The first four fields are features. In this case, they hold the flower's measurements as floating-point numbers.
     • The last field is the label, which is the value we want to predict. In this dataset it is 0, 1, or 2, each number corresponding to a flower name.

Each label is associated with a string name (for example, "setosa"), but using numbers makes the data faster for the program to process. The label numbers map to names as follows:

  • 0: Iris setosa
  • 1: Iris versicolor
  • 2: Iris virginica

For more on features and labels, see the ML Terminology section of the Machine Learning Crash Course.
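
In code, such a mapping can be as simple as a list indexed by the label number. A minimal sketch (the class_names list here is hypothetical; the tutorial later defines a similar class_ids list):

class_names = ['Iris setosa', 'Iris versicolor', 'Iris virginica']  # index = label number
print(class_names[0])  # Iris setosa
print(class_names[2])  # Iris virginica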

Parse the dataset

Because the dataset is CSV-formatted text, the feature and label values need to be parsed into a format the model can use. Each line in the file is passed to the parse_csv function, which takes the first four feature values and combines them into a single tensor, then interprets the last field as the label. The function returns the feature tensor and the label tensor.

def parse_csv(line):
  example_defaults = [[0.], [0.], [0.], [0.], [0]]  # sets field types
  parsed_line = tf.decode_csv(line, example_defaults)
  # First 4 fields are features, combine into single tensor
  features = tf.reshape(parsed_line[:-1], shape=(4,))
  # Last field is the label
  label = tf.reshape(parsed_line[-1], shape=())
  return features, label

Create a tf.data.Dataset for training

TensorFlow’s Dataset API can handle many common scenarios that provide data to the model. This is a high-level API that can be used to read data and convert it to a trainable data format.

The program uses tf.data.TextLineDataset to read the CSV file and parse_csv to parse the data. A tf.data.Dataset represents the input pipeline as a collection of elements and a series of transformations that act on those elements. The transformation methods can be chained together or called sequentially; just make sure to keep a reference to the returned Dataset object.

Training works best if the samples are in random order. Set buffer_size to a value larger than the number of samples and call tf.data.Dataset.shuffle to randomize the order of the input entries. To speed up training, set the batch size to 32 so that 32 samples are processed at a time.

train_dataset = tf.data.TextLineDataset(train_dataset_fp)
train_dataset = train_dataset.skip(1)
train_dataset = train_dataset.map(parse_csv)
train_dataset = train_dataset.shuffle(buffer_size=1000)
train_dataset = train_dataset.batch(32)

features, label = tfe.Iterator(train_dataset).next()
print('example features:', features[0])
print('example label:', label[0])

The output is

example features: tf.Tensor([7.7 3.  6.1 2.3], shape=(4,), dtype=float32)
example label: tf.Tensor(2, shape=(), dtype=int32)

Select model type

Why do we need a model?

A model is the relationship between features and labels. For iris classification, the model defines the relationship between sepal and petal measurements and iris species. Simple models can be described by simple algebra, but complex machine learning models have many parameters that are difficult to generalize.

Can the relationship between the four features and the iris species be determined without machine learning? That is, could you create a model using traditional programming techniques, such as a large number of conditional statements? Given enough time to study the dataset, you might be able to discover the relationship between these feature values and the iris species. For more complex datasets, however, such an approach becomes difficult or even impossible.

A good machine learning method determines this model for you. If you feed enough representative examples to the right kind of machine learning model, the program can find the right relationship between the feature values and the labels.

Select the model

Many machine learning models already exist, and it takes some experience to choose the right one to train. This tutorial uses a neural network to solve the iris classification problem. Neural networks can find complex relationships between feature values and labels. A neural network is a highly structured computation graph made up of one or more hidden layers, and each hidden layer consists of one or more neurons. There are several types of neural networks; this tutorial uses a dense, or fully connected, neural network: the neurons in one layer receive input connections from every neuron in the previous layer. The following figure shows a dense neural network consisting of an input layer, two hidden layers, and an output layer:

After the model in the figure above is trained, feeding it an unlabeled sample produces three predictions: the likelihood that the flower is each of the three iris species. Making such a prediction is called inference. In this example, the sum of the output predictions is 1.0. In the figure above, the prediction is:

  • 0.03: Iris setosa
  • 0.95: Iris versicolor
  • 0.02: Iris virginica

That is, the model predicts that this unlabeled sample is an Iris versicolor.
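
The model's final Dense layer outputs raw scores (logits) rather than probabilities. As a minimal sketch of how probabilities that sum to 1.0 can be obtained from such logits, with hypothetical values chosen to roughly match the figure above:

logits = tf.constant([-1.0, 2.5, -1.3])  # hypothetical raw outputs for one sample
probabilities = tf.nn.softmax(logits)    # softmax turns logits into probabilities
print(probabilities.numpy())             # roughly [0.03 0.95 0.02]
print(probabilities.numpy().sum())       # ~1.0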

Create the model using Keras

TensorFlow's tf.keras API is the preferred way to create models and layers. Keras handles the complexity of tying everything together, making it easy to build models and experiment. See the Keras documentation for details.

The tf.keras.Sequential model is a linear stack of layers. Its constructor takes a list of layer instances. In this tutorial's example, the two leading Dense layers have 10 nodes each, and the output layer has 3 nodes representing the prediction labels. The input_shape parameter is required for the first layer and corresponds to the number of features in the dataset.

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(3)
])

Activation functions (activation in the code) determine the output of each neuron to the next layer. This is loosely analogous to how neurons in the brain are connected. There are many activation functions available; hidden layers usually use rectified linear units (relu in the code).
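
As a quick illustration, relu passes positive values through unchanged and clips negative values to zero; a minimal sketch in eager mode:

x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(tf.nn.relu(x).numpy())  # [0.  0.  0.  1.5 3. ]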

The ideal number of hidden layers and neurons depends on the problem and the dataset. Like many other aspects of machine learning, choosing the shape of a neural network requires a mix of knowledge and experimentation. As a rule of thumb, adding hidden layers and neurons generally creates a more powerful model, which in turn requires more data to train effectively.
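
For example, a wider and deeper variant of the model above might look like the sketch below. This is purely an illustration of the trade-off, not a recommendation for this small dataset:

bigger_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(4,)),  # more neurons per layer
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),                    # an extra hidden layer
    tf.keras.layers.Dense(3)
])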

Train the model

Training is the stage of machine learning in which the model is gradually optimized, that is, in which the model learns the dataset. The goal of training is to learn enough about the structure of the training dataset to make predictions about unseen data. If the model learns the training dataset too well, its predictions only work for the data it has seen and will not generalize. This problem is called overfitting; it is like memorizing the answers instead of understanding how to solve the problem.

The iris classification problem is an example of supervised machine learning, where the model starts training from samples containing labels. In unsupervised machine learning, no labels are included in the samples; instead, the model usually finds patterns in features.

Define loss and gradient functions

Both the training and evaluation phases need to calculate the model's loss. The loss measures how far the model's predictions are from the expected labels; in other words, how badly the model is performing. We want to minimize, or optimize, this value.

We calculate the loss using tf.losses.sparse_softmax_cross_entropy, which takes the model's predictions and the expected labels as arguments. The larger the returned loss value, the worse the predictions.

def loss(model, x, y):
    y_ = model(x)
    return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)

def grad(model, inputs, targets):
    with tfe.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, model.variables)

The grad function in the code above calls the loss function and uses tfe.GradientTape to record the operations needed to compute the gradients used to optimize the model. See the eager execution tutorial for more examples.

Creating the optimizer

The optimizer applies the computed gradients to the model's variables to minimize the loss function. You can think of the loss function as a curved surface, and we want to find its lowest point by walking around on it.

The gradient points in the direction of steepest ascent, so we travel the opposite way and move down the hill. By iteratively calculating the loss and gradients for each batch, we adjust the model during training. Over time, the model finds the best combination of weights and biases to minimize the loss. The lower the loss, the better the model's predictions.
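
Conceptually, each update moves a parameter a small step in the direction opposite its gradient. A minimal sketch in plain Python with hypothetical numbers (in the real code below, the optimizer performs this update for every model variable):

learning_rate = 0.01
weight = 0.8                                # hypothetical model parameter
gradient = 2.5                              # hypothetical d(loss)/d(weight)
weight = weight - learning_rate * gradient  # step downhill, opposite the gradient
print(weight)                               # 0.775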

TensorFlow has many optimization algorithms available for training. This tutorial's model uses tf.train.GradientDescentOptimizer, which implements the standard stochastic gradient descent (SGD) algorithm. The learning_rate sets the step size for each iteration; it is a hyperparameter that is commonly tuned to get better results.

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

Training iteration

Now everything is ready and the model is ready for training! The training loop feeds the model with a sample of the data set to help it make better predictions. The following code sets up some training steps:

  1. Iterate over each epoch. An epoch is one complete pass through the dataset.
  2. Within an epoch, iterate over each example in the training dataset to get its features (x) and label (y).
  3. Make a prediction using the example's features and compare it with the label. Measure the inaccuracy of the prediction and use it to calculate the model's loss and gradients.
  4. Use the optimizer to update the model's variables.
  5. Track some statistics for visualization.
  6. Repeat for each epoch.

num_epochs is the number of times to loop over the dataset. Counter-intuitively, training a model for longer does not guarantee a better model. num_epochs is an adjustable hyperparameter, and finding the right value usually takes experience and experimentation.

train_loss_results = []
train_accuracy_results = []

num_epochs = 201
for epoch in range(num_epochs):
    epoch_loss_avg = tfe.metrics.Mean()
    epoch_accuracy = tfe.metrics.Accuracy()
    for x, y in tfe.Iterator(train_dataset):
        grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.variables), global_step=tf.train.get_or_create_global_step())
        epoch_loss_avg(loss(model, x, y))
        epoch_accuracy(tf.argmax(model(x), axis=1, output_type=tf.int32), y)
        
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())
    
    if epoch % 50 == 0:
        print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(
            epoch, epoch_loss_avg.result(), epoch_accuracy.result()))

The output is as follows:

Epoch 000: Loss: 1.005, Accuracy: 50.833%
Epoch 050: Loss: 0.384, Accuracy: 85.000%
Epoch 100: Loss: 0.257, Accuracy: 95.833%
Epoch 150: Loss: 0.183, Accuracy: 97.500%
Epoch 200: Loss: 0.134, Accuracy: 97.500%

Visualize the loss

Printing out the training progress is useful, but it is often more helpful to see the progress visually. TensorFlow integrates a very useful visualization tool called TensorBoard, but here we will use the matplotlib module to create basic charts.

It takes some experience to read such charts, but we expect to see losses fall and accuracy rise.

fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))
fig.suptitle('Training Metrics')

axes[0].set_ylabel("Loss", fontsize=14)
axes[0].plot(train_loss_results)

axes[1].set_ylabel("Accuracy", fontsize=14)
axes[1].set_xlabel("Epoch", fontsize=14)
axes[1].plot(train_accuracy_results)
plt.show()

Evaluate the validity of the model

Now that the model has been trained, we can get statistics on its performance.

Evaluation means determining how accurately the model makes predictions. To determine the model's effectiveness at iris classification, pass some sepal and petal measurements to the model, ask it to predict the iris species they represent, and then compare the predictions against the actual labels. The table below shows a relatively accurate model that got 4 out of 5 predictions right, for an accuracy of 80% (a short sketch of this calculation follows the table).

Example features        Label    Model prediction
5.9  3.0  4.3  1.5      1        1
6.9  3.1  5.4  2.1      2        2
5.1  3.3  1.7  0.5      0        0
6.0  3.4  4.5  1.6      1        2
5.5  2.5  4.0  1.3      1        1
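
As a minimal sketch of how that 80% figure is computed, using plain Python with the labels and predictions copied from the table above:

labels      = [1, 2, 0, 1, 1]   # actual labels from the table
predictions = [1, 2, 0, 2, 1]   # model predictions from the table
correct = sum(1 for p, y in zip(predictions, labels) if p == y)
print("Accuracy: {:.0%}".format(correct / len(labels)))  # Accuracy: 80%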

Set up the test data set

Evaluating the model is similar to training it, but the biggest difference is that the evaluation examples come from a separate test set rather than the training set. To fairly assess the model's effectiveness, the examples used to evaluate it must be different from the examples used to train it.

Setting up the test dataset is similar to setting up the training dataset: download the CSV file, parse the data, and then shuffle it:

test_url = 'http://download.tensorflow.org/data/iris_test.csv'
test_fp = tf.keras.utils.get_file(fname=os.path.basename(test_url), origin=test_url)
test_dataset = tf.data.TextLineDataset(test_fp)
test_dataset = test_dataset.skip(1)
test_dataset = test_dataset.map(parse_csv)
test_dataset = test_dataset.shuffle(1000)
test_dataset = test_dataset.batch(32)

The output is

Downloading data from http://download.tensorflow.org/data/iris_test.csv
8192/573 [==============================] - 0s 0us/step

Evaluate the model on the test data set

Unlike the training stage, evaluating the test data takes only a single epoch. In the code below, we walk through each example in the test set and compare the model's prediction with the actual label. This is used to measure the model's accuracy across the whole test set.

test_accuracy = tfe.metrics.Accuracy()

for (x, y) in tfe.Iterator(test_dataset):
    prediction = tf.argmax(model(x), axis=1, output_type=tf.int32)
    test_accuracy(prediction, y)
    
print("Test set accuracy: {:.3%}".format(test_accuracy.result()))

The output is

Test set accuracy: 96.667%

Use the trained model to make predictions

We have trained a model and "proved" that it can distinguish between iris species, though not with 100% accuracy. Now let's use the trained model to make some predictions on unlabeled samples.

In a real-world scenario, unlabeled samples might come from many sources, such as applications, CSV files, and data feeds. For now, we will manually provide three unlabeled samples and predict their labels. Recall that each class is represented by a number:

  • 0: Iris setosa
  • 1: Iris versicolor
  • 2: Iris virginica
class_ids = ['Iris setosa', 'Iris versicolor', 'Iris virginica']

predict_dataset = tf.convert_to_tensor([
    [5.1, 3.3, 1.7, 0.5],
    [5.9, 3.0, 4.2, 1.5],
    [6.9, 3.1, 5.4, 2.1]
])

predictions = model(predict_dataset)

for i, logits in enumerate(predictions):
    class_idx = tf.argmax(logits).numpy()
    name = class_ids[class_idx]
    print("Example {} prediction: {}".format(i, name))

The predicted results are as follows:

Example 0 prediction: Iris setosa
Example 1 prediction: Iris versicolor
Example 2 prediction: Iris virginica

All predictions are correct!

For an in-depth look at the machine learning model, check out the TensorFlow programming guide.

Get started with eager execution