A programmer's quick start to deep learning in five steps

As programmers, we can learn deep learning model development just like we learn programming. Let’s take Keras as an example.

We can sum it up with a "5-4-9" model: 5 steps + 4 basic elements + 9 basic layers.

Five steps:

  1. Construct the network model
  2. Compile the model
  3. Train the model
  4. Evaluate the model
  5. Use the model to make predictions

Four basic elements:

  1. Network structure: assembled from the 9 basic layer types, plus other, less common layers
  2. Activation functions: e.g. ReLU, softmax. Rule of thumb: softmax for the final output, ReLU for everything else
  3. Loss function: categorical_crossentropy (multi-class log loss), binary_crossentropy (binary log loss), mean_squared_error (mean squared error), mean_absolute_error (mean absolute error)
  4. Optimizer: e.g. SGD (stochastic gradient descent), RMSProp, Adagrad, Adam, Adadelta, etc.

Nine basic layers

Three main layer types:

  1. Fully connected layer: Dense
  2. Convolution layers: e.g. Conv1D, Conv2D
  3. Recurrent layers: e.g. LSTM, GRU

Three auxiliary layers:

  1. Activation layer
  2. Dropout layer
  3. Pooling layer

Three heterogeneous network interconnection layers:

  1. Embedding layer: used as the first layer to convert raw input into a representation the rest of the network can consume
  2. Flatten layer: used for the transition from a convolution layer to a fully connected layer
  3. Permute layer: used at the interface between an RNN and a CNN (a small sketch combining several of these layers follows this list)
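
To make the layer types above concrete, here is a minimal sketch that strings several of them together in a Sequential model: an Embedding layer as the first layer, a recurrent LSTM layer, an auxiliary Dropout layer, and a Dense output layer. The vocabulary size, sequence length, and unit counts below are placeholders chosen for illustration, not values prescribed by any particular problem.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense

vocab_size = 10000   # illustrative: number of distinct tokens in the input
seq_length = 100     # illustrative: length of each input sequence
num_classes = 5      # illustrative: number of output categories

model = Sequential()
# Embedding as layer 1: converts integer token ids into dense vectors
model.add(Embedding(input_dim=vocab_size, output_dim=64, input_length=seq_length))
# Recurrent layer
model.add(LSTM(32))
# Auxiliary layer to reduce overfitting
model.add(Dropout(0.5))
# Fully connected output, softmax for multi-class classification
model.add(Dense(num_classes, activation='softmax'))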

Let’s use a picture to understand the relationship between them

The five steps

The five-step method is the workflow for solving a problem with deep learning:

  1. Construct the network model
  2. Compile the model
  3. Train the model
  4. Evaluate the model
  5. Use the model to make predictions

Of the five steps, the key one is actually the first: once the network model is determined, the parameters for the remaining steps can be set accordingly.

Constructing the network model procedurally

Let's start with the approach that is easiest to understand: constructing a network model procedurally. Keras provides the Sequential container for this; you simply add layers with its add method. The 9 basic layer types will be discussed in more detail later.

Example:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()

model.add(Dense(units=64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(units=10))
model.add(Activation("softmax"))

Which layer structure to build for which kind of problem will be covered in the examples that follow.

Compile the model

Once the model is constructed, the next step is to call the Sequential compile method to compile it.

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

Two of the basic elements need to be specified at compile time: loss, the loss function, and optimizer, the optimizer.

If you only need the default behavior, just pass the name as a string. If you want to configure more parameters, instantiate the corresponding class instead. For example, to add Nesterov momentum to stochastic gradient descent, create an SGD object:

from keras.optimizers import SGD
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))

lr is the learning rate. These concepts were introduced in Tensorflow fast-food tutorial (7) - gradient descent, for those who need a review.

Train the model

Call the fit method, passing the input values x, the label values y, the number of training epochs, and the batch_size:

model.fit(x_train, y_train, epochs=5, batch_size=32)

Evaluate the model

Performance on the training data alone doesn't tell us whether the model has trained well; we need to evaluate it on test data:

loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

Use the model to make predictions

The whole point of training is to make predictions:

classes = model.predict(x_test, batch_size=128)
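
One small aside: with a softmax output layer, predict returns per-class probabilities rather than hard class labels, so the label for each sample is obtained by taking the index of the largest probability, for example:

import numpy as np

probabilities = model.predict(x_test, batch_size=128)
predicted_labels = np.argmax(probabilities, axis=-1)  # highest-probability class per sample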

Four basic elements

The network structure

The network structure is mainly assembled from the layer types described below. How do you design it? You can refer to published papers; whether it is VGG-19 on the left or ResNet on the right of the figure, just follow the diagram and implement it layer by layer.
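
As a rough illustration of "following the diagram", here is a minimal sketch of the first two convolution blocks of a VGG-style network; the filter counts and kernel sizes follow the usual VGG pattern, but this is only a fragment for illustration, not a faithful VGG-19 implementation.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
# Block 1: two 3x3 convolutions with 64 filters, then 2x2 max pooling
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
# Block 2: two 3x3 convolutions with 128 filters, then 2x2 max pooling
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
# ...further blocks, then Flatten and Dense layers would complete the network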

The activation function

For multi-class problems, the last layer is softmax, and ReLU is used in the other layers of a deep network. For binary classification, the last layer can be sigmoid. Tanh can also be used in shallow neural networks.
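
For instance, a minimal sketch of a binary classifier built on this rule of thumb, with ReLU in the hidden layer and sigmoid on the output (the layer sizes and input dimension are placeholders):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))  # hidden layer: ReLU
model.add(Dense(1, activation='sigmoid'))              # output layer: sigmoid for binary classification
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])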

Loss function

  • categorical_crossentropy: multi-class log loss
  • binary_crossentropy: binary log loss
  • mean_squared_error: mean squared error
  • mean_absolute_error: mean absolute error

For multi-class problems, categorical_crossentropy is the main choice.
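
To show the other losses in use, a regression model would typically pair mean_squared_error with a linear output layer; a minimal sketch (the sizes here are again placeholders):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=8))
model.add(Dense(1))  # no activation: linear output for regression
model.compile(loss='mean_squared_error', optimizer='rmsprop')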

The optimizer

  • SGD: stochastic gradient descent
  • Adagrad: adaptive gradient descent
  • Adadelta: a further improvement on Adagrad
  • RMSProp
  • Adam

The first three are covered in Tensorflow fast-food tutorial (7) - gradient descent; the last two will be covered in a later tutorial.
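
Just as with SGD above, any of these can be passed either as a string or as a configured object. A quick sketch with RMSProp and an explicit learning rate (0.001 is simply its usual default, chosen here for illustration):

from keras.optimizers import RMSprop

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])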

Functional programming in deep learning

The basic layers described above, besides being addable to a Sequential container, are themselves callable objects; calling one on an input returns an output that can be fed to the next layer. So you can treat them as functions and chain them together through calls.

Here’s an official example:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))

x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)

Why use functional programming?

The answer is that not all complex network structures can be built by adding layers linearly to a container. There are parallel branches, reused layers, and so on. This is where callables show their strength. For example, the following Google Inception module has parallel branches:

The code mirrors the parallel structure naturally, and the single input input_img is reused by the three branches:

import keras
from keras.layers import Conv2D, MaxPooling2D, Input

input_img = Input(shape=(256, 256, 3))

tower_1 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)
tower_1 = Conv2D(64, (3, 3), padding='same', activation='relu')(tower_1)

tower_2 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)
tower_2 = Conv2D(64, (5, 5), padding='same', activation='relu')(tower_2)

tower_3 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(input_img)
tower_3 = Conv2D(64, (1, 1), padding='same', activation='relu')(tower_3)

output = keras.layers.concatenate([tower_1, tower_2, tower_3], axis=1)

Case tutorial

MNIST handwriting recognition with a CNN

All talk and no practice gets us nowhere, so let's walk through the five steps on MNIST.

Let's first parse the core model code. Since the model is linear, we'll use a Sequential container:

model = Sequential()

The core is two convolution layers:

model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))

To prevent overfitting, we add a max pooling layer plus a Dropout layer:

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

Now it is time to move on to the fully connected output layers, and the data conversion between the two requires a Flatten layer:

model.add(Flatten())

Next comes a fully connected layer with ReLU activation. Still wary of overfitting, we add another Dropout layer:

model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))

Finally, the output is produced by a fully connected layer with a softmax activation:

model.add(Dense(num_classes, activation='softmax'))

The loss function is categorical_crossentropy, the multi-class log loss. The optimizer is Adadelta, which we introduced in Tensorflow fast-food tutorial (7) - gradient descent.

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

Here is the complete code to run:

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

I've used the MNIST example so many times that I'm a little embarrassed. So here's something different for a change: translation between languages!

Machine translation: translating between languages!

Was it always hard for you to translate from Chinese to English when you were a student? There's no need to worry now: as long as we have a table of paired sentences in two languages, we can train a model that does something like machine translation. First, download a bilingual sentence-pair file from: www.manythings.org/anki/

Then, as usual, let's look at the core code first. Sequence problems like this call for an RNN, usually an LSTM. Here is how the model is built with LSTMs:

encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

The optimizer is RMSprop, and the loss function is categorical_crossentropy. validation_split specifies the fraction of the data that is held out from training and used for validation.

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)

Finally, since training a model is not easy, we save it:

model.save('s2s.h5')
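
The saved file can be reloaded later so that training does not have to be repeated; a minimal sketch using Keras's load_model:

from keras.models import load_model

# Restore the trained seq2seq model from disk.
model = load_model('s2s.h5')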

Here is the complete machine translation code; including comments and blank lines, it is just over 100 lines:

from __future__ import print_function

from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np

batch_size = 64  # Batch size for training.
epochs = 100  # Number of epochs to train for.
latent_dim = 256  # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.
# Path to the data txt file on disk.
data_path = 'fra-eng/fra.txt'

# Vectorize the data.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')
    # We use "tab " as the "start sequence " character
    # for the targets, and "\n " as "end sequence " character.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.

# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('s2s.h5')

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())


def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence


for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

Previous installments

  • Tensorflow fast-food tutorial (1) - handwriting recognition in 30 lines of code: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (2) - scalar operations: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (3) - vectors: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (4) - matrices: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (5) - norms: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (6) - matrix decomposition: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (7) - gradient descent: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (8) - a brief look at deep learning: yq.aliyun.com/articles/58…
  • Tensorflow fast-food tutorial (9) - convolution: yq.aliyun.com/articles/59…
  • Tensorflow fast-food tutorial (10) - recurrent neural networks: yq.aliyun.com/articles/59…
  • Tensorflow fast-food tutorial (11): yq.aliyun.com/articles/59…
  • Tensorflow fast-food tutorial (12) - teaching the machine to write Shakespeare's plays: yq.aliyun.com/articles/59…