This is the ninth day of my participation in the August Update Challenge. For details, see: August Update Challenge.

Deep Learning with Python

This article is one of a series of notes I wrote while studying Deep Learning with Python (2nd edition, by Francois Chollet). Starting with this post, the notes are written in Markdown instead of Jupyter Notebooks; the original .ipynb notebooks are available on GitHub or Gitee.

You can read the original book online (in English) at this website. The book's author has also published the accompanying Jupyter notebooks.

This article contains my notes on Chapter 6, Deep Learning for Text and Sequences.

6.4 Sequence processing with convnets


Convolutional neural networks make efficient use of data, extract local features, and build modular representations. Thanks to these properties, CNNs are not only good at computer vision problems but can also handle sequence problems efficiently. On some sequence problems, a CNN can even match or exceed the effectiveness and efficiency of an RNN.

Unlike the two-dimensional Conv2D used for image processing, a time series is one-dimensional, so it should be processed with a one-dimensional convolutional neural network.

One dimensional convolution and pooling of sequence data

Similar to two-dimensional convolution, one-dimensional convolution extracts local patches (subsequences) from a sequence and applies the same transformation to each patch. A one-dimensional convolution window is a window sliding along the time axis. This property guarantees that a pattern learned at one position can later be recognized at a different position (translation invariance in time).

The one-dimensional pooling operation is also analogous to its two-dimensional counterpart: it extracts one-dimensional patches from the input and outputs their maximum (max pooling) or average (average pooling). As in 2D, this operation is used to reduce the length of the data (subsampling).
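To make the sliding-window picture concrete, here is a minimal NumPy sketch (not Keras code, and the filter weights are made up) that convolves a toy univariate sequence with a window of size 3 and then max-pools the result with a window of size 2:

import numpy as np

sequence = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])  # toy univariate time series
kernel = np.array([0.5, 1.0, -0.5])                   # one "learned" filter with window size 3

# 1D convolution: slide the window along time and transform each patch the same way
conv_out = np.array([np.dot(sequence[t:t + 3], kernel)
                     for t in range(len(sequence) - 3 + 1)])
print(conv_out)  # one value per window position: [2.5, 1.0, 4.0, 3.5]

# 1D max pooling with window 2 and stride 2: halves the length of the sequence
pool_out = np.array([conv_out[t:t + 2].max()
                     for t in range(0, len(conv_out) - 1, 2)])
print(pool_out)  # [2.5, 4.0]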

Implement one-dimensional convolutional neural networks

In Keras, a one-dimensional convolutional neural network is built with the Conv1D layer. Its usage is similar to Conv2D: it takes input of shape (samples, time, features) and returns output of a similar shape (the time and feature dimensions may change). Note that its window slides over time, the second axis of the input. In Conv2D our windows were typically 3×3 or 5×5; in Conv1D we can afford larger windows, typically of size 7 or 9.
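A quick way to see the shapes involved (a minimal sketch with arbitrary layer sizes and dummy data, not from the book):

from tensorflow.keras import layers
import numpy as np

# a dummy batch: 4 samples, 100 time steps, 16 features per step
x = np.random.random((4, 100, 16)).astype('float32')

conv = layers.Conv1D(32, 7, activation='relu')  # 32 filters, window of size 7 on the time axis
print(conv(x).shape)  # (4, 94, 32): time shrinks to 100 - 7 + 1, features become the 32 filters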

As usual, we stack Conv1D and MaxPooling1D layers, and finish the convolution/pooling stack with either a global pooling layer or a Flatten layer.

Take IMDB again as an example:

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

max_features = 10000  # number of words to consider as features
max_len = 500         # cut off review texts after this many words

print('Loading data... ')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

Output:

25000 train sequences
25000 test sequences
x_train shape: (25000, 500)
x_test shape: (25000, 500)

Train and evaluate a simple one-dimensional convolutional neural network on IMDB

from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop

model = Sequential()
model.add(layers.Embedding(max_features, 128, input_length=max_len))

model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))

model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())

model.add(layers.Dense(1))

model.summary()

model.compile(optimizer=RMSprop(lr=1e-4),  # newer Keras versions use learning_rate instead of lr
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

plot_acc_and_loss(history)
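Note that plot_acc_and_loss is not a Keras function but a small plotting helper reused throughout this note series. A minimal sketch of what such a helper might look like (my assumption, using matplotlib):

import matplotlib.pyplot as plt

def plot_acc_and_loss(history):
    """Plot training/validation accuracy and loss curves from a Keras History object."""
    hist = history.history
    epochs = range(1, len(hist['loss']) + 1)

    if 'acc' in hist:  # accuracy is only recorded when it is listed among the metrics
        plt.plot(epochs, hist['acc'], 'bo', label='Training acc')
        plt.plot(epochs, hist['val_acc'], 'b', label='Validation acc')
        plt.title('Training and validation accuracy')
        plt.legend()
        plt.figure()

    plt.plot(epochs, hist['loss'], 'bo', label='Training loss')
    plt.plot(epochs, hist['val_loss'], 'b', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()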

The resulting model structure:

Training process curve:

The results are slightly worse than those of the RNN, but they are still quite good, and the model trains much faster than an LSTM.

Combine CNN and RNN to process long sequences

A one-dimensional convolutional neural network learns by splitting the sequence into patches and is therefore not sensitive to the order of time steps. So for problems where the order of the sequence matters a great deal, a CNN performs far worse than an RNN. Take the Jena dataset (temperature forecasting) problem as an example:

First, prepare the data:

import os
import numpy as np

data_dir = "/CDFMLR/Files/dataset/jena_climate"
fname = os.path.join(data_dir, 'jena_climate_2009_2016.csv')

f = open(fname)
data = f.read()
f.close()

lines = data.split('\n')
header = lines[0].split(',')
lines = lines[1:]

float_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
    values = [float(x) for x in line.split(',')[1:]]
    float_data[i, :] = values
    
# Normalize with the mean and std of the first 200,000 time steps (the training split)
mean = float_data[:200000].mean(axis=0)
float_data -= mean
std = float_data[:200000].std(axis=0)
float_data /= std

def generator(data, lookback, delay, min_index, max_index,
              shuffle=False, batch_size=128, step=6):
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while 1:
        if shuffle:
            rows = np.random.randint(
                min_index + lookback, max_index, size=batch_size)
        else:
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)

        samples = np.zeros((len(rows),
                           lookback // step,
                           data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(rows[j] - lookback, rows[j], step)
            samples[j] = data[indices]
            targets[j] = data[rows[j] + delay][1]
        yield samples, targets
        
lookback = 1440
step = 6
delay = 144
batch_size = 128

train_gen = generator(float_data,
                      lookback=lookback,
                      delay=delay,
                      min_index=0,
                      max_index=200000,
                      shuffle=True,
                      step=step, 
                      batch_size=batch_size)
val_gen = generator(float_data,
                    lookback=lookback,
                    delay=delay,
                    min_index=200001,
                    max_index=300000,
                    step=step,
                    batch_size=batch_size)
test_gen = generator(float_data,
                     lookback=lookback,
                     delay=delay,
                     min_index=300001,
                     max_index=None,
                     step=step,
                     batch_size=batch_size)

val_steps = (300000 - 200001 - lookback) // batch_size

test_steps = (len(float_data) - 300001 - lookback) // batch_size
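Before training, it can be helpful to pull one batch from the generator and check the shapes (a quick sanity check that is not part of the book's listing):

# One batch from the training generator:
# samples cover lookback // step = 240 time steps, each with the 14 weather features;
# targets hold one temperature value (normalized) per sample.
samples, targets = next(train_gen)
print(samples.shape)  # (128, 240, 14)
print(targets.shape)  # (128,)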

Training and evaluating a simple one-dimensional convolutional neural network on the Jena data set:

from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop

model = Sequential()
model.add(layers.Conv1D(32, 5, activation='relu',
                        input_shape=(None, float_data.shape[-1])))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps)

plot_acc_and_loss(history)

Training process curve:

It does not even do as well as the common-sense baseline we used earlier, which shows that order information is crucial for this problem. To learn order information while keeping the speed and light weight of convolutional neural networks, we can combine a CNN with an RNN.

We can use Conv1D layers in front of the RNN. For very long sequences (e.g. thousands of time steps), processing them directly with an RNN is too slow or even infeasible. Putting a few Conv1D layers in front of the RNN turns the long input sequence into a much shorter sequence of higher-level features (downsampling), which the RNN then processes, so order-sensitive information can still be learned.

Let's tackle the temperature forecasting problem again with this approach. Since it can handle longer sequences, we can either let the network look at earlier data (increase the lookback parameter of the data generator) or look at higher-resolution time series (decrease the step parameter of the generator):

step = 3  # was 6 (one data point per hour); now one data point every 30 minutes
lookback = 720
delay = 144

train_gen = generator(float_data,
                      lookback=lookback,
                      delay=delay,
                      min_index=0,
                      max_index=200000,
                      shuffle=True,
                      step=step)

val_gen = generator(float_data,
                    lookback=lookback,
                    delay=delay,
                    min_index=200001,
                    max_index=300000,
                    step=step)

test_gen = generator(float_data,
                     lookback=lookback,
                     delay=delay,
                     min_index=300001,
                     max_index=None,
                     step=step)

val_steps = (300000 - 200001 - lookback) // 128
test_steps = (len(float_data) - 300001 - lookback) // 128

Build the network with Conv1D + GRU:

model = Sequential()
model.add(layers.Conv1D(32, 5, activation='relu',
                        input_shape=(None, float_data.shape[-1])))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.GRU(32, dropout=0.1, recurrent_dropout=0.5))
model.add(layers.Dense(1))

model.summary()

model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps)

plot_acc_and_loss(history)

Model structure:

Training process curve:

Judging by the validation loss, this architecture is not as good as the regularized GRU alone, but it is much faster. It looks at twice as much data, which may not be very useful in this case, but could matter on other datasets.
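Finally, neither listing above evaluates on the held-out test split. One way to do that is to reuse the test generator; a minimal sketch (in recent TensorFlow versions model.evaluate accepts a Python generator directly, and std[1] is the temperature column's standard deviation computed during normalization):

# MAE on the test split; multiply by the temperature std to convert back to degrees Celsius
test_mae = model.evaluate(test_gen, steps=test_steps)
print('Test MAE (degrees Celsius):', test_mae * std[1])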


By(“CDFMLR”, “2020-08-14”)